mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-06-25 09:51:19 +08:00

Gökdeniz Gülmez 76710f61af Adding support for mamba (#940)	2024-09-28 07:02:53 -07:00

* initial commit
* initial commit
* Adding first lines
* adding x, and dt projection layers
* adding the clamping mechanism
* First successful inference
* last commit for today - added custom generate function and it works as expected, will try training and then with loading a model from the hub
* clean up
* save up
* almost
* update
* update
* fixed cache handling
* fixed loading
* added separate generate_step method in the model and also in the utils to automatically use the generate step method in the model class
* quick update
* still not working
* save
* still not working
* initial commit
* utils.py logits = logits[:, -1, :] TypeError: tuple indices must be integers or slices, not tuple
* update
* update
* Fixing the Batching Depthwise Convolution and multi token input
* fixing generate and logits outputs
* Done!
* Fixing the cache handling, generating works now trying training
* update ACKNOWLEDGEMENTS
* removing the model_type if stuff in the _step loop in generate_step and adding MambaCache in base.py for training easier generations and removing mamba in tuner/utils.
* quick clean up
* update trainer/utils for right initialisation of the layers for LoRA, but not working.
* clean up
* Further update to trainer/utils for correct layer selection. Successful training
* removing extra mamba-infer.py file
* clean up, reformatting will come later
* reformat and big clean up, final commit
* some speedups and cleanups
* fix test
* nits
* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>

__init__.py	Mlx llm package (#301)	2024-01-12 10:25:56 -08:00
base.py	Add the ability to load the KV cache from a file (#956)	2024-08-28 22:11:45 -07:00
cohere.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
dbrx.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
deepseek_v2.py	Use fast rope (#945)	2024-08-23 13:18:51 -07:00
deepseek.py	feat: DeepSeek MoE v1 (#942)	2024-08-17 07:18:09 -07:00
gemma2.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
gemma.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
gpt2.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
gpt_bigcode.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
gpt_neox.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
internlm2.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
llama.py	Use fast rope (#945)	2024-08-23 13:18:51 -07:00
mamba.py	Adding support for mamba (#940)	2024-09-28 07:02:53 -07:00
minicpm.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
mixtral.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
nemotron.py	feat(mlx_lm): Nemotron (#949)	2024-08-29 21:08:57 -07:00
olmo.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
openelm.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
phi3.py	Use fast rope (#945)	2024-08-23 13:18:51 -07:00
phi3small.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
phi.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
phimoe.py	Add Phi-3.5-MoE (#946)	2024-08-24 06:52:33 -07:00
phixtral.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
plamo.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
qwen2_moe.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
qwen2.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
qwen.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
recurrent_gemma.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
stablelm.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
starcoder2.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00
su_rope.py	Add Phi-3.5-MoE (#946)	2024-08-24 06:52:33 -07:00
switch_layers.py	Handle longer prompt/generation (#931)	2024-08-16 15:28:39 -07:00