Commit Graph

456 Commits

Author SHA1 Message Date
JosefAlbers
fda41545a6
Su-RoPE (Rotary Position Embedding) for Phi-3 (#813)
* Su-RoPE

* nits

* Update su_rope.py

* Update su_rope.py

Per GPT4: "The error TypeError: 'type' object is not subscriptable is caused by using the type hint list[float] in a version of Python that does not support it. This syntax is only available in Python 3.9 and later."

* Ran isort

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-11 06:20:04 -07:00
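A minimal sketch of the Python-version fix quoted above, assuming a 3.8-compatible annotation was the change; the function name and body here are hypothetical, only the `typing.List` annotation mirrors the described fix:

```python
from typing import List

# `list[float]` as an annotation requires Python >= 3.9; `typing.List`
# also works on 3.8. The function itself is illustrative, not su_rope.py.
def scale_freqs(freqs: List[float], factor: float) -> List[float]:
    # Scale each rotary frequency by a constant factor.
    return [f * factor for f in freqs]
```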
Yi Wang
a54dfd698e
Correct the type annotation of cache in llama.py (#828)
* Update

* Fix isort
2024-06-10 15:18:34 -07:00
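A plausible shape of the corrected annotation (illustrative only; the exact type in llama.py may differ): the cache is a per-layer list of `(keys, values)` pairs rather than a single array.

```python
from typing import List, Optional, Tuple

import mlx.core as mx

# One (keys, values) pair per transformer layer; None before the first step.
CacheType = Optional[List[Tuple[mx.array, mx.array]]]
```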
Yi Wang
bb8227f181
Correct type annotation of llama.ModelArgs.num_key_value_heads (#827) 2024-06-10 14:47:31 -07:00
Awni Hannun
c5da302fc4
gpu featurization (#824) 2024-06-07 08:59:44 -07:00
Robin Glauser
4872727f14
Fixing "NameError: name 'resume_adapter_file' is not defined" (#817)
`args.` is missing before `resume_adapter_file`, so the name is not defined.
2024-06-05 10:07:31 -07:00
Michał Kurc
43d6deb3c1
mlx_lm: Add Streaming Capability to Generate Function (#807)
* Add streaming feature to text generation function

* separate stream and regular functions

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-03 09:04:39 -07:00
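A hedged usage sketch of the streaming split described above, assuming the separate streaming entry point is `mlx_lm.stream_generate`; the model name is just an example:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-4bit")
# Print each segment as it is produced instead of waiting for the full text.
for segment in stream_generate(model, tokenizer, "Hello,", max_tokens=64):
    print(segment, end="", flush=True)
```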
Shiyu
8353bbbf93
Segment Anything Model (#552)
* add segment anything model

* add readme

* reorg file structure

* update

* lint

* minor updates

* ack

* fix weight loading

* simplify

* fix to run notebooks

* amg in mlx

* remove torch dependency

* nit in README

* return indices in nms

* simplify

* bugfix / simplify

* fix bug

* simplify

* fix notebook and remove output

* couple more nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-02 16:45:51 -07:00
Derek Lewis
89b0b75250
GPT2 Support (#798)
* GPT-2 model support

* Add test for gpt2 model

* Fix weight sanitizing for quantization

* use approx gelu

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-02 16:33:20 -07:00
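On the "use approx gelu" bullet, a hedged one-liner assuming MLX's `nn.GELU` approximation modes; GPT-2 was trained with the tanh-approximated GELU:

```python
import mlx.nn as nn

# "precise" selects the tanh approximation of GELU in MLX (assumed mode name).
act = nn.GELU(approx="precise")
```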
madroid
c457a3f88b
LoRA: Extract small function (#614)
* LoRA: Extract pre_processing_model function

* LoRA: Extract small functions (train_model, evaluate_model)

* move test case to test_tuner_utils.py

* nits

* nits

* remove extra param, validate at iteration 0

* version

* fix test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-02 06:38:42 -07:00
Awni Hannun
81318ad4a8
Port of phi3small (#794)
* start port of phi3small

* fix phi3

* use block sparsity

* compile activation

* nits in readme / mlx lm version
2024-05-31 12:54:14 -07:00
Awni Hannun
09aaeac72c
fix moe conversion (#802) 2024-05-31 12:36:05 -07:00
Behnam Moh
f49c5f2829
fixed the requirements (#803) 2024-05-29 06:14:19 -07:00
Chen Xin
aac98ca6f4
support internlm2 (#797)
* support internlm2

* only attention projections

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-27 06:22:21 -07:00
Awni Hannun
ca7ce60c91
Rename block sparse to gather (#793)
* rename block sparse to gather

* pin mlx version
2024-05-23 19:47:35 -07:00
Prince Canuma
69700d8431
Add support for Phi-3 Medium (#790)
* update to support phi-3 medium

* fuse qkv split
2024-05-22 16:47:06 -07:00
Prince Canuma
b044ce2acf
Add support for ibm granite (#758)
* add support for granite 3-8B config

* add gpt_bigcode

* add positional embedding condition.

* remove unused function

* rebase fix

* move position embedding to mask creation

* add to tuner and format

* refactor mask

* remove dropout layers
2024-05-21 20:16:31 -07:00
Awni Hannun
9fc6efbd90
version bump + some fixes (#792) 2024-05-21 20:09:35 -07:00
Angelos Katharopoulos
9f671228cd
Block sparse MM MoEs (#782)
- Adds SwitchLinear
- Adds QuantizedSwitchLinear
2024-05-21 15:58:08 -07:00
AtakanTekparmak
199df9e110
fix: Added dedicated error handling to load and get_model_path (#775)
* fix: Added dedicated error handling to load and get_model_path

Added proper error handling to load and get_model_path via a dedicated exception class, because when the local path is invalid, the loader still raised the Hugging Face RepositoryNotFoundError

* fix: Changed error message and resolved lack of import

* fix: Removed redundant try-catch block

* nits in message

* nits in message

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-20 06:39:05 -07:00
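A hedged sketch of the shape of this fix; the exception name and message are illustrative, not necessarily what landed:

```python
from pathlib import Path

from huggingface_hub import snapshot_download
from huggingface_hub.utils import RepositoryNotFoundError


class ModelNotFoundError(Exception):
    pass


def get_model_path(path_or_hf_repo: str) -> Path:
    model_path = Path(path_or_hf_repo)
    if not model_path.exists():
        try:
            model_path = Path(snapshot_download(repo_id=path_or_hf_repo))
        except RepositoryNotFoundError:
            # Replace the raw Hub error with one that also covers the
            # bad-local-path case described in the commit.
            raise ModelNotFoundError(
                f"Model not found locally or on the Hub: {path_or_hf_repo}"
            ) from None
    return model_path
```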
Awni Hannun
e92de216fd
rid warning (#789) 2024-05-20 06:05:33 -07:00
alexC-nonsense4k
42458914c8
support dora finetune in mlx-examples/llms/mlx_lm (#779)
* support dora finetune

* solve problems in lora.py and tuner.utils.py

* add use_dora (bool) to the adapter loading functions

* delete all unsupported quantization code and fix all the calculation problems in mlx_lm/tuner/dora.py

* Using stop_gradient to prevent gradients from flowing through `norm` during backpropagation

* set DEFAULT_USE_DORA in mlx_lm/generate.py

* add annotation for all the use_dora

* support fusing DoRA layers in mlx_lm/fuse.py and fix a bug in to_linear() in mlx_lm/tuner/dora.py

* simplify the code that judges the type of a fused layer in mlx_lm/fuse.py

* add use_dora in mlx_lm/fuse.py when calling apply_lora_layers()

* style + nits

* style + nits

* more updates

---------

Co-authored-by: chenyifei08 <chenyifei08@baidu.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-16 08:21:26 -07:00
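A hedged sketch of a DoRA-style forward pass illustrating the `stop_gradient`-through-the-norm trick from the bullets above; names and shapes are illustrative, not the actual mlx_lm/tuner/dora.py code:

```python
import mlx.core as mx


def dora_forward(x, W, A, B, m, scale):
    # W: (out, in) frozen base weight; B @ A: low-rank update;
    # m: (out, 1) learned magnitude vector.
    adapted = W + scale * (B @ A)
    norm = mx.linalg.norm(adapted, axis=1, keepdims=True)
    # stop_gradient keeps gradients from flowing through the norm
    # during backpropagation, per the commit message.
    directional = adapted / mx.stop_gradient(norm)
    return x @ (m * directional).T
```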
Awni Hannun
69181e0058
Support non incremental kv cache growth (#766) 2024-05-15 12:56:24 -07:00
Jinwu Zhan
1a86d985d9
Support --add_eos_token argument within LoRA training (#760)
* Support `--add_eos_token` argument to empower users to control the addition of the eos token during LoRA training, addressing issues like incomplete text generation.

* Support `--add_eos_token`, code format

---------

Co-authored-by: Zhan ChengLong <zhanchenglong@bytedance.com>
2024-05-13 17:17:42 -07:00
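A hedged sketch of what the flag enables during dataset tokenization; only the flag name comes from the commit, the surrounding code is illustrative:

```python
def encode_example(tokenizer, text, add_eos_token=True):
    tokens = tokenizer.encode(text)
    # Appending EOS teaches the model to terminate, addressing the
    # incomplete-generation issue mentioned above.
    if add_eos_token and tokens[-1] != tokenizer.eos_token_id:
        tokens.append(tokenizer.eos_token_id)
    return tokens
```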
JosefAlbers
10853b57d9
Add model_config parameter to load() and load_model() (#770)
* Add `model_config` parameter to `load()` and `load_model()`

For easy editing of the loaded model configuration (e.g., for changing the RoPE theta or scaling of the Phi-3 model)

Example:

```python
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed",
    model_config={"rope_theta": 50000.0},
)
response = generate(model, tokenizer, prompt="Why is the sky blue?", max_tokens=256)
```

* Possible bug (default_loss)

* Revert "Possible bug (default_loss)"

This reverts commit 70a55ace18.

* Fix default_loss for lora

* 1. move load_model's new optional `model_config` arg to the end (fetch_from_hub()'s `model = load_model(model_path, lazy)`) 2. fix indentations (`black` hook)
2024-05-10 10:13:34 -07:00
Awni Hannun
6f0a69e682
fix lora for openelm (#773) 2024-05-10 09:51:41 -07:00
Awni Hannun
fad9598372
Fix llama cache check (#763)
* fix llama cache check

* add test
2024-05-08 08:35:54 -07:00
Awni Hannun
ee60e2a9d5
Kv cache (#643)
* in place kv_cache

* fix

* fix kv cache size

* partially fix kv cache dtype

* step kv cache

* multiple of step size

* more tests + kv cache

* more kv cache

* update all models to use kv cache
2024-05-08 08:18:13 -07:00
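A hedged sketch of the in-place, step-growing cache these bullets describe; the real mlx_lm `KVCache` differs in details:

```python
import mlx.core as mx


class KVCache:
    def __init__(self, head_dim, n_kv_heads, step=256):
        self.head_dim, self.n_kv_heads, self.step = head_dim, n_kv_heads, step
        self.keys = self.values = None
        self.offset = 0

    def update_and_fetch(self, keys, values):
        prev = self.offset
        if self.keys is None or prev + keys.shape[2] > self.keys.shape[2]:
            # Grow capacity by whole steps instead of concatenating per token.
            n_steps = (self.step + keys.shape[2] - 1) // self.step
            shape = (keys.shape[0], self.n_kv_heads,
                     n_steps * self.step, self.head_dim)
            new_k = mx.zeros(shape, keys.dtype)
            new_v = mx.zeros(shape, values.dtype)
            if self.keys is None:
                self.keys, self.values = new_k, new_v
            else:
                self.keys = mx.concatenate([self.keys, new_k], axis=2)
                self.values = mx.concatenate([self.values, new_v], axis=2)
        self.offset += keys.shape[2]
        # Write the new tokens in place and return views up to the offset.
        self.keys[..., prev : self.offset, :] = keys
        self.values[..., prev : self.offset, :] = values
        return self.keys[..., : self.offset, :], self.values[..., : self.offset, :]
```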
Albert Avetisian
bfbc0e434a
Add optional EOS token for llava example (#753)
* add optional EOS token

* add tokenizer config to align with MLX LM example

* formatting fixes
2024-05-08 06:04:36 -07:00
Kevin Wang
c0019c4908
Pad mask with zeros for non-square attention matrices (#715)
* Pad mask with zeros for non-square attention matrices

The current implementation of the mask assumes the attention matrix is square, which is true if there is no cache. However, if one wishes to produce multiple tokens at a time, such as in speculative decoding implementations, a rectangular mask is necessary.

This change pads the bottom of the mask with zeros so multi-token decoding with a cache works correctly.

* Directly create mask instead of padding

* Update llama.py
2024-05-04 16:32:25 -07:00
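A hedged sketch of the "directly create mask" approach: `num_new` query rows attend over `offset + num_new` key columns, and with `offset=0` it reduces to the usual square causal mask (names are illustrative):

```python
import mlx.core as mx


def create_causal_mask(num_new, offset):
    # Query i (one of the new positions) may attend to keys up to itself.
    rows = mx.arange(offset, offset + num_new)[:, None]
    cols = mx.arange(offset + num_new)[None, :]
    # 0 where attention is allowed, -inf on future positions.
    return mx.where(cols <= rows, 0.0, float("-inf"))
```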
Anchen
f30413b63c
chore(mlx-lm): fix the number of validation batches configuration. (#752)
* chore: fix number of validation batches

* clean up

* address comment
2024-05-04 06:52:42 -07:00
Awni Hannun
2bf11c4633
Use stable url for MNIST (#749)
* use stable url

* remove deprecated flag
2024-05-03 17:13:05 -07:00
Konstantin Kerekovski
d1c35fa684
Add MLX Cache Limit setting for mlx_lm.generate and mlx_lm.server CLI (#744)
* Add support for setting MLX cache limit in GB

* Add support for setting MLX cache limit in GB in mlx_lm.server

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-03 12:42:48 -07:00
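A hedged sketch of wiring a GB-denominated CLI value into MLX's byte-based limit; the helper name is illustrative, `mx.metal.set_cache_limit` is the underlying call:

```python
import mlx.core as mx


def set_cache_limit_gb(gb):
    # MLX takes the Metal buffer-cache limit in bytes.
    mx.metal.set_cache_limit(int(gb * 1024**3))
```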
Ivan Fioravanti
b468091f7f
Add model management functionality for local caches (#736)
* Add model management functionality for local caches

This commit introduces a set of command-line utilities for managing MLX models downloaded and saved locally in the Hugging Face cache. The utilities support scanning existing models, retrieving detailed information about a specific model, and deleting a model by its name.

* Added mlx_lm.model to setup.py

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-03 12:20:13 -07:00
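A hedged sketch of the scan operation, assuming `huggingface_hub.scan_cache_dir`; the filtering and output format are illustrative:

```python
from huggingface_hub import scan_cache_dir


def scan_models(pattern="mlx"):
    # List locally cached model repos whose id matches the pattern.
    for repo in scan_cache_dir().repos:
        if repo.repo_type == "model" and pattern in repo.repo_id:
            print(f"{repo.repo_id}\t{repo.size_on_disk_str}")
```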
Awni Hannun
92430df0a0
Fix lora for qwen moe (#743)
* fix lora for qwen moe

* use max seq length in test as well
2024-05-02 21:55:09 -07:00
madroid
5079af62db
Update model card description (#654)
* Update model card description

- Add full link jumps
- Add the address of the model uploader's Hugging Face homepage

* Add user_info to reduce whoami calls

* Remove the -U argument

* remove HF user info

* run pre-commit
2024-05-02 21:22:04 -07:00
madroid
6775d6cb3f
Whisper: Add pip distribution configuration to support pip installations. (#739)
* Whisper: rename whisper to mlx_whisper

* Whisper: add setup.py config for publish

* Whisper: add assets data to setup config

* Whisper: pre-commit for setup.py

* Whisper: Update README.md

* Whisper: Update README.md

* nits

* fix package data

* nit in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-01 09:00:02 -07:00
Karim Elmaaroufi
4bf2eb17f2
Validate server params & fix logit bias bug (#731)
* Bug fix in logit bias

* Add parameter validations

* Fix typo

* Update docstrings to match MLX styling

* Black style + fix a validation bug
2024-04-30 07:27:40 -07:00
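A hedged sketch of applying an OpenAI-style `logit_bias` map before sampling; the indexing is the kind of detail the fix above concerned, and this is not the server's actual code:

```python
import mlx.core as mx


def apply_logit_bias(logits, logit_bias):
    # logits: (batch, vocab); logit_bias: {token_id: additive_bias}
    if logit_bias:
        indices = mx.array(list(logit_bias.keys()))
        values = mx.array(list(logit_bias.values()))
        logits[:, indices] = logits[:, indices] + values
    return logits
```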
Jaward Sesay
7c0962f4e2
Add Support for Quantized Phi-3-mini-4k-instruct GGUF Weights (#717)
* support for Phi-3 4-bit quantized GGUF weights

* Added link to 4-bit quantized model

* removed some prints

* Added correct comment

* Added correct comment

* removed print

Since the last condition already prints a warning when quantization is None
2024-04-29 20:11:32 -07:00
Thomas Lazarus
5513c4e57d
Fixes Typo in Starcoder2 (#740) 2024-04-29 13:14:45 -07:00
Javier de la Rosa
510d2bde49
Force multi_commits when uploading to HF (#729) 2024-04-28 19:07:17 -07:00
锦此
699de35b03
Update lora_config.yaml (#735)
Update the LoRA config YAML, replacing the adapter file argument with the adapter path argument.
2024-04-28 10:24:34 -07:00
Prince Canuma
c012eb173f
Add support for OpenELM (#719)
* add openELM

* update splitting logic

* update qkv logic, transformer, and MLP block

* code formatting and fix args

* fix array slicing and remove unused var :)

* add to tuner

* use mx.split for slicing qkv

* merge with phi3

* remove rope scaling logic

* code formatting
2024-04-25 16:49:28 -07:00
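On the "use mx.split for slicing qkv" bullet, a hedged illustration; OpenELM's real split sizes differ since its key/value head counts vary by layer:

```python
import mlx.core as mx

# A fused projection output split into equal query/key/value parts.
qkv = mx.random.normal((1, 8, 3 * 64))  # (batch, seq, 3 * dims), illustrative
q, k, v = mx.split(qkv, 3, axis=-1)
```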
Gökdeniz Gülmez
2c1c9e9024
MiniCPM implementation (#685)
* Added support for the MiniCPM architecture

* Updated utils.py and LORA.md

* Update implementation details for MiniCPM architecture

* Cleaning up

* fixed the missing lm.head layer problem

* Refactor Model class to dynamically handle tied and untied word embeddings

* Quick update

* added a dynamic rope scaling base calculation

* quick fix and clean up

* clean up again

* removed the MiniCPMNorm class as it's not used

* forgot something, sorry

* format

* version bump

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-25 15:29:28 -07:00
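A hedged sketch of the tied/untied word-embedding handling mentioned in the refactor above, using MLX's `nn.Embedding.as_linear`; the class and argument names are illustrative:

```python
import mlx.nn as nn


class LMHead(nn.Module):
    def __init__(self, vocab_size, dims, tie_word_embeddings=True):
        super().__init__()
        self.tie = tie_word_embeddings
        self.embed_tokens = nn.Embedding(vocab_size, dims)
        if not self.tie:
            self.lm_head = nn.Linear(dims, vocab_size, bias=False)

    def __call__(self, h):
        # With tied embeddings, reuse the embedding matrix as the output
        # projection instead of a separate lm_head.
        return self.embed_tokens.as_linear(h) if self.tie else self.lm_head(h)
```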
Awni Hannun
685012c2ad
Couple fixes for LoRA (#711)
* don't overwrite in test only mode

* only load model specific safetensors
2024-04-25 14:16:13 -07:00
Kristian Muñiz
109ee2f2f8
Use CORS headers for streaming for MLX Server (#716) 2024-04-25 07:26:04 -07:00
Kevin Wang
8a265f0d54
Fix incorrect type annotation (#720)
A `Tuple` is missing in this type annotation.
2024-04-24 15:52:43 -07:00
Prince Canuma
abcd891851
Add support for phi-3 (#712)
* Add phi-3 modelling

* fix rope scaling warning

* add tests and update tuner utils

* update name and remove sanitize

* fix lora
2024-04-23 09:20:00 -07:00
Awni Hannun
ecbc6ff1e3
one more quant fix (#708) 2024-04-22 18:12:52 -07:00
Aaron Ng
8d5cf5b0c8
use logging in mlx server (#705) 2024-04-22 07:50:06 -07:00
AlexandrosChrtn
f20e68fcc0
Load fused model with transformers (#703)
* save format for transformers compatibility

* save format for transformers compatibility arg

* hardcode mlx

* hardcode mlx format
2024-04-21 09:04:44 -07:00
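A hedged sketch of the metadata tagging involved: safetensors files carry a `format` key that downstream loaders such as `transformers` inspect; the weights dict here is a stand-in:

```python
import mlx.core as mx

weights = {"model.embed_tokens.weight": mx.zeros((8, 4))}  # stand-in weights
# Tag the file so loaders know how it was produced.
mx.save_safetensors("model.safetensors", weights, metadata={"format": "mlx"})
```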