mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-09-01 04:14:38 +08:00

Author	SHA1	Message	Date
Awni Hannun	7be292c0c9	Handle longer prompt/generation (#931 ) * rebase * nits * nit * fix rotating cache with step prefill * update version	2024-08-16 15:28:39 -07:00
Zai Thottakath	4e01700816	Allow the entire model to be targeted for LoRA and DoRA fine tuning: LoRA and DoRA embeddings with small DoRALinear bug fix (#914 ) * feature: LoRA adapter for Embeddings * feature: wire in LoRAEmbedding into the tuner. Allow the embedding and non model.layers Linear layers to be targeted for fine tuning * feature: DoRA adapter for Embeddings * feature: wire in DoRAEmbedding * bugfix: ensure self.m is recalculated when the linear layer is changed in DoRALinear.from_linear * refactor: prefer from_base over from_linear or from_embedding. prefer fuse over to_linear or to_embedding * cleanup: remove unused imports in test_dora.py * refactor: remove unnecessary non_layer_modules * cleanup: remove wrong comments for lora embedding dropout. remove unnecessary parens in dora embedding dropout * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-08-16 07:38:36 -07:00
Awni Hannun	9b83004631	Faster sampling with `mx.compile` (#937 ) * faster sampling with compile * fix test	2024-08-15 11:29:09 -07:00
Awni Hannun	95840f32e2	Fix whisper conversion for safetensors models (#935 ) * fix whisper conversion for safetensor only. error in mlx lm for existing paths * fix tests	2024-08-14 10:22:04 -07:00
tidely	df744c98e6	Predict stop sequence matches during streaming (#541 ) * Predict stop sequence matches during streaming Check for overlap of stop sequences and the tokens array for potential sequence matches after more tokens get generated. Generate tokens until we can confirm that the stop sequence is not met. * fix typo * Change sequence_overlap logic * range isn't inclusive, add 1 to max_overlap * Add test_server.py Added a test for the sequence_overlap method * nits * eos sequence * finalize --------- Co-authored-by: Y4hL <43219534+Y4hL@users.noreply.github.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-08-06 15:24:15 -07:00
Khush Gupta	8fa12b0058	Adapters loading (#902 ) * Added functionality to load in adapters through post-requests so you do not need to restart the server * ran pre-commit * nits * fix test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-08-01 16:18:18 -07:00
Anchen	7a3ab1620a	support load model by custom get_model_classes (#899 ) * feature(mlx_lm): support load model by custom get classes * rename the param	2024-07-25 11:01:17 -07:00
Alex Cheema	cd8efc7fbc	Add support for Llama-3.1 (#907 ) * add dynamicNTK scaling rope * remove unused var * fix rope base * llama3.1 fixes * TODO for rope eval * vectorise llama3 base freq calculation * removed the arbitrary 2.0 rope_scale default case * fix slow llama3.1 generation by evaluating stateless part of DynamicNTKScalingRoPE in init * nits + format * use mx.pi * fix tests and add test for 3.1 --------- Co-authored-by: Prince Canuma <prince.gdt@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-07-23 13:21:32 -07:00
nicolov	fbe3247772	Add GPT-neox model (#863 )	2024-07-11 06:13:17 -07:00
Angelos Katharopoulos	f212b770d8	Server loads the model on demand from the request (#851 )	2024-06-27 11:37:57 -07:00
Chime Ogbuji	df6bc09d74	Configuration-based use of HF hub-hosted datasets for training (#701 ) * Add hf_dataset configuration for using HF hub-hosted datasets for (Q)LoRA training * Pre-commit formatting * Fix YAML config example * Print DS info * Include name * Add hf_dataset parameter default * Remove TextHFDataset and CompletionsHFDataset and use Dataset and CompletionsDataset instead, adding a text_key constructor argument to the former (and changing it to work with a provided data structure instead of just from a JSON file), and prompt_key and completion_key arguments to the latter with defaults for backwards compatibility. * nits * update docs --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-26 10:20:50 -07:00
Derek Lewis	89b0b75250	GPT2 Support (#798 ) * GPT-2 model support * Add test for gpt2 model * Fix weight sanitizing for quantization * use approx gelu --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-02 16:33:20 -07:00
madroid	c457a3f88b	LoRA: Extract small function (#614 ) * LoRA: Extract pre_processing_model function * LoRA: Extract small functions(train_model,evaluate_model) * move test case to test_tuner_utils.py * nits * nits * remove extra param, validate at it 0 * version * fix test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-02 06:38:42 -07:00
Chen Xin	aac98ca6f4	support internlm2 (#797 ) * support internlm2 * only attention projections --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-05-27 06:22:21 -07:00
Awni Hannun	69181e0058	Support non incremental kv cache growth (#766 )	2024-05-15 12:56:24 -07:00
Awni Hannun	fad9598372	Fix llama cache check (#763 ) * fix llama cache check * add test	2024-05-08 08:35:54 -07:00
Awni Hannun	ee60e2a9d5	Kv cache (#643 ) * in place kv_cache * fix * fix kv cache size * partially fix kv cache dtype * step kv cache * multiple of step size * more tests + kv cache * more kv cache * update all models to use kv cache	2024-05-08 08:18:13 -07:00
Anchen	f30413b63c	chore(mlx-lm): fix the number of validation batches configuration. (#752 ) * chore: fix number of validation batches * clean up * address comment	2024-05-04 06:52:42 -07:00
Prince Canuma	abcd891851	Add support for phi-3 (#712 ) * Add phi-3 modelling * fix rope scaling warning * add tests and update tuner utils * update name and remove sanitize * fix lora	2024-04-23 09:20:00 -07:00
Awni Hannun	2146bcd7ee	Quantize embedding / Update quantize API (#680 ) * more async eval * quantize embedding / update quantize api * more updates for quantize * update for quantize embeddings * update sd quant API * update sdxl quants * error for datasets < batch_size * async * fix config loading * fix quant * fix tests * fix req * remove lm head if tie weights is true * fix test	2024-04-18 18:16:10 -07:00
Anchen	f5f189e48a	fix(mlx-lm): broken server.py (#690 ) * fix server.py * fix var referenced before assignment * add test * clean up	2024-04-18 14:26:18 -07:00
dmdaksh	7d7e236061	- Removed unused Python imports (#683 ) - bert/model.py:10: tree_unflatten - bert/model.py:2: dataclass - bert/model.py:8: numpy - cifar/resnet.py:6: Any - clip/model.py:15: tree_flatten - clip/model.py:9: Union - gcn/main.py:8: download_cora - gcn/main.py:9: cross_entropy - llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten - llms/gguf_llm/models.py:9: numpy - llms/mixtral/mixtral.py:12: tree_map - llms/mlx_lm/models/dbrx.py:2: Dict, Union - llms/mlx_lm/tuner/trainer.py:5: partial - llms/speculative_decoding/decoder.py:1: dataclass, field - llms/speculative_decoding/decoder.py:2: Optional - llms/speculative_decoding/decoder.py:5: mlx.nn - llms/speculative_decoding/decoder.py:6: numpy - llms/speculative_decoding/main.py:2: glob - llms/speculative_decoding/main.py:3: json - llms/speculative_decoding/main.py:5: Path - llms/speculative_decoding/main.py:8: mlx.nn - llms/speculative_decoding/model.py:6: tree_unflatten - llms/speculative_decoding/model.py:7: AutoTokenizer - llms/tests/test_lora.py:13: yaml_loader - lora/lora.py:14: tree_unflatten - lora/models.py:11: numpy - lora/models.py:3: glob - speechcommands/kwt.py:1: Any - speechcommands/main.py:7: mlx.data - stable_diffusion/stable_diffusion/model_io.py:4: partial - whisper/benchmark.py:5: sys - whisper/test.py:5: subprocess - whisper/whisper/audio.py:6: Optional - whisper/whisper/decoding.py:8: mlx.nn	2024-04-16 07:50:32 -07:00
Awni Hannun	c68aa3c7c3	Stable lm 2 (#666 ) * stable lm 2 * test and lora * version bump * merge stable models	2024-04-08 14:18:55 -07:00
Prince Canuma	d661440dbb	Add support for qwen2moe (#640 ) * add sparsemoe block and update decoder logic * update file name to match HF * update name * Code formatting * update gates calculation * add support for Qwen2MoE. * fix pytest * code formatting and fix missing comma in utils * Remove decoder sparse step. Co-authored-by: bozheng-hit <dsoul0621@gmail.com> * remove gate layer anti-quantisation * remove unused argument --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com>	2024-04-02 11:33:29 -07:00
Chime Ogbuji	f6283ef7ce	Configurable LR schedulers (#604 ) * Initial config handler and test * Added means to run from CLI * Update lora config loading and tests * Constrain scheduler config (warmup and minimum LR) for each kind * Update reference to moved schedule_config module * Minor fix * Fix typos * Moved build_schedule and tests * nits in schedule config * flake * fix path --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-29 13:41:10 -07:00
Anchen	297a908e3d	fix(mlx-lm): type hints in gguf.py (#621 )	2024-03-26 07:56:01 -07:00
Anchen	0ab01b4626	fix(mlx-lm): sorted probs in top_p implementation. (#610 ) * fix(mlx-lm): the top p imp * chore: address comment	2024-03-25 15:07:55 -07:00
Awni Hannun	b8a348c1b8	Switch to fast RMS/LN Norm (#603 ) * use nn.RMSNorm, use sdpa, cleanup * bump mlx versions * minor update * use fast layer norm * version bump * update requirement for whisper * update requirement for gguf	2024-03-23 07:13:51 -07:00
Anchen	fbed720d6f	chore(mlx-lm): fix the top_p implementation. (#602 ) * chore(mlx-lm): clean up the top p imp * chore: clean up * chore: add test * chore: address comments * chore: clean up docs string * chore: clean up test	2024-03-21 12:18:23 -07:00
Anchen	949f63f309	chore(mlx-lm): fix print_trainable_parameters for quant models (#581 ) * chore(mlx-lm): fix print_trainable_parameters for quant models * chore: clean up * refactor: use layer type to check quant bits * chore: address comment	2024-03-20 08:41:03 -07:00
madroid	b0bcd86a40	Support for OpenAI’s fine-tuning dataset format (#548 ) * LoRA: move load_dataset to tuner/datasets.py file * LoRA: support OpenAI chat format datasets see https://platform.openai.com/docs/guides/fine-tuning/example-format * LoRA: support OpenAI completion format datasets * LoRA: formatting dataset timing to reduce memory footprint * Refactor dataset item access in PromptCompletionDataset * Update mlx_lm/LORA.md * Update mlx_lm/LORA.md * check Unsupported data format * add tests, fine-tune doc * add tests, fine-tune doc * add jinja2 for chat template * nits in readme * nits in readme --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-19 16:45:46 -07:00
Prince Canuma	76c3244cc5	Add support for Cohere's Command-R (#565 ) * initial commit for command-R * update mlp, layernorm, lm_head and model args * add custom layernorm * add default to tie_word_embeddings * add layernorm weight type and refactor * update layernorm (bias conditional) in model/layers * fix layer norm use traditional rope * add test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-13 07:03:36 -07:00
Anchen	3535408c99	chore(mlx-lm): fix tie_word_embeddings for qwen2 (#566 ) * chore: fix tie_word_embeddings for qwen2 * chore: default tie_word_embeddings to True	2024-03-12 21:34:32 -07:00
Chime Ogbuji	e56d9015ef	LoRA on all linear transformer block layers (#546 ) * Add --lora-all-linear option to apply LoRA to all linear transformer block layers * Moved to YAML config and added specification of rank & alpha * nits in config, more tests * nit * run tests for prs --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-12 07:37:40 -07:00
Awni Hannun	7cdd1b69ac	Enable unit testing in Circle and start some MLX LM tests (#545 ) * add a few tests for mlx lm * add a few tests for mlx lm * add a few tests for mlx lm * more tests / cleanup	2024-03-07 09:31:57 -08:00

35 Commits