Kevin Wang
c0019c4908
Pad mask with zeros for non-square attention matrices ( #715 )
* Pad mask with zeros for non-square attention matrices
The current implementation of the mask assumes the attention matrix is square, which is true if there is no cache. However, if one wishes to produce multiple tokens at a time, such as in speculative decoding implementations, a rectangular mask is necessary.
This change pads the bottom of the mask with zeros so multi-token decoding with a cache works correctly.
* Directly create mask instead of padding
* Update llama.py
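The rectangular mask described above can be sketched as follows (a minimal NumPy illustration of the idea; names and layout are hypothetical, not the actual llama.py code):

```python
import numpy as np

def causal_mask(num_new_tokens: int, cache_len: int) -> np.ndarray:
    """Additive causal mask for attention with a KV cache: rows are the
    new query tokens, columns span cached keys plus the new ones, so the
    matrix is rectangular whenever cache_len > 0."""
    total = cache_len + num_new_tokens
    q_pos = np.arange(cache_len, total)[:, None]  # global positions of queries
    k_pos = np.arange(total)[None, :]             # global positions of keys
    # a query may attend to any key at or before its own position
    return np.where(k_pos <= q_pos, 0.0, -np.inf)
```

With an empty cache (`cache_len == 0`) this reduces to the usual square lower-triangular mask.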
2024-05-04 16:32:25 -07:00
Anchen
f30413b63c
chore(mlx-lm): fix the number of validation batches configuration. ( #752 )
* chore: fix number of validation batches
* clean up
* address comment
2024-05-04 06:52:42 -07:00
Awni Hannun
2bf11c4633
Use stable url for MNIST ( #749 )
* use stable url
* remove deprecated flag
2024-05-03 17:13:05 -07:00
Konstantin Kerekovski
d1c35fa684
Add MLX Cache Limit setting for mlx_lm.generate and mlx_lm.server CLI ( #744 )
* Add support for setting MLX cache limit in GB
* Add support for setting MLX cache limit in GB in mlx_lm.server
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-03 12:42:48 -07:00
Ivan Fioravanti
b468091f7f
Add model management functionality for local caches ( #736 )
* Add model management functionality for local caches
This commit introduces a set of command-line utilities for managing MLX models downloaded and saved locally in the Hugging Face cache. The functionality includes scanning existing models, retrieving detailed information about a specific model, and deleting a model by its name.
* Added mlx_lm.model to setup.py
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-03 12:20:13 -07:00
Awni Hannun
92430df0a0
Fix lora for qwen moe ( #743 )
* fix lora for qwen moe
* use max seq length in test as well
2024-05-02 21:55:09 -07:00
madroid
5079af62db
Update model card description ( #654 )
* Update model card description
- Add full clickable link
- Add the address of the model uploader's Hugging Face homepage
* Add user_info to reduce whoami calls
* Remove the -U argument
* remove HF user info
* run pre-commit
2024-05-02 21:22:04 -07:00
madroid
6775d6cb3f
Whisper: Add pip distribution configuration to support pip installations. ( #739 )
* Whisper: rename whisper to mlx_whisper
* Whisper: add setup.py config for publish
* Whisper: add assets data to setup config
* Whisper: pre-commit for setup.py
* Whisper: Update README.md
* Whisper: Update README.md
* nits
* fix package data
* nit in readme
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-01 09:00:02 -07:00
Karim Elmaaroufi
4bf2eb17f2
Validate server params & fix logit bias bug ( #731 )
* Bug fix in logit bias
* Add parameter validations
* Fix typo
* Update docstrings to match MLX styling
* Black style + fix a validation bug
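The logit-bias mechanism the fix above concerns can be illustrated with a short sketch (a hypothetical helper, not the server's actual code): the bias map adds a per-token-id offset to the raw logits before sampling.

```python
import numpy as np

def apply_logit_bias(logits: np.ndarray, logit_bias: dict) -> np.ndarray:
    """Add a per-token-id offset to the logits; a large negative bias
    effectively bans a token, a large positive bias forces it."""
    biased = logits.astype(float)
    for token_id, bias in logit_bias.items():
        biased[int(token_id)] += bias
    return biased
```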
2024-04-30 07:27:40 -07:00
Jaward Sesay
7c0962f4e2
Add Supported Quantized Phi-3-mini-4k-instruct gguf Weight ( #717 )
* support for phi-3 4bits quantized gguf weights
* Added link to 4 bits quantized model
* removed some prints
* Added correct comment
* Added correct comment
* removed print
Since the last condition already prints a warning when quantization is None
2024-04-29 20:11:32 -07:00
Thomas Lazarus
5513c4e57d
Fixes Typo in Starcoder2 ( #740 )
2024-04-29 13:14:45 -07:00
Javier de la Rosa
510d2bde49
Force multi_commits when uploading to HF ( #729 )
2024-04-28 19:07:17 -07:00
锦此
699de35b03
Update lora_config.yaml ( #735 )
Update the LoRA config YAML, replacing the adapter file argument with the adapter path argument.
2024-04-28 10:24:34 -07:00
Prince Canuma
c012eb173f
Add support for OpenELM ( #719 )
* add openELM
* update splitting logic
* update qkv logic, transformer and MLP block
* code formatting and fix args
* fix array slicing and remove unused var :)
* add to tuner
* use mx.split for slicing qkv
* merge with phi3
* remove rope scaling logic
* code formatting
2024-04-25 16:49:28 -07:00
Gökdeniz Gülmez
2c1c9e9024
MiniCPM implementation ( #685 )
* Added support for the MiniCPM architecture
* Updated utils.py and LORA.md
* Update implementation details for MiniCPM architecture
* Cleaning up
* fixed the missing lm.head layer problem
* Refactor Model class to dynamically handle tied and untied word embeddings
* Quick update
* added a dynamic rope scaling base calculation
* quick fix and clean up
* clean up again
* removed the MiniCPMNorm class as its not used
* forgot something, sorry
* format
* version bump
---------
Co-authored-by: Awni Hannun <awni@apple.com>
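The tied/untied word-embedding handling mentioned above can be sketched like this (illustrative NumPy, not MiniCPM's actual Model class):

```python
import numpy as np

class TinyLM:
    """With tied embeddings the output projection reuses the input
    embedding matrix; otherwise a separate lm_head weight is used."""

    def __init__(self, vocab: int, dim: int, tie_word_embeddings: bool = True):
        rng = np.random.default_rng(0)
        self.embed = rng.normal(size=(vocab, dim))
        self.tie = tie_word_embeddings
        self.lm_head = None if self.tie else rng.normal(size=(dim, vocab))

    def logits(self, hidden: np.ndarray) -> np.ndarray:
        # (batch, dim) @ (dim, vocab) -> (batch, vocab)
        return hidden @ (self.embed.T if self.tie else self.lm_head)
```

Tying saves a full vocab-by-dim weight matrix, which is why checkpoints with tied weights have no separate `lm_head` tensor to load.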
2024-04-25 15:29:28 -07:00
Awni Hannun
685012c2ad
Couple fixes for LoRA ( #711 )
* don't overwrite in test only mode
* only load model specific safetensors
2024-04-25 14:16:13 -07:00
Kristian Muñiz
109ee2f2f8
Use CORS headers for streaming for MLX Server ( #716 )
2024-04-25 07:26:04 -07:00
Kevin Wang
8a265f0d54
Fix incorrect type annotation ( #720 )
A `Tuple` is missing in this type annotation.
2024-04-24 15:52:43 -07:00
Prince Canuma
abcd891851
Add support for phi-3 ( #712 )
* Add phi-3 modelling
* fix rope scaling warning
* add tests and update tuner utils
* update name and remove sanitize
* fix lora
2024-04-23 09:20:00 -07:00
Awni Hannun
ecbc6ff1e3
one more quant fix ( #708 )
2024-04-22 18:12:52 -07:00
Aaron Ng
8d5cf5b0c8
use logging in mlx server ( #705 )
2024-04-22 07:50:06 -07:00
AlexandrosChrtn
f20e68fcc0
Load fused model with transformers ( #703 )
* save format for transformers compatibility
* save format for transformers compatibility arg
* hardcode mlx
* hardcode mlx format
2024-04-21 09:04:44 -07:00
Anchen
749cabf299
fix: unicode decoding ( #702 )
2024-04-21 08:58:23 -07:00
Karim Elmaaroufi
1484598de1
Add support for logit bias ( #697 )
2024-04-21 06:53:56 -07:00
Awni Hannun
6abdbe3be8
Fix quant in gguf ( #698 )
* fix quant in gguf
* fix whisper
2024-04-19 20:07:11 -07:00
Awni Hannun
574ad7f6fe
fix dequantization ( #693 )
2024-04-19 10:46:59 -07:00
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API ( #680 )
* more async eval
* quantize embedding / update quantize api
* more updates for quantize
* update for quantize embeddings
* update sd quant API
* update sdxl quants
* error for datasets < batch_size
* async
* fix config loading
* fix quant
* fix tests
* fix req
* remove lm head if tie weights is true
* fix test
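For context, group-wise affine quantization of the kind this API exposes can be sketched as follows (illustrative NumPy; MLX's actual bit-packing and API differ):

```python
import numpy as np

def quantize(w: np.ndarray, bits: int = 4, group_size: int = 64):
    """Quantize each group of `group_size` weights to `bits`-bit integers
    with a per-group scale and minimum (affine quantization)."""
    g = w.reshape(-1, group_size)
    w_min = g.min(axis=1, keepdims=True)
    scale = (g.max(axis=1, keepdims=True) - w_min) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.round((g - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min, shape):
    # reconstruct approximate weights from integers + per-group affine params
    return (q * scale + w_min).reshape(shape)
```

The reconstruction error per weight is bounded by half a quantization step, which is what makes 4-bit storage workable for large matrices.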
2024-04-18 18:16:10 -07:00
Anchen
f5f189e48a
fix(mlx-lm): broken server.py ( #690 )
* fix server.py
* fix var referenced before assignment
* add test
* clean up
2024-04-18 14:26:18 -07:00
Phúc H. Lê Khắc
35206806ac
Create executables for generate, lora, server, merge, convert ( #682 )
* feat: create executables mlx_lm.<cmd>
* nits in docs
---------
Co-authored-by: Awni Hannun <awni@apple.com>
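Executables like these typically come from setuptools console-script entry points; a sketch of the relevant setup.py fragment (the exact module paths are assumptions, not copied from the repo):

```python
# setup.py fragment (illustrative): each console script maps a command
# name to a module-level main() function.
entry_points = {
    "console_scripts": [
        "mlx_lm.generate = mlx_lm.generate:main",
        "mlx_lm.lora = mlx_lm.lora:main",
        "mlx_lm.server = mlx_lm.server:main",
        "mlx_lm.merge = mlx_lm.merge:main",
        "mlx_lm.convert = mlx_lm.convert:main",
    ]
}
```

On install, setuptools generates a small wrapper script per entry so each command can be invoked directly from the shell.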
2024-04-16 16:08:49 -07:00
dmdaksh
7d7e236061
- Removed unused Python imports ( #683 )
- bert/model.py:10: tree_unflatten
- bert/model.py:2: dataclass
- bert/model.py:8: numpy
- cifar/resnet.py:6: Any
- clip/model.py:15: tree_flatten
- clip/model.py:9: Union
- gcn/main.py:8: download_cora
- gcn/main.py:9: cross_entropy
- llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
- llms/gguf_llm/models.py:9: numpy
- llms/mixtral/mixtral.py:12: tree_map
- llms/mlx_lm/models/dbrx.py:2: Dict, Union
- llms/mlx_lm/tuner/trainer.py:5: partial
- llms/speculative_decoding/decoder.py:1: dataclass, field
- llms/speculative_decoding/decoder.py:2: Optional
- llms/speculative_decoding/decoder.py:5: mlx.nn
- llms/speculative_decoding/decoder.py:6: numpy
- llms/speculative_decoding/main.py:2: glob
- llms/speculative_decoding/main.py:3: json
- llms/speculative_decoding/main.py:5: Path
- llms/speculative_decoding/main.py:8: mlx.nn
- llms/speculative_decoding/model.py:6: tree_unflatten
- llms/speculative_decoding/model.py:7: AutoTokenizer
- llms/tests/test_lora.py:13: yaml_loader
- lora/lora.py:14: tree_unflatten
- lora/models.py:11: numpy
- lora/models.py:3: glob
- speechcommands/kwt.py:1: Any
- speechcommands/main.py:7: mlx.data
- stable_diffusion/stable_diffusion/model_io.py:4: partial
- whisper/benchmark.py:5: sys
- whisper/test.py:5: subprocess
- whisper/whisper/audio.py:6: Optional
- whisper/whisper/decoding.py:8: mlx.nn
2024-04-16 07:50:32 -07:00
Angelos Katharopoulos
e55a9e8cb4
Add an SPM detokenizer that doesn't trim initial space ( #681 )
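For context, SentencePiece marks word boundaries with "▁" (U+2581); a detokenizer that replaces the marker with a space without stripping the result preserves the initial space (a minimal sketch, not the repo's detokenizer):

```python
def spm_detokenize(pieces):
    """Join SentencePiece pieces, turning the "▁" word-boundary marker
    (U+2581) into a plain space. Note: no strip(), so a leading space
    on the first piece survives, which matters when streaming
    continuations of earlier text."""
    return "".join(pieces).replace("\u2581", " ")
```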
2024-04-15 14:15:25 -07:00
Awni Hannun
d3f8e4aee9
Fix argpartition call in Mixtral and other MOES ( #676 )
* Update mixtral.py
* fix all moes
---------
Co-authored-by: yuhai-china <yuhai.china@gmail.com>
2024-04-12 11:00:56 -07:00
Awni Hannun
9c5554d8ee
Use async eval ( #670 )
* Use async eval
* bump
* bump
* remove workaround for bfloat cumsum
2024-04-11 13:18:23 -07:00
Nripesh Niketan
0250f6f38e
feat: Update black-pre-commit-mirror to version 24.3.0 ( #675 )
2024-04-11 07:28:26 -07:00
devonthomas35
9f472dc985
Update transformers for ⌘-R+ ( #668 )
2024-04-11 07:28:12 -07:00
da-z
5a4cad34ef
Always resume downloads ( #674 )
* Always resume downloads
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 06:52:32 -07:00
Angelos Katharopoulos
eff6690952
Fix CFG for SDXL ( #667 )
2024-04-09 06:06:41 -07:00
Angelos Katharopoulos
1278994b56
Add streaming detokenizers ( #651 )
2024-04-08 22:36:01 -07:00
Awni Hannun
c68aa3c7c3
Stable lm 2 ( #666 )
* stable lm 2
* test and lora
* version bump
* merge stable models
2024-04-08 14:18:55 -07:00
Awni Hannun
1e2f7f50b6
fix for empty initial string ( #665 )
2024-04-08 10:40:05 -07:00
Awni Hannun
c386dd5f5a
Fix for cohere plus ( #650 )
* fix for cohere plus
* version bump
2024-04-05 14:11:24 -07:00
Awni Hannun
2bd64b78cf
Save lora config ( #636 )
* lora config
* comments
* version bump
2024-04-02 13:52:53 -07:00
Prince Canuma
d661440dbb
Add support for qwen2moe ( #640 )
* add sparsemoe block and update decoder logic
* update file name to match HF
* update name
* Code formatting
* update gates calculation
* add support for Qwen2MoE.
* fix pytest
* code formatting and fix missing comma in utils
* Remove decoder sparse step.
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
* remove gate layer anti-quantisation
* remove unused argument
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-04-02 11:33:29 -07:00
Awni Hannun
78c431dc25
cleanup whisper a little ( #639 )
2024-03-30 13:13:58 -07:00
Chime Ogbuji
f6283ef7ce
Configurable LR schedulers ( #604 )
* Initial config handler and test
* Added means to run from CLI
* Update lora config loading and tests
* Constrain scheduler config (warmup and minimum LR) for each kind
* Update reference to moved schedule_config module
* Minor fix
* Fix typos
* Moved build_schedule and tests
* nits in schedule config
* flake
* fix path
---------
Co-authored-by: Awni Hannun <awni@apple.com>
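A schedule of the kind described (linear warmup into a decay constrained by a minimum LR) might look like this sketch (hypothetical config keys and helper, not mlx-lm's exact schema):

```python
import math

def build_schedule(cfg: dict):
    """Linear warmup to cfg["learning_rate"], then cosine decay down to
    cfg["min_lr"] over the remaining steps."""
    base, warmup = cfg["learning_rate"], cfg.get("warmup", 0)
    min_lr, total = cfg.get("min_lr", 0.0), cfg["total_steps"]

    def schedule(step: int) -> float:
        if step < warmup:
            return base * (step + 1) / warmup  # ramp up linearly
        t = min(1.0, (step - warmup) / max(1, total - warmup))
        # cosine from base down to the min_lr floor
        return min_lr + 0.5 * (base - min_lr) * (1.0 + math.cos(math.pi * t))

    return schedule
```

Constraining the schedule with a floor (`min_lr`) keeps late-training updates from vanishing entirely, which is the kind of per-scheduler constraint the bullets above mention.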
2024-03-29 13:41:10 -07:00
Awni Hannun
b80adbcc3e
DBRX ( #628 )
* dbrx
* format
* format
* comments
* change scores slightly
* remove inadvertent import
2024-03-28 21:03:53 -07:00
Anchen
297a908e3d
fix(mlx-lm): type hints in gguf.py ( #621 )
2024-03-26 07:56:01 -07:00
Anchen
0ab01b4626
fix(mlx-lm): sorted probs in top_p implementation. ( #610 )
* fix(mlx-lm): the top p implementation
* chore: address comment
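The fix concerns nucleus (top-p) sampling, which in outline works like this sketch (a NumPy illustration, not the mlx-lm implementation):

```python
import numpy as np

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of highest-probability tokens whose
    cumulative mass reaches p, then renormalize. The key detail is
    that the cumulative sum must run over the *sorted* probabilities."""
    order = np.argsort(probs)[::-1]            # indices, descending by prob
    cum = np.cumsum(probs[order])              # cumulative mass of sorted probs
    cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix reaching p
    keep = np.zeros_like(probs)
    keep[order[:cutoff]] = probs[order[:cutoff]]
    return keep / keep.sum()
```

Accumulating over unsorted probabilities would pick an essentially arbitrary token subset, which is the class of bug a "sorted probs" fix addresses.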
2024-03-25 15:07:55 -07:00
Awni Hannun
bbfcc103d7
cast around lora adapters ( #613 )
2024-03-24 19:34:51 -07:00
Awni Hannun
5a52899405
Partially stream de-tokenization ( #609 )
* partially stream de-tokenization
* don't break full response
2024-03-23 15:32:33 -07:00