mlx-examples/llms
5adbd358b5 Add DeciLM/Nemotron-NAS architecture support for MLX
This commit introduces native MLX support for DeciLM models, including NVIDIA's
Nemotron series, which uses Neural Architecture Search (NAS) optimizations.

Key features:
- Support for dummy layers (no-op attention/FFN components); see the sketch after this list
- FFN fusion for improved efficiency
- Variable Grouped Query Attention (VGQA) with a different number of KV heads per layer
- Block configuration handling for NAS architectures
- Full conversion pipeline from HuggingFace to MLX format
- Comprehensive test suite
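
Below is a minimal, hypothetical sketch of how dummy layers and per-layer KV head
counts can be modeled in MLX. The names (`BlockConfig`, `VGQAAttention`,
`DeciLMBlock`), the simplified FFN, and the omitted causal mask are illustrative
assumptions, not the actual identifiers or structure introduced in this commit.

```python
from dataclasses import dataclass
from typing import Optional

import mlx.core as mx
import mlx.nn as nn


@dataclass
class BlockConfig:
    n_heads: int = 32
    n_kv_heads: Optional[int] = 8    # None => dummy (no-op) attention
    ffn_dims: Optional[int] = 14336  # None => dummy (no-op) FFN


class VGQAAttention(nn.Module):
    """Grouped-query attention whose KV head count can differ per layer."""

    def __init__(self, dims: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        head_dim = dims // n_heads
        self.scale = head_dim**-0.5
        self.q_proj = nn.Linear(dims, n_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(dims, n_kv_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(dims, n_kv_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * head_dim, dims, bias=False)

    def __call__(self, x: mx.array) -> mx.array:
        B, L, _ = x.shape
        q = self.q_proj(x).reshape(B, L, self.n_heads, -1).transpose(0, 2, 1, 3)
        k = self.k_proj(x).reshape(B, L, self.n_kv_heads, -1).transpose(0, 2, 1, 3)
        v = self.v_proj(x).reshape(B, L, self.n_kv_heads, -1).transpose(0, 2, 1, 3)
        # MLX's fast SDPA broadcasts a smaller KV head count across query heads,
        # so each layer can carry its own n_kv_heads (causal mask omitted here).
        o = mx.fast.scaled_dot_product_attention(q, k, v, scale=self.scale)
        return self.o_proj(o.transpose(0, 2, 1, 3).reshape(B, L, -1))


class DeciLMBlock(nn.Module):
    """One transformer block; NAS may replace either sublayer with a no-op."""

    def __init__(self, dims: int, cfg: BlockConfig):
        super().__init__()
        if cfg.n_kv_heads is not None:
            self.attn_norm = nn.RMSNorm(dims)
            self.attn = VGQAAttention(dims, cfg.n_heads, cfg.n_kv_heads)
        else:
            self.attn = None  # dummy attention: block carries no attention weights
        if cfg.ffn_dims is not None:
            self.ffn_norm = nn.RMSNorm(dims)
            self.ffn = nn.Sequential(
                nn.Linear(dims, cfg.ffn_dims, bias=False),
                nn.SiLU(),
                nn.Linear(cfg.ffn_dims, dims, bias=False),
            )
        else:
            self.ffn = None  # dummy FFN: block carries no FFN weights

    def __call__(self, x: mx.array) -> mx.array:
        if self.attn is not None:
            x = x + self.attn(self.attn_norm(x))
        if self.ffn is not None:
            x = x + self.ffn(self.ffn_norm(x))
        return x


# Example: a tiny 3-layer stack mixing standard, attention-free, and FFN-free blocks
configs = [
    BlockConfig(n_heads=32, n_kv_heads=8, ffn_dims=14336),     # standard block
    BlockConfig(n_heads=32, n_kv_heads=None, ffn_dims=14336),  # dummy attention
    BlockConfig(n_heads=32, n_kv_heads=8, ffn_dims=None),      # dummy FFN
]
layers = [DeciLMBlock(4096, cfg) for cfg in configs]
x = mx.zeros((1, 16, 4096))
for layer in layers:
    x = layer(x)
```

Under this framing, a model like Nemotron-Ultra would instantiate one BlockConfig
per layer from its NAS block table, so weight loading and quantization only touch
the sublayers that actually exist.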

Tested with:
- nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (Q5: 3.86 tokens/sec on M3 Ultra)
- Memory usage: ~175 GB peak for the 253B model

This enables running massive NAS-optimized models on Apple Silicon; such models
were previously incompatible with MLX due to their heterogeneous per-layer architecture.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-02 05:59:09 +02:00
Name                  Last commit message                                    Last commit date
decilm                Add DeciLM/Nemotron-NAS architecture support for MLX  2025-07-02 05:59:09 +02:00
gguf_llm              Made llama and mistral files mypy compatible (#1359)  2025-04-23 14:23:46 -07:00
llama                 Made llama and mistral files mypy compatible (#1359)  2025-04-23 14:23:46 -07:00
mistral               Quantize embedding / Update quantize API (#680)       2024-04-18 18:16:10 -07:00
mixtral               Made llama and mistral files mypy compatible (#1359)  2025-04-23 14:23:46 -07:00
speculative_decoding  Made llama and mistral files mypy compatible (#1359)  2025-04-23 14:23:46 -07:00
README.md             remove mlx lm (#1353)                                 2025-03-18 18:47:55 -07:00

MOVE NOTICE

The mlx-lm package has moved to a new repo.

The package has been removed from the MLX Examples repo. Please send new contributions and issues to the MLX LM repo.