mlx-examples/llms/decilm/tests
5adbd358b5 Add DeciLM/Nemotron-NAS architecture support for MLX
This commit introduces native MLX support for DeciLM models, including NVIDIA's
Nemotron series that use Neural Architecture Search (NAS) optimizations.

Key features:
- Support for dummy layers (no-op attention/FFN components)
- FFN fusion for improved efficiency
- Variable Grouped Query Attention (VGQA), allowing a different number of KV heads per layer
- Block configuration handling for NAS architectures
- Full conversion pipeline from HuggingFace to MLX format
- Comprehensive test suite
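To illustrate how a per-block NAS configuration can drive layer construction, here is a minimal Python sketch. The field names (`no_op_attention`, `no_op_ffn`, `n_kv_heads`) and the string placeholders are hypothetical, not the actual DeciLM config keys; the real implementation builds MLX modules rather than labels.

```python
# Hypothetical sketch: per-block configs select dummy (identity) sub-blocks
# and a per-layer KV head count. Names are assumptions for illustration.
from dataclasses import dataclass
from typing import List

@dataclass
class BlockConfig:
    no_op_attention: bool = False  # dummy layer: attention becomes identity
    no_op_ffn: bool = False        # dummy layer: FFN becomes identity
    n_kv_heads: int = 8            # VGQA: KV head count can vary per layer

def build_layers(configs: List[BlockConfig]) -> List[dict]:
    """Map each block config to the sub-blocks it would instantiate."""
    layers = []
    for cfg in configs:
        layers.append({
            # A no-op component simply passes the residual stream through.
            "attention": "identity" if cfg.no_op_attention
                         else f"gqa(kv_heads={cfg.n_kv_heads})",
            "ffn": "identity" if cfg.no_op_ffn else "swiglu",
        })
    return layers

configs = [
    BlockConfig(n_kv_heads=8),
    BlockConfig(no_op_attention=True),          # NAS pruned this attention
    BlockConfig(no_op_ffn=True, n_kv_heads=2),  # fewer KV heads here
]
print(build_layers(configs))
```

The key point is that the layer list is heterogeneous: a standard transformer loader assumes every block has the same shape, which is why these models previously failed to load.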

Tested with:
- nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (Q5: 3.86 tokens/sec on M3 Ultra)
- Memory usage: ~175GB peak for 253B model
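A back-of-envelope check of the quoted figure (an approximation that ignores embedding precision, quantization metadata, and runtime buffers):

```python
# Rough sanity check on the ~175GB peak for the Q5 253B model.
params = 253e9        # 253B parameters
bits_per_weight = 5   # Q5 quantization
weight_gb = params * bits_per_weight / 8 / 1e9
print(round(weight_gb))  # weights alone: ~158 GB
```

The quantized weights account for roughly 158 GB; the KV cache, activations, and scale/zero-point metadata plausibly make up the remainder of the ~175 GB peak.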

This enables running massive NAS-optimized models on Apple Silicon that were
previously incompatible with MLX due to their unique architecture.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-02 05:59:09 +02:00