Export LLMs to C++

This example exports language-model inference from Python so it can run directly in C++.

To run, first install the requirements:

pip install -U mlx-lm

Then generate text from Python with:

python export.py generate "How tall is K2?"

To export the generation function, run:

python export.py export
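Under the hood, exporting compiles a Python function together with example inputs into a file that C++ can load. A minimal hedged sketch of that idea, assuming MLX's `mx.export_function` API (the function name `step`, the toy computation, and the output path are illustrative, not taken from export.py):

```python
# Hedged sketch: export a Python function to a .mlxfn file that C++ can
# later import. Assumes mlx is installed; on platforms without mlx this
# degrades to a no-op so the sketch stays self-contained.
try:
    import mlx.core as mx
except ImportError:  # mlx is unavailable on this machine
    mx = None

def export_step(path="step.mlxfn"):
    """Trace `step` with an example input and write it to `path`."""
    if mx is None:
        return False  # nothing exported without mlx

    def step(x):
        # Placeholder for a real forward pass (e.g. one decoding step).
        return x * 2

    example = mx.zeros((1, 8))
    # mx.export_function records the computation graph of `step`
    # applied to `example` and serializes it to disk.
    mx.export_function(path, step, example)
    return True
```

The exported file captures the traced graph, which is what the C++ side loads and executes without any Python dependency.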

Then build the C++ code (requires CMake):

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
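The CMakeLists.txt in this directory drives the build; as a rough hedged sketch, a build file for a project like this might look as follows (the `find_package(MLX ...)` call and target names are assumptions for illustration, not copied from this repo):

```cmake
cmake_minimum_required(VERSION 3.24)
project(mlxlm LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)

# Assumes MLX is installed and exposes its CMake package config,
# which exports an `mlx` target to link against.
find_package(MLX CONFIG REQUIRED)

add_executable(main main.cpp mlxlm.cpp tokenizer.cpp)
target_link_libraries(main PRIVATE mlx)
```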

And run the generation from C++ with:

./build/main llama3.1-instruct-4bit "How tall is K2?"