mlx-examples/llms/export/README.md
2025-01-08 17:07:02 -08:00

# Export LLMs to C++

Export language model inference from Python to run directly in C++.

To run, first install the requirements:

```shell
pip install -U mlx-lm
```

Then generate text from Python with:

```shell
python export.py generate "How tall is K2?"
```

To export the generation function, run:

```shell
python export.py export
```

Then build the C++ code (requires CMake):

```shell
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```
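The build is driven by the example's `CMakeLists.txt`, which links the C++ program against the MLX library. For orientation, a hypothetical minimal version might look like the following (the `find_package` name and `mlx` target follow MLX's CMake package; `main.cpp` stands in for the example's source file):

```cmake
cmake_minimum_required(VERSION 3.24)
project(export_example LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)

# Assumes MLX is installed and its CMake package config is discoverable.
find_package(MLX CONFIG REQUIRED)

add_executable(main main.cpp)
target_link_libraries(main PRIVATE mlx)
```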

And run the generation from C++ with:

```shell
./build/main llama3.1-instruct-4bit "How tall is K2?"
```