Mirror of https://github.com/ml-explore/mlx-examples.git (synced 2025-12-16 02:08:55 +08:00)
Change to enable saving the KV cache as a safetensors file after a text completion. Once the generate step has finished producing all the tokens, the key-value cache is converted into a dict and saved with mx.save_safetensors to a user-specified file location, similar to cache_prompt.
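The save step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the change itself: the helper names `cache_to_dict` and `save_kv_cache`, the `"{layer}_keys"` / `"{layer}_values"` naming scheme, and the assumption that the cache is a per-layer list of (keys, values) pairs are all hypothetical. Only `mx.save_safetensors` is taken from the description.

```python
def cache_to_dict(cache):
    """Flatten a per-layer list of (keys, values) pairs into one flat dict.

    The "{layer}_keys" / "{layer}_values" names are an assumed convention
    chosen for this sketch, not necessarily the one used by the change.
    """
    flat = {}
    for layer, (keys, values) in enumerate(cache):
        flat[f"{layer}_keys"] = keys
        flat[f"{layer}_values"] = values
    return flat


def save_kv_cache(path, cache):
    """Write the flattened cache to `path` as a safetensors file.

    Assumes every keys/values entry is an mx.array. MLX is imported
    lazily so that cache_to_dict can be used without MLX installed.
    """
    import mlx.core as mx

    mx.save_safetensors(path, cache_to_dict(cache))
```

A file written this way could presumably be read back with `mx.load(path)`, which returns a dict of arrays for safetensors files, and then regrouped by layer index to rebuild the cache.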
Generate Text with MLX and 🤗 Hugging Face
This is an example of large language model text generation that can pull models from the Hugging Face Hub.
For more information on this example, see the README in the parent directory.
This package also supports fine tuning with LoRA or QLoRA. For more information see the LoRA documentation.