Qwen

Qwen (通义千问) is a family of language models developed by Alibaba Cloud.¹ The architecture of the Qwen models is similar to that of Llama, except that the attention layers use a bias.
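The difference in the attention layers can be illustrated with a minimal NumPy sketch. It is not the code in qwen.py: the function name, the shapes, and the choice to apply the bias to a fused query/key/value projection are assumptions made for illustration only.

```python
import numpy as np


def attention_projections(x, w_qkv, b_qkv=None):
    """Project hidden states to queries, keys, and values.

    x:     (seq_len, hidden)
    w_qkv: (hidden, 3 * hidden)  fused projection weights
    b_qkv: (3 * hidden,) or None; None mimics a bias-free projection
    """
    qkv = x @ w_qkv
    if b_qkv is not None:
        # Assumed Qwen-style variant: add a learned bias to the projection.
        qkv = qkv + b_qkv
    q, k, v = np.split(qkv, 3, axis=-1)
    return q, k, v


rng = np.random.default_rng(0)
hidden = 8
x = rng.normal(size=(4, hidden))
w_qkv = rng.normal(size=(hidden, 3 * hidden))
b_qkv = rng.normal(size=(3 * hidden,))

q_no_bias, _, _ = attention_projections(x, w_qkv)          # Llama-like
q_with_bias, _, _ = attention_projections(x, w_qkv, b_qkv)  # Qwen-like

# The two results differ exactly by the query slice of the bias.
print(np.allclose(q_with_bias - q_no_bias, b_qkv[:hidden]))
```

Everything downstream of the projections (scaled dot-product attention, the output projection) is unchanged in this sketch.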

Setup

First download and convert the model with:

python convert.py

To generate a 4-bit quantized model, use -q. For a full list of options, run:

python convert.py --help

The script downloads the model from Hugging Face. The default model is Qwen/Qwen-1_8B. Check out the Hugging Face page to see a list of available models.
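To give a feel for what a 4-bit quantized model stores, here is a rough NumPy sketch of group-wise affine quantization. It is an illustration under stated assumptions, not necessarily the scheme convert.py applies: the group size of 64, the min/max scaling, and the function names are all hypothetical.

```python
import numpy as np


def quantize_4bit(w, group_size=64):
    """Quantize a 1-D array to 4-bit codes, one scale/offset per group."""
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0           # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    codes = np.clip(np.round((groups - w_min) / scale), 0, 15)
    return codes.astype(np.uint8), scale, w_min


def dequantize_4bit(codes, scale, w_min):
    """Map 4-bit codes back to approximate floating-point values."""
    return codes * scale + w_min


rng = np.random.default_rng(0)
w = rng.normal(size=256)
codes, scale, w_min = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scale, w_min).reshape(w.shape)

# Rounding to the nearest level bounds the error by half a step per group.
max_err = np.abs(w - w_hat).max()
print(max_err <= scale.max() / 2 + 1e-9)
```

Each weight is reduced to a code between 0 and 15 plus a shared scale and offset per group, which is where the memory saving comes from.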

By default, the conversion script creates the directory mlx_model and saves the converted weights.npz and config.json there.
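A quick way to check the result of the conversion is to list what ended up in that directory. The helper below is a hypothetical sketch, not part of the example: it assumes the default file names mentioned above and that the arrays in weights.npz use a dtype NumPy can read.

```python
import json
from pathlib import Path

import numpy as np


def inspect_converted_model(model_dir="mlx_model"):
    """Return the config dict and a {parameter name: shape} mapping."""
    model_dir = Path(model_dir)
    with open(model_dir / "config.json") as f:
        config = json.load(f)
    # np.load on an .npz file returns a lazy archive; read only the shapes.
    with np.load(model_dir / "weights.npz") as weights:
        shapes = {name: weights[name].shape for name in weights.files}
    return config, shapes


if __name__ == "__main__" and Path("mlx_model").exists():
    config, shapes = inspect_converted_model()
    print(config)
    for name, shape in sorted(shapes.items()):
        print(name, shape)
```

If you passed a different output directory to convert.py, pass the same path to the helper.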

Generate

To generate text with the default prompt:

python qwen.py

If you change the model, make sure to pass the corresponding tokenizer. E.g., for Qwen 7B use:

python qwen.py --tokenizer Qwen/Qwen-7B

To see a list of options, run:

python qwen.py --help

¹ For more details on the model, see the official Qwen repo and the Hugging Face page.