 31ddbd7806
			
		
	
	31ddbd7806
	
	
	
		
			
			* feat: add example for deepseek coder * chore: remove hardcoded rope_scaling_factor * feat: add quantization support * chore: update readme * chore: clean up the rope scalling factor param in create cos sin theta * feat: add repetition_penalty * style /consistency changes to ease future integration * nits in README * one more typo --------- Co-authored-by: Awni Hannun <awni@apple.com>
Qwen
Qwen (通义千问) are a family of language models developed by Alibaba Cloud.1 The architecture of the Qwen models is similar to Llama except for the bias in the attention layers.
Setup
First download and convert the model with:
python convert.py
To generate a 4-bit quantized model, use -q. For a full list of options:
The script downloads the model from Hugging Face. The default model is
Qwen/Qwen-1_8B. Check out the Hugging Face
page to see a list of available models.
By default, the conversion script will make the directory mlx_model and save
the converted weights.npz and config.json there.
Generate
To generate text with the default prompt:
python qwen.py
If you change the model, make sure to pass the corresponding tokenizer. E.g., for Qwen 7B use:
python qwen.py --tokenizer  Qwen/Qwen-7B
To see a list of options, run:
python qwen.py --help
- 
For more details on the model see the official repo of Qwen and the Hugging Face. ↩︎