
# Qwen

Qwen (通义千问) is a family of language models developed by Alibaba Cloud.[^1]
The architecture of the Qwen models is similar to Llama, except that the
attention layers use a bias.
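
To make the difference concrete, here is a toy sketch of a linear projection with and without a bias term. It uses plain Python lists rather than MLX, the `linear` helper and the example values are made up for illustration, and it does not say which specific projections in Qwen carry the bias; see `qwen.py` for the actual model.

```python
# Illustrative only: a linear projection with and without a bias term.
# Plain Python lists keep the sketch dependency-free; the real example uses MLX.


def linear(x, weight, bias=None):
    """Compute y = W x (+ b) for a single input vector x."""
    y = [sum(w * v for w, v in zip(row, x)) for row in weight]
    if bias is not None:
        y = [yi + bi for yi, bi in zip(y, bias)]
    return y


x = [1.0, 2.0]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, chosen for readability
bias = [0.5, -0.5]                 # hypothetical values

no_bias = linear(x, weight)          # Llama-style attention projection
with_bias = linear(x, weight, bias)  # the same projection with a bias added
```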

## Setup

First download and convert the model with: 

```sh
python convert.py
```

To generate a 4-bit quantized model, use ``-q``. For a full list of options,
run `python convert.py --help`.

The script downloads the model from Hugging Face. The default model is
`Qwen/Qwen-1_8B`. Check out the [Hugging Face
page](https://huggingface.co/Qwen) to see a list of available models.

By default, the conversion script will make the directory `mlx_model` and save
the converted `weights.npz` and `config.json` there.
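
As a quick sanity check after conversion, you can inspect the output directory. The sketch below is not part of the example scripts: `summarize_converted_model` is a hypothetical helper, and it assumes `weights.npz` is a standard NumPy-style archive (a zip file with one `.npy` entry per array), so it only needs the standard library.

```python
import json
import zipfile
from pathlib import Path


def summarize_converted_model(model_dir="mlx_model"):
    """Return the parsed config and the sorted array names in weights.npz."""
    model_dir = Path(model_dir)

    with open(model_dir / "config.json") as f:
        config = json.load(f)

    # Assumption: an .npz file is a zip archive with one "<name>.npy" entry
    # per saved array, so the names can be listed without loading any data.
    with zipfile.ZipFile(model_dir / "weights.npz") as archive:
        names = sorted(
            n[: -len(".npy")] if n.endswith(".npy") else n
            for n in archive.namelist()
        )

    return config, names


# Example usage, once convert.py has been run:
# config, names = summarize_converted_model("mlx_model")
# print(config)
# print(len(names), "arrays")
```

This lists the array names without reading the weights into memory, which is handy for larger models.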

## Generate

To generate text with the default prompt:

```sh
python qwen.py
```

If you change the model, make sure to pass the corresponding tokenizer. For
example, for Qwen 7B use:

```sh
python qwen.py --tokenizer Qwen/Qwen-7B
```

To see a list of options, run:

```sh
python qwen.py --help
```

[^1]: For more details on the model, see the official [Qwen repository](https://github.com/QwenLM/Qwen) and the [Qwen page on Hugging Face](https://huggingface.co/Qwen).