# Qwen
Qwen (通义千问) is a language model developed by Alibaba Cloud.[^1] The
architecture of Qwen is similar to Llama except for the bias in the attention
layers.
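The difference can be sketched in a few lines; this is an illustrative toy example (not the actual implementation), showing that Qwen's attention projections carry bias terms while Llama-style projections do not:

```python
import numpy as np

def attention_projection(x, w, b=None):
    # Linear projection as used for queries, keys, and values.
    # Llama-style: no bias (b is None); Qwen-style: a bias is added.
    y = x @ w.T
    if b is not None:
        y = y + b
    return y

x = np.ones((1, 4))   # a single token embedding (toy size)
w = np.eye(4)         # toy projection weights
b = np.full(4, 0.5)   # Qwen's extra bias term

llama_q = attention_projection(x, w)      # bias-free, as in Llama
qwen_q = attention_projection(x, w, b)    # with bias, as in Qwen
```

Everything else (rotary embeddings, RMSNorm, the feed-forward block) follows the familiar Llama layout.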
## Setup
First download and convert the model with:
```sh
python convert.py
```
The script downloads the model from Hugging Face. The default model is
`Qwen/Qwen-1_8B`. Check out the [Hugging Face page](https://huggingface.co/Qwen) to see a list of available models.
The conversion script will save the `weights.npz` and `params.json` files in
the working directory.
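As a quick sanity check, the converted files can be inspected with NumPy and the standard library. The snippet below is a hedged sketch: the dummy files it writes only stand in for the real `weights.npz` and `params.json` (their actual contents depend on the model), and the `demo.weight` and `dim` names are hypothetical:

```python
import json
import numpy as np

# Stand-in files; in practice these are produced by convert.py.
np.savez("weights.npz", **{"demo.weight": np.zeros((2, 2))})  # hypothetical key
with open("params.json", "w") as f:
    json.dump({"dim": 2048}, f)  # hypothetical field

# Inspect parameter names and shapes.
weights = np.load("weights.npz")
for name in weights.files:
    print(name, weights[name].shape)

# Inspect the model configuration.
with open("params.json") as f:
    params = json.load(f)
print(params)
```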
## Generate
To generate text with the default prompt:
```sh
python qwen.py
```
If you change the model, make sure to pass the corresponding tokenizer. E.g.,
for Qwen 7B use:
```sh
python qwen.py --tokenizer Qwen/Qwen-7B
```
To see a list of options, run:
```sh
python qwen.py --help
```
[^1]: For more details on the model, see the official [Qwen repo](https://github.com/QwenLM/Qwen) and the [Hugging Face page](https://huggingface.co/Qwen).