refactor(qwen): moving qwen into mlx-lm (#312)

* refactor(qwen): moving qwen into mlx-lm * chore: update doc * chore: fix type hint * add qwen model support in convert * chore: fix doc * chore: only load model in quantize_model * chore: make the convert script only copy tokenizer files instead of load it and save * chore: update docstring * chore: remove unnecessary try catch * chore: clean up for tokenizer and update transformers 4.37 * nits in README --------- Co-authored-by: Awni Hannun <awni@apple.com>
2025-12-16 02:08:55 +08:00 · 2024-01-22 15:00:07 -08:00
parent de15532da8
commit 30be4c4734
8 changed files with 80 additions and 309 deletions
--- a/llms/README.md
+++ b/llms/README.md
@@ -102,11 +102,26 @@ Here are a few examples of Hugging Face models that work with this example:
 - [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
 - [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
 - [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
+- [Qwen/Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)

 Most
 [Mistral](https://huggingface.co/models?library=transformers,safetensors&other=mistral&sort=trending),
 [Llama](https://huggingface.co/models?library=transformers,safetensors&other=llama&sort=trending),
-[Phi-2](https://huggingface.co/models?library=transformers,safetensors&other=phi&sort=trending)
+[Phi-2](https://huggingface.co/models?library=transformers,safetensors&other=phi&sort=trending),
 and
 [Mixtral](https://huggingface.co/models?library=transformers,safetensors&other=mixtral&sort=trending)
 style models should work out of the box.
+
+For
+[Qwen](https://huggingface.co/models?library=transformers,safetensors&other=qwen&sort=trending)
+style models, you must enable the `trust_remote_code` option and specify the
+`eos_token`. This ensures the tokenizer works correctly.  You can do this by
+passing `--trust-remote-code` and `--eos-token "<|endoftext|>"` in the command
+line, or by setting these options in the Python API:
+
+```python
+model, tokenizer = load(
+    "qwen/Qwen-7B",
+    tokenizer_config={"eos_token": "<|endoftext|>", "trust_remote_code": True},
+)
+```