mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-11-08 16:10:37 +08:00

Files

dmdaksh 7d7e236061 - Removed unused Python imports (#683)

  - bert/model.py:10: tree_unflatten
  - bert/model.py:2: dataclass
  - bert/model.py:8: numpy
  - cifar/resnet.py:6: Any
  - clip/model.py:15: tree_flatten
  - clip/model.py:9: Union
  - gcn/main.py:8: download_cora
  - gcn/main.py:9: cross_entropy
  - llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
  - llms/gguf_llm/models.py:9: numpy
  - llms/mixtral/mixtral.py:12: tree_map
  - llms/mlx_lm/models/dbrx.py:2: Dict, Union
  - llms/mlx_lm/tuner/trainer.py:5: partial
  - llms/speculative_decoding/decoder.py:1: dataclass, field
  - llms/speculative_decoding/decoder.py:2: Optional
  - llms/speculative_decoding/decoder.py:5: mlx.nn
  - llms/speculative_decoding/decoder.py:6: numpy
  - llms/speculative_decoding/main.py:2: glob
  - llms/speculative_decoding/main.py:3: json
  - llms/speculative_decoding/main.py:5: Path
  - llms/speculative_decoding/main.py:8: mlx.nn
  - llms/speculative_decoding/model.py:6: tree_unflatten
  - llms/speculative_decoding/model.py:7: AutoTokenizer
  - llms/tests/test_lora.py:13: yaml_loader
  - lora/lora.py:14: tree_unflatten
  - lora/models.py:11: numpy
  - lora/models.py:3: glob
  - speechcommands/kwt.py:1: Any
  - speechcommands/main.py:7: mlx.data
  - stable_diffusion/stable_diffusion/model_io.py:4: partial
  - whisper/benchmark.py:5: sys
  - whisper/test.py:5: subprocess
  - whisper/whisper/audio.py:6: Optional
  - whisper/whisper/decoding.py:8: mlx.nn

2024-04-16 07:50:32 -07:00

generate.py

Example reading directly from gguf file (#222)

2024-01-23 15:41:54 -08:00

models.py

- Removed unused Python imports (#683)

2024-04-16 07:50:32 -07:00

README.md

Example reading directly from gguf file (#222)

2024-01-23 15:41:54 -08:00

requirements.txt

Switch to fast RMS/LN Norm (#603)

2024-03-23 07:13:51 -07:00

utils.py

Example reading directly from gguf file (#222)

2024-01-23 15:41:54 -08:00

README.md

LLMs in MLX with GGUF

An example generating text using GGUF format models in MLX.¹

Note

MLX can read most quantization formats from GGUF files. However, only three quantizations are supported directly: Q4_0, Q4_1, and Q8_0. Weights in any other quantization are cast to float16.
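To check ahead of time whether a file will stay quantized or be cast to float16, here is a minimal sketch that reads the quantization tag from a GGUF file name. It assumes the common `<model>.<TAG>.gguf` naming convention (a convention, not something the GGUF format guarantees), and the helper names `quantization_tag` and `is_directly_supported` are hypothetical, not part of this example's code.

```python
import re

# Quantizations the note above lists as directly supported.
DIRECTLY_SUPPORTED = {"Q4_0", "Q4_1", "Q8_0"}


def quantization_tag(filename):
    """Return the tag in a name like `<model>.<TAG>.gguf`, or None.

    Assumption: the tag is the last dot-separated field before `.gguf`.
    """
    match = re.search(r"\.([A-Za-z0-9_]+)\.gguf$", filename)
    return match.group(1).upper() if match else None


def is_directly_supported(filename):
    """True if the file name's tag is one of Q4_0, Q4_1, or Q8_0."""
    return quantization_tag(filename) in DIRECTLY_SUPPORTED


if __name__ == "__main__":
    for name in ("mistral-7b-v0.1.Q8_0.gguf", "mistral-7b-v0.1.Q4_0.gguf"):
        print(name, quantization_tag(name), is_directly_supported(name))
```

The file name is only a hint; the quantization actually stored in the file is what determines how the weights are loaded.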

Setup

Install the dependencies:

pip install -r requirements.txt

Run

Run with:

python generate.py \
  --repo <hugging_face_repo> \
  --gguf <file.gguf> \
  --prompt "Write a quicksort in Python"

For example, to generate text with Mistral 7B use:

python generate.py \
  --repo TheBloke/Mistral-7B-v0.1-GGUF \
  --gguf mistral-7b-v0.1.Q8_0.gguf \
  --prompt "Write a quicksort in Python"

Run python generate.py --help for more options.

Models that have been tested and work include:

- TheBloke/Mistral-7B-v0.1-GGUF, for quantized models use:
  - mistral-7b-v0.1.Q8_0.gguf
  - mistral-7b-v0.1.Q4_0.gguf
- TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF, for quantized models use:
  - tinyllama-1.1b-chat-v1.0.Q8_0.gguf
  - tinyllama-1.1b-chat-v1.0.Q4_0.gguf
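To try each tested model in turn, the following sketch builds one `generate.py` command line per repository and file listed above. It uses only the `--repo`, `--gguf`, and `--prompt` flags shown in the Run section. The `TESTED_MODELS` table and the `build_command` and `all_commands` helpers are hypothetical names introduced for illustration. The script prints the commands and does not run them.

```python
import shlex

# Repositories and files listed above as tested.
TESTED_MODELS = {
    "TheBloke/Mistral-7B-v0.1-GGUF": [
        "mistral-7b-v0.1.Q8_0.gguf",
        "mistral-7b-v0.1.Q4_0.gguf",
    ],
    "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF": [
        "tinyllama-1.1b-chat-v1.0.Q8_0.gguf",
        "tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
    ],
}


def build_command(repo, gguf, prompt):
    """Return the argument list for one generate.py invocation."""
    return ["python", "generate.py", "--repo", repo, "--gguf", gguf, "--prompt", prompt]


def all_commands(prompt):
    """Return one command per tested (repo, file) pair."""
    return [
        build_command(repo, gguf, prompt)
        for repo, files in TESTED_MODELS.items()
        for gguf in files
    ]


if __name__ == "__main__":
    for cmd in all_commands("Write a quicksort in Python"):
        # shlex.join quotes the prompt so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

Each argument list can also be passed to `subprocess.run(cmd, check=True)` from the example directory to run the generation itself.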

¹ For more information on GGUF, see the documentation.