mlx-examples/lora/convert.py

# Copyright © 2023-2024 Apple Inc.

import argparse
import copy

import mlx.core as mx
import mlx.nn as nn
import models
import utils
from mlx.utils import tree_flatten


def quantize(weights, config, args):
    """Quantize the model weights and return them with an updated config."""
    quantized_config = copy.deepcopy(config)

    # Load the model:
    model = models.Model(models.ModelArgs.from_dict(config))
    model.load_weights(list(weights.items()))

    # Quantize the model:
    nn.quantize(
        model,
        args.q_group_size,
        args.q_bits,
    )

    # Update the config:
    quantized_config["quantization"] = {
        "group_size": args.q_group_size,
        "bits": args.q_bits,
    }
    quantized_weights = dict(tree_flatten(model.parameters()))

    return quantized_weights, quantized_config


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Convert Hugging Face model to MLX format"
    )
    parser.add_argument(
        "--hf-path",
        type=str,
        help="Path to the Hugging Face model.",
    )
    parser.add_argument(
        "--mlx-path",
        type=str,
        default="mlx_model",
        help="Path to save the MLX model.",
    )
    parser.add_argument(
        "-q",
        "--quantize",
        help="Generate a quantized model.",
        action="store_true",
    )
    parser.add_argument(
        "--q-group-size",
        help="Group size for quantization.",
        type=int,
        default=64,
    )
    parser.add_argument(
        "--q-bits",
        help="Bits per weight for quantization.",
        type=int,
        default=4,
    )
    parser.add_argument(
        "--dtype",
        help="Data type to save the parameters in; ignored if -q is given.",
        type=str,
        choices=["float16", "bfloat16", "float32"],
        default="float16",
    )
    parser.add_argument(
        "--upload-name",
        help="Name of the model to upload to the Hugging Face MLX Community.",
        type=str,
        default=None,
    )

    args = parser.parse_args()

    print("[INFO] Loading")
    weights, config, tokenizer = utils.fetch_from_hub(args.hf_path)

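    # When quantizing, weights are always cast to float16 first,
    # so --dtype only takes effect without -q.
    dtype = mx.float16 if args.quantize else getattr(mx, args.dtype)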
    weights = {k: v.astype(dtype) for k, v in weights.items()}
    if args.quantize:
        print("[INFO] Quantizing")
        weights, config = quantize(weights, config, args)

    utils.save_model(args.mlx_path, weights, tokenizer, config)
    if args.upload_name is not None:
        utils.upload_to_hub(args.mlx_path, args.upload_name, args.hf_path)
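
As a rough aid for choosing `--q-group-size` and `--q-bits`, the sketch below estimates the effective storage cost per weight after quantization. It rests on an assumption that the script itself does not state: each group of weights stores one scale and one bias next to the packed values, both 16-bit because the weights are cast to float16 before quantizing. `estimated_bits_per_weight` is a hypothetical helper for illustration only; it is not part of `convert.py` or of MLX.

```python
def estimated_bits_per_weight(group_size=64, bits=4, scale_bits=16, bias_bits=16):
    """Estimate storage per weight for group-wise quantization.

    Assumption (not taken from convert.py): every group of `group_size`
    weights carries one scale and one bias in addition to the packed
    `bits`-wide values.
    """
    if group_size <= 0 or bits <= 0:
        raise ValueError("group_size and bits must be positive")
    # Per-group overhead is amortized over all weights in the group.
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


if __name__ == "__main__":
    # Defaults mirror the script's --q-group-size 64 and --q-bits 4.
    print(estimated_bits_per_weight())  # 4.5
```

Under these assumptions, a smaller group size raises the per-weight overhead, while a larger one lowers it at the cost of coarser scales.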