MiniCPM implementation (#685)

* Added support for the MiniCPM architecture

* Updated utils.py and LORA.md

* Updated implementation details for the MiniCPM architecture

* Cleaned up

* Fixed the missing lm_head layer problem

* Refactored the Model class to dynamically handle tied and untied word embeddings (see the first sketch after this list)

* Quick update

* Added a dynamic RoPE scaling base calculation (see the second sketch after this list)

* Quick fix and clean up

* Cleaned up again

* Removed the MiniCPMNorm class as it's not used

* Forgot something, sorry

* Format

* Version bump
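
For reference, the tied/untied embedding handling mentioned above could look roughly like this in MLX. The Backbone stand-in and the parameter names are assumptions for illustration, not the commit's exact code; the point is that lm_head only exists when embeddings are untied, which also explains the "missing lm_head layer" fix:

import mlx.core as mx
import mlx.nn as nn


class Backbone(nn.Module):
    """Minimal stand-in for the MiniCPM transformer stack."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)

    def __call__(self, inputs: mx.array) -> mx.array:
        # The real model runs its attention/MLP blocks here.
        return self.embed_tokens(inputs)


class Model(nn.Module):
    """Create lm_head only when the word embeddings are untied."""

    def __init__(self, vocab_size: int, hidden_size: int, tie_word_embeddings: bool):
        super().__init__()
        self.tie_word_embeddings = tie_word_embeddings
        self.model = Backbone(vocab_size, hidden_size)
        if not tie_word_embeddings:
            # Untied: a separate output projection is needed.
            self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def __call__(self, inputs: mx.array) -> mx.array:
        out = self.model(inputs)
        if self.tie_word_embeddings:
            # Tied: reuse the input embedding matrix as the output projection.
            return self.model.embed_tokens.as_linear(out)
        return self.lm_head(out)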
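
The log does not spell out the dynamic base calculation. One common formulation is the dynamic NTK-aware rule (the variant used in Hugging Face transformers), shown here as a plain function under that assumption rather than as MiniCPM's exact code:

def dynamic_rope_base(
    base: float,
    dims: int,
    seq_len: int,
    max_position_embeddings: int,
    scaling_factor: float,
) -> float:
    """Grow the RoPE base once the sequence exceeds the trained context;
    below that length the original base is returned unchanged."""
    if seq_len <= max_position_embeddings:
        return base
    scale = scaling_factor * seq_len / max_position_embeddings - (scaling_factor - 1)
    return base * scale ** (dims / (dims - 2))

A larger base stretches the rotary wavelengths, so positions beyond the trained window remain distinguishable.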

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Author: Gökdeniz Gülmez
Date: 2024-04-26 00:29:28 +02:00
Committed by: GitHub
Parent: 685012c2ad
Commit: 2c1c9e9024
4 changed files with 251 additions and 22 deletions


@@ -77,6 +77,7 @@ def linear_to_lora_layers(
         "gemma",
         "starcoder2",
         "cohere",
+        "minicpm",
     ]:
         keys = set(["self_attn.q_proj", "self_attn.v_proj"])
         if model.model_type == "mixtral":
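
Adding "minicpm" to this list opts MiniCPM models into the default LoRA targets: the query and value projections of each adapted attention block. A self-contained sketch of that key-driven replacement follows; the LoRALinear wrapper and attach_lora helper are illustrative stand-ins, not mlx-lm's exact implementation:

import mlx.nn as nn


class LoRALinear(nn.Module):
    """Minimal LoRA wrapper (illustrative; mlx-lm ships its own version)."""

    def __init__(self, linear: nn.Linear, rank: int = 8, scale: float = 1.0):
        super().__init__()
        out_dims, in_dims = linear.weight.shape
        self.linear = linear
        self.scale = scale
        self.lora_a = nn.Linear(in_dims, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dims, bias=False)  # real LoRA zero-inits this

    def __call__(self, x):
        # Frozen base projection plus the low-rank update.
        return self.linear(x) + self.scale * self.lora_b(self.lora_a(x))


def attach_lora(block: nn.Module, keys: set):
    """Swap each keyed projection, e.g. "self_attn.q_proj", for a LoRA wrapper."""
    for key in keys:
        parent_name, _, leaf = key.rpartition(".")  # one nesting level, as in the keys above
        parent = getattr(block, parent_name) if parent_name else block
        setattr(parent, leaf, LoRALinear(getattr(parent, leaf)))

Restricting adaptation to q_proj and v_proj keeps the trainable parameter count small while still steering attention, which is the usual default for LoRA fine-tuning.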