Commit Graph

188 Commits

Author SHA1 Message Date
Awni Hannun
14fe868825
version (#570) 2024-03-13 10:09:36 -07:00
Prince Canuma
76c3244cc5
Add support for Cohere's Command-R (#565)
* initial commit for command-R

* update mlp, layernorm, lm_head and model args

* add custom layernorm

* add default to tie_word_embeddings

* add layernorm weight type and refactor

* update layernorm (bias conditional) in model/layers

* fix layer norm use traditional rope

* add test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-13 07:03:36 -07:00
Anchen
3535408c99
chore(mlx-lm): fix tie_word_embeddings for qwen2 (#566)
* chore: fix tie_word_embeddings for qwen2

* chore: default tie_word_embeddings to True
2024-03-12 21:34:32 -07:00
Awni Hannun
39084e81c2
Some improvements to LoRA (#528)
* set cache_limit

* remove set cache_limit

* cleanup

* add gradient checkpointing

* fix sort

* monkey patch call for checkpoint

* fix example config
2024-03-12 20:02:03 -07:00
Chime Ogbuji
e56d9015ef
LoRA on all linear transformer block layers (#546)
* Add --lora-all-linear option to apply LoRA to all linear transformer block layers

* Moved to YAML config and added specification of rank & alpha

* nits in config, more tests

* nit

* run tests for prs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-12 07:37:40 -07:00
Awni Hannun
ad3cf5ed98
dropout 0 as default (#549) 2024-03-08 13:07:10 -08:00
Chime Ogbuji
8c2cf665ed
YAML configuration for mlx_lm.lora (#503)
* Convert mlx_lm.lora to use YAML configuration

* pre-commit run fixes

* Fix loading of config file

* Remove invalid YAML from doc

* Update command-line options and YAML parameter overriding, per feedback in #503

* Minor wording change

* Positional argument

* Moved config to a (-c/--config) flag

* Removed CLI option defaults (since CLI options take precedence and their defaults are in CONFIG_DEFAULTS)

* pre-commit format updates

* Fix handling of CLI option defaults

* Prevent None values of unspecified CLI options from overwriting values from CONFIG_DEFAULTS

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-08 07:57:52 -08:00
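
The override order described in this commit (CLI flags over YAML values over built-in defaults) could look roughly like the sketch below; CONFIG_DEFAULTS and the specific flags are illustrative assumptions, not the repo's exact code.

```python
# Hypothetical sketch of the config precedence: CLI > YAML > CONFIG_DEFAULTS.
import argparse

import yaml

CONFIG_DEFAULTS = {"model": "mlx_model", "lora_layers": 16, "learning_rate": 1e-5}

parser = argparse.ArgumentParser(description="LoRA fine-tuning with YAML config")
parser.add_argument("-c", "--config", default=None, help="Path to a YAML config file")
# CLI options default to None so unspecified flags never clobber config values.
parser.add_argument("--model", default=None)
parser.add_argument("--lora-layers", dest="lora_layers", type=int, default=None)
parser.add_argument("--learning-rate", dest="learning_rate", type=float, default=None)
args = parser.parse_args()

config = dict(CONFIG_DEFAULTS)
if args.config is not None:
    with open(args.config) as f:
        config.update(yaml.safe_load(f) or {})
# Only explicitly passed CLI options (non-None) take precedence over the YAML file.
config.update({k: v for k, v in vars(args).items() if k != "config" and v is not None})
print(config)
```
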
Awni Hannun
8b05bb6d18
[mlx-lm] Use sdpa in llama / mistral model (#515)
* use sdpa

* update a few more models

* version

* fix stablelm type
2024-03-07 17:41:23 -08:00
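
For reference, replacing a hand-rolled attention core with the fused kernel is roughly the change below; the wrapper function and shapes are assumptions, only the mx.fast call reflects the commit.

```python
# Hypothetical attention core using the fused kernel; inputs assumed to be
# (batch, n_heads, seq_len, head_dim) after the usual reshape/transpose.
import mlx.core as mx

def attend(queries: mx.array, keys: mx.array, values: mx.array,
           scale: float, mask: mx.array = None) -> mx.array:
    # One fused op instead of softmax(q @ k.T * scale + mask) @ v.
    return mx.fast.scaled_dot_product_attention(
        queries, keys, values, scale=scale, mask=mask
    )
```
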
Awni Hannun
7cdd1b69ac
Enable unit testing in Circle and start some MLX LM tests (#545)
* add a few tests for mlx lm

* add a few tests for mlx lm

* add a few tests for mlx lm

* more tests / cleanup
2024-03-07 09:31:57 -08:00
Anchen
8a178f8716
chore: enable tie_word_embeddings config for qwen2 (#544) 2024-03-07 06:11:35 -08:00
Y4hL
b8e5eda4fd
Refactoring of mlx_lm example (#501)
* Use named tuple from typing for typehints

* Add type hints

* Simplify expression

* Type hint fix

* Improved do_POST logic

Use a map of endpoints to methods to reduce redundancy in code

* Fix format

* Reduce redundancy

Call the method dynamically instead of writing out all arguments twice

* Send response instead of returning

* Fix typo

* Revert change

* Make adapter_file as Optional

* Mark formatter as optional

* format

* Create message generator

Store response data that stays static for the duration of the response inside of the object:

system_fingerprint
request_id
object_type
requested_model

Created a message generator that dynamically creates messages from the metadata stored inside the object and the data from the model pipeline

* Remove leftover

* Update parameters to reflect new object structure

No longer pass all arguments between functions; instead use the stored values inside the object

* Parse body before calling request specific methods

* Call super init

* Update server.py

* Fixed outdated documentation parameter name

* Add documentation

* Fix sending headers twice

During testing I found that when using the streaming option, headers were always sent twice. This should fix that.

* Simplify streaming code by using guard clauses

Don't wrap wfile writes in try blocks; the server class has its own try block to prevent crashing

* Bug fix

* Use Content-Length header

Let the completion type specific methods finish sending the headers. This allows us to send the Content-Length header as the model returns a completion.

* Update utils.py

* Add top_p documentation

* Type hint model and tokenizer as required

* Use static system fingerprint

System fingerprint now stays the same across requests

* Make type hint more specific

* Bug Fix

Supplying fewer than 2 models to merge would raise a ValueError and call len on the unbound "models"; it should be "model_paths" instead.

Mark upload_repo as optional

* Move more of the shared code into do_POST

Processing stop_id_sequences is done regardless of the request endpoint or type, so move it into the shared section. handle_ methods now just return the prompt in mx.array form.

* Store stop_id_sequences as lists instead of np

During testing I found that letting the tokenizer return values as python lists and converting them to mlx arrays was around 20% faster than having the tokenizer convert them to np, and from np to mlx. This also means numpy no longer needs to be imported.

* Update stop_id_sequences docs

* Turn if check to non-inclusive

Only continue if buffer is smaller

* Documentation fix

* Clearer method names

Instead of handle_stream and generate_completion, we should name it handle_completion.

Instead of handle_completions and handle_chat_completions, we should name it handle_text_completions; since both are completions, calling it text completions makes it more descriptive

* Make comment clearer

* fix format

* format
2024-03-06 06:24:31 -08:00
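
The endpoint-to-method map mentioned in the bullets above might be wired up roughly like this; the handler names and routes are assumptions for illustration.

```python
# Hypothetical do_POST dispatch via an endpoint map; handler names are illustrative.
import json
from http.server import BaseHTTPRequestHandler

class APIHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        endpoints = {
            "/v1/completions": self.handle_text_completions,
            "/v1/chat/completions": self.handle_chat_completions,
        }
        if self.path not in endpoints:
            self.send_error(404, "Not Found")
            return
        # Shared body parsing happens once, before the endpoint-specific handler runs.
        length = int(self.headers.get("Content-Length", 0))
        self.body = json.loads(self.rfile.read(length) or b"{}")
        endpoints[self.path]()

    def handle_text_completions(self):
        ...

    def handle_chat_completions(self):
        ...
```
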
Madroid Ma
710c552731
add huggingface repo url print (#534) 2024-03-05 21:51:31 -08:00
Muhtasham Oblokulov
5de7c2ac33
Add tips on porting LLMs from HuggingFace (#523)
* Add tips on porting LLMs from HuggingFace

* Add CONTRIBUTING.md  to mlx-examples-llms

* Refactor imports and update comment in starcoder2.py

* Update llms/mlx_lm/models/starcoder2.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-05 17:43:15 -08:00
Prince Canuma
3fdf85e79d
Starcoder2: Update config and change GQA to use repeat (#520)
* update config

* change gqa to use repeat instead of concatenate

* contribution
2024-03-03 06:12:03 -08:00
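
A minimal sketch of the repeat-based GQA expansion; the head counts and shapes below are placeholder values.

```python
# Hypothetical GQA expansion: repeat each kv head instead of concatenating copies.
import mlx.core as mx

n_heads, n_kv_heads = 32, 8
B, L, head_dim = 1, 7, 128

keys = mx.zeros((B, n_kv_heads, L, head_dim))
# Each kv head now serves n_heads // n_kv_heads query heads.
keys = mx.repeat(keys, repeats=n_heads // n_kv_heads, axis=1)
print(keys.shape)  # (1, 32, 7, 128)
```
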
Anchen
1e3daea3bb
chore(mlx-lm): add missing model_type for starcoder2 (#522) 2024-03-03 06:07:45 -08:00
Anchen
3655bfc3bd
chore(mlx-lm): fix broken server.py script (#519) 2024-03-03 06:04:54 -08:00
Muhtasham Oblokulov
81e2a80026
Add Starcoder 2 (#502)
* Add Starcoder2 model and update utils.py

* Refactor model arguments and modules in starcoder2.py

* Refactor FeedForward class to MLP in starcoder2.py

* Fix typo

* pre-commit

* Refactor starcoder2.py: Update model arguments and modules

* Fix LM head and MLP layers

* Rename  input layer norm

* Update bias in linear layers

* Refactor token embeddings in Starcoder2Model

* Rename to standard HF attention layer name

* Add LayerNorm

* Add transposed token embeddings (like in Gemma)

* Refactor MLP and TransformerBlock classes

* Add tie_word_embeddings option to ModelArgs and update Model implementation

* Add conditional check for tying word embeddings in Starcoder2Model

* Fix bias in lm_head linear layer

* Remove unused LayerNorm in stablelm

* Update transformers dependency to use GitHub repository

* fix lm head bug, revert transformer req

* Update RoPE initialization in Attention class

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-02 19:39:23 -08:00
Miller Liang
5b1043a458
llms: convert() add 'revision' argument (#506)
* llms: convert() add 'revision' argument

* Update README.md

* Update utils.py

* Update README.md

* Update llms/mlx_lm/utils.py

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-02 06:28:26 -08:00
Ashish
261f1280f6
Update to StableLM code (#514)
* StableLM now part of Transformers as stablelm rather than stablelm_epoch; changed config to match new changes

* removing old file

* reference new stablelm
2024-03-01 09:53:38 -08:00
Sugato Ray
3acc1ec84e
fix: string indentation with textwrap.dedent (#510)
* fix: string indentation with textwrap.dedent

* update formatting
2024-02-29 22:23:01 -08:00
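
For context, textwrap.dedent removes the common leading indentation that triple-quoted strings pick up from the surrounding code; the string below is a made-up stand-in.

```python
# Illustrative only: dedent strips the indentation inherited from the source code.
import textwrap

card = textwrap.dedent(
    """\
    ---
    library_name: mlx
    ---
    This model was converted to MLX format.
    """
)
print(card)  # no stray leading spaces on each line
```
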
Madroid Ma
f03c8a7b44
LoRA: adapter file Support path information (#505)
* LoRA: adapter file Support path information

* fix pre-commit lint

* from os.path to pathlib.Path

* Update llms/mlx_lm/tuner/trainer.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* rename check_checkpoints_path to checkpoints_path

* fix pre-commit lint

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-29 22:20:49 -08:00
Awni Hannun
ae48563378
Remove gc (#509)
* remove gc

* remove debug
2024-02-29 09:40:04 -08:00
Anchen
13794a05da
chore(mlx-lm): add adapter support in generate.py (#494)
* chore(mlx-lm): add adapter support in generate.py

* chore: remove generate from lora.py and raise error to let user use mlx_lm.generate instead
2024-02-28 07:49:25 -08:00
Alex Ishida
ab0f1dd1b6
Add metadata when saving safetensors (#496)
* Add metadata when saving safetensors

Add metadata format="pt" for safetensors so that models are accessible to `transformers` users as well.

* save with metadata format mlx

Save the model weights with metadata format of "mlx".

* Updated llms/mlx_lm/generate.py
2024-02-28 07:29:00 -08:00
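
A small sketch of what tagging the saved weights looks like, assuming mx.save_safetensors accepts a metadata dict as the commit implies; the weights are placeholders.

```python
# Hypothetical example: attach a format tag so other tools can identify the shard.
import mlx.core as mx

weights = {"model.embed_tokens.weight": mx.zeros((8, 4))}
mx.save_safetensors("model.safetensors", weights, metadata={"format": "mlx"})
```
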
Y4hL
ea92f623d6
Prevent llms/mlx_lm from serving the local directory as a webserver (#498)
* Don't serve local directory

BaseHTTPRequestHandler serves the current directory by default. Definitely not intended behaviour. Remove the "do_HEAD" and "do_GET" methods.

* Fix typo in method name

I assume hanlde_stream was intended to be called handle_stream

* Fix outdated typehint

load_model returns nn.Module; however, fetch_from_hub was not updated to reflect the change

* Add some more type hints

* Add warnings for using in prod

Add a warning to README and runtime, discouraging use in production. The warning is the same as on the python docs for HTTPServer https://docs.python.org/3/library/http.server.html

* format

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-02-27 19:40:42 -08:00
Y4hL
676e574eff
Add missing import (#497)
* Add missing import

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-02-27 13:27:08 -08:00
Awni Hannun
95f82e67a2
Fix import warning (#479)
* fix import warning
* fix version import
* remove api, move convert to utils
* also update circle to run external PRs
2024-02-27 08:47:56 -08:00
Anchen
82f3f31d93
chore(mlx-lm): refactor server.py to utilize generate_step from utils for consistency (#491)
* chore(mlx-lm): refactor server.py to utilize generate_step from utils for consistency

* chore(mlx-lm): update server doc

* chore: remove unused generate func
2024-02-27 06:25:24 -08:00
Anchen
19a21bfce4
chore: add /v1/completions for server (#489) 2024-02-26 20:59:33 -08:00
Madroid Ma
e5dfef5d9a
LoRA: Extract the run function for easy use in scripts file (#482)
* LoRA: Extract the run_lora function for easy use in scripts

* LoRA: run_lora function adds a TrainingCallback pass.

* LoRA: change run_lora to run
2024-02-26 19:35:04 -08:00
peterjc123
ccb278bcbd
Add top-p sampling for text generation (#486) 2024-02-26 06:18:11 -08:00
Awni Hannun
f24edfa9dc
[mlx-lm] Add precompiled normalizations (#451)
* add precompiled normalizations

* nits
2024-02-22 12:40:55 -08:00
Awni Hannun
97c09a863d
bump version and include in package (#475) 2024-02-21 09:40:36 -08:00
Awni Hannun
ab9172baac
Gemma support (#474)
* gemma support

* format

* lora support for gemma
2024-02-21 08:47:13 -08:00
Juan B. Rodriguez
838990b33b
fix: remove custom rope (#470) 2024-02-20 13:46:16 -08:00
Angelos Katharopoulos
dc4f2e0a6b
Lazy loading models for faster convert and merge (#462) 2024-02-20 13:36:55 -08:00
Madroid Ma
8eee4399f4
LoRA: Add printing and callbacks for learning rate during training (#457)
* LoRA:Refactor TrainingCallback to enhance flexibility and extensibility

This commit refactors the TrainingCallback class to accept a dictionary parameter for both on_train_loss_report and on_val_loss_report methods. By switching from multiple parameters to a single dict parameter, this change significantly improves the class's flexibility and makes it easier to extend with new training or validation metrics in the future without altering the method signatures. This approach simplifies the addition of new information to be logged or processed and aligns with best practices for scalable and maintainable code design.

* LoRA: Add printing and callbacks for learning rate during training
2024-02-20 13:07:21 -08:00
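
The dict-based callback interface described in the first bullet could look roughly like this; the key names are assumptions.

```python
# Hypothetical dict-based training callbacks; the info keys are illustrative.
class TrainingCallback:
    def on_train_loss_report(self, info: dict):
        # e.g. info = {"iteration": 100, "train_loss": 1.23, "learning_rate": 1e-5}
        pass

    def on_val_loss_report(self, info: dict):
        pass

class PrintCallback(TrainingCallback):
    def on_train_loss_report(self, info: dict):
        print(f"iter {info['iteration']}: "
              f"loss {info['train_loss']:.3f}, lr {info['learning_rate']:.2e}")
```
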
Awni Hannun
20b39c7fee
update protobuf (#467) 2024-02-20 11:46:36 -08:00
Awni Hannun
8fd953ee2b
Support for slerp merging models (#455)
* support for slerp merging models

* docs

* update docs

* format
2024-02-19 20:37:15 -08:00
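
For reference, spherical linear interpolation between two weight tensors is generally computed as below, falling back to plain interpolation for nearly parallel weights; this is a generic sketch, not necessarily the exact formulation in the merge script.

```python
# Generic slerp sketch over flattened weight tensors.
import math

import mlx.core as mx

def slerp(t: float, w0: mx.array, w1: mx.array, eps: float = 1e-8) -> mx.array:
    v0, v1 = w0.flatten(), w1.flatten()
    n0 = v0 / (mx.sqrt(mx.sum(v0 * v0)) + eps)
    n1 = v1 / (mx.sqrt(mx.sum(v1 * v1)) + eps)
    dot = mx.clip(mx.sum(n0 * n1), -1.0, 1.0)
    omega = float(mx.arccos(dot).item())
    if omega < eps:  # nearly parallel weights: plain linear interpolation
        return (1 - t) * w0 + t * w1
    s = math.sin(omega)
    out = (math.sin((1 - t) * omega) / s) * v0 + (math.sin(t * omega) / s) * v1
    return out.reshape(w0.shape)
```
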
Anchen
88458c4e40
feat(mlx-lm): add openAI like api server (#429)
* feat(mlx-lm): add openAI like api server

* chore: fix sse format

* chore: add top_p support

* chore: fix the load import

* chore: add workaround for missing space in stream decoding

* chore: fix typo

* chore: add error handling for streaming

* chore: using slicing instead of replace

* chore: set host, port via args and improve handle stream token logic

* chore: refactor stop sequence function

* chore: rename stopping_criteria

* fix: unable to load kernel contiguous_scan_inclusive_sum_bfloat16_bfloat16

* chore: fix the streaming unicode issue

* Update llms/mlx_lm/server.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* refactor: move stopping_criteria out of generate func

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-18 14:01:28 -08:00
devonthomas35
cc671cd1c7
Mixtral: Fix non-default arg follows default exception (#450)
Mixtral models throw the following exception
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/generate.py", line 119, in <module>
    main(args)
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/generate.py", line 96, in main
    model, tokenizer = load(args.model, tokenizer_config=tokenizer_config)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 278, in load
    model = load_model(model_path)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 221, in load_model
    model_class, model_args_class = _get_classes(config=config)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 46, in _get_classes
    arch = importlib.import_module(f"mlx_lm.models.{model_type}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/models/mixtral.py", line 11, in <module>
    @dataclass
     ^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1230, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1220, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1027, in _process_class
    _init_fn(all_init_fields,
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 545, in _init_fn
    raise TypeError(f'non-default argument {f.name!r} '
TypeError: non-default argument 'model_type' follows default argument
```
2024-02-18 13:30:26 -08:00
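
The traceback is the standard dataclass field-ordering error; a minimal reproduction and the usual fix, independent of Mixtral itself:

```python
# Minimal reproduction of the error above, plus the usual fix.
from dataclasses import dataclass

# This ordering fails at class-creation time with:
#   TypeError: non-default argument 'model_type' follows default argument
# @dataclass
# class BrokenArgs:
#     vocab_size: int = 32000
#     model_type: str

@dataclass
class FixedArgs:
    model_type: str            # non-default fields must come before fields with defaults,
    vocab_size: int = 32000    # or model_type itself needs a default value

print(FixedArgs(model_type="mixtral"))
```
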
Ivan Fioravanti
b05907c87e
Change argument name in lora.py (#453)
The argument name "--max_seq_length" was updated to "--max-seq-length" in the code to maintain a consistent naming convention across the program.
2024-02-18 06:04:49 -08:00
Awni Hannun
e4d5630698
Basic CircleCI (#449)
* basic style checks for circleci

* format

* fix config
2024-02-16 22:13:55 -08:00
vishal-14069
21e19b5b5a
Add repetition penalty to LLM inference - mlx-lm (#399)
* feat: add repetition penalty

* fix: generate function argument fix

* typo fixes

* update repetition penalty

* update generate_step and generate

* resolve conflicts in generate

* merge latest pull from origin master

* update generate

* update generate and generate_step

* update repetition list - rename variable

* refactor token count

* update generate step and generate

* move repetition_context in generate_step

* update generate step

* update generate_step
2024-02-16 21:58:17 -08:00
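
A hedged sketch of the repetition-penalty idea this PR iterates on: logits of tokens that already appeared are scaled so they are less likely to be sampled again. The repo's exact function may differ.

```python
# Hypothetical repetition penalty over a (1, vocab_size) logits row.
import mlx.core as mx

def apply_repetition_penalty(logits: mx.array, generated_tokens: list,
                             penalty: float = 1.1) -> mx.array:
    if not generated_tokens:
        return logits
    indices = mx.array(generated_tokens)
    selected = logits[:, indices]
    # Shrink positive logits and grow negative ones for already-seen tokens.
    logits[:, indices] = mx.where(selected < 0, selected * penalty, selected / penalty)
    return logits
```
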
Madroid Ma
0ba466369f
LoRA: add training callbacks (#414)
* LoRA: add training callbacks

* LoRA: add trained tokens print & callback
2024-02-16 06:04:57 -08:00
Madroid Ma
726b1ddec0
fix: check LoRA layers number error (#446) 2024-02-16 06:03:33 -08:00
Angelos Katharopoulos
f71e965d57
Change gqa to use repeat instead of concatenate (#443) 2024-02-14 17:40:11 -08:00
Awni Hannun
06ddb8414d
Fix Qwen2 and SD (#441)
* fix qwen2

* version bump

* fix list shape
2024-02-14 13:43:12 -08:00
Chime Ogbuji
e446598f62
Passing parameterized loss and batching to trainer (#391) 2024-02-13 07:03:25 -08:00
Madroid Ma
954aa50c54
LoRA: Improve validation error for LoRA layer count exceeding model layer (#427)
* LoRA: Improve validation error for LoRA layer count exceeding model layer

This commit enhances the error handling when the specified LoRA layer count exceeds the total number of layers in the model. It clarifies the error message to provide actionable feedback for users, guiding them to adjust their input parameters accordingly.

* format + nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-02-13 06:56:27 -08:00
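
The validation described above amounts to something like this sketch; the attribute and flag names are assumptions.

```python
# Hypothetical check; attribute and flag names are illustrative.
def check_lora_layers(model, lora_layers: int):
    num_model_layers = len(model.layers)
    if lora_layers > num_model_layers:
        raise ValueError(
            f"Requested LoRA on {lora_layers} layers, but the model only has "
            f"{num_model_layers}. Set --lora-layers to at most {num_model_layers}."
        )
```
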
Awni Hannun
d4666615bb
Lazy import + refactor Lora layer addition (#426)
* lazy model import in mlx_lm

* change lora loading

* fix olmo lora

* remove a bunch of unused stuff from plamo

* move phixtral to mlx-lm and out of llms/
2024-02-12 10:51:02 -08:00
Ivan Fioravanti
4576946151
Add checkpoints directory for adapter weights (#431)
* Add checkpoints directory for adapter weights

The code was modified to create a checkpoints directory if it doesn't exist yet. Adapter weights are now saved to this checkpoints directory during the training iterations.
Corrected the indentation of the save-adapter-weights code because it was inside the "if eval" block

* Fixing a blank added by mistake
2024-02-12 10:50:05 -08:00
Nripesh Niketan
f1ef378a58
Feat: update pre-commit rev (#432) 2024-02-11 07:23:27 -08:00
Awni Hannun
f45a1ab83c
Update a few examples to use compile (#420)
* update a few examples to use compile

* update mnist

* add compile to vae and rename some stuff for simplicity

* update reqs

* use state in eval

* GCN example with RNG + dropout

* add a bit of prefetching
2024-02-08 13:00:41 -08:00
Anchen
da7adae5ec
fix(mlx-m): lazy load hf_olmo (#424) 2024-02-08 09:02:43 -08:00
Markus Enzweiler
9b387007ab
Example of a Convolutional Variational Autoencoder (CVAE) on MNIST (#264)
* initial commit

* style fixes

* update of ACKNOWLEDGMENTS

* fixed comment

* minor refactoring; removed unused imports

* added cifar and cvae to top-level README.md

* removed mention of cuda/mps in argparse

* fixed training status output

* load_weights() with strict=True

* pretrained model update

* fixed imports and style

* requires mlx>=0.0.9

* updated with results using mlx 0.0.9

* removed mention of private repo

* simplify and combine into one file, more consistency with other examples

* few more nits

* nits

* spell

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-02-06 20:02:27 -08:00
Long Sha
8071aacd98
fix-mistral-download-link (#418) 2024-02-06 19:56:56 -08:00
Chris McMaster
2303238e44
Update olmo.py (#419)
exit should be imported outside of interactive mode
2024-02-06 16:16:46 -08:00
Anchen
8b77677c05
chore(mlx-lm): add model weight index in save_weights (#413)
* chore(mlx-lm): add model weight index in save_weights

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* chore: save total size as param size instead of file size

* chore: clean up format

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-06 05:32:15 -08:00
Anchen
a7d139f484
fix(mlx-lm): olmo 1b model (#417) 2024-02-06 05:27:05 -08:00
Awni Hannun
aa7447efa2
Olmo in MLX LM (#415)
* run olmo

* format
2024-02-05 21:13:49 -08:00
Ivan Fioravanti
7fbca214b1
Add max sequence length argument in lora.py (#408)
A new argument "--max_seq_length" has been added to the command-line parser and passed as a parameter to the main function of the lora.py script. This allows users to specify and control the maximum sequence length during training.
2024-02-04 12:28:21 -08:00
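
The wiring is presumably along these lines (the flag was later renamed to --max-seq-length, per the commit further up); the default shown is a placeholder.

```python
# Illustrative argparse wiring for the new option; the default value is a placeholder.
import argparse

parser = argparse.ArgumentParser(description="LoRA fine-tuning")
parser.add_argument(
    "--max_seq_length",  # later renamed to --max-seq-length for naming consistency
    type=int,
    default=2048,
    help="Maximum sequence length used when batching training examples",
)
args = parser.parse_args([])
print(args.max_seq_length)
```
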
Junyang Lin
9d0dd34403
add qwen2 (#411) 2024-02-04 08:31:38 -08:00
Madroid Ma
ba3a9355d1
LoRA: Remove unnecessary model type judgments (#388)
* LoRA: Remove unnecessary model type judgments

1. Supported models are already checked in the load_model function in utils, no need to repeat the check in lora
2. The checks in lora are not synchronized with those in utils

* LoRA: add LoRA supported models in mlx_lm utils
2024-01-31 11:55:27 -08:00
Anchen
0a49ba0697
fix(mlx-lm): apply lora layer doesn't update the lora weights (#396) 2024-01-31 11:51:26 -08:00
Sugato Ray
ab8bde1590
Add py.typed to support PEP-561 (type-hinting) (#389)
This adds support for type-hinting information as laid out in [PEP-561](https://peps.python.org/pep-0561/).
2024-01-30 21:17:38 -08:00
David Koski
f8fadf7a17
Fix token count computation to fix tps measurements (#392) 2024-01-30 11:24:16 -08:00
Anchen
614de6652f
chore(mlx-lm): add reset lora layers helper (#377)
* chore(mlx-lm): add reset lora layers helper

* chore: rename the func

* chore: update docstring

* Update llms/mlx_lm/tuner/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-01-29 20:54:49 -08:00
Ashish
20b969b412
Replace time.time() with time.perf_counter() as it is more suited for benchmarking (#380) 2024-01-26 14:11:38 -08:00
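
The distinction matters because time.perf_counter() is monotonic and high-resolution, while time.time() follows the wall clock and can jump; a toy comparison:

```python
# Toy benchmark loop; perf_counter is the right tool for elapsed-time measurement.
import time

tic = time.perf_counter()
sum(range(1_000_000))
toc = time.perf_counter()
print(f"elapsed: {toc - tic:.6f} s")  # unaffected by system clock adjustments
```
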
Awni Hannun
5aa652d3c2
remove simplify (#379) 2024-01-26 13:54:49 -08:00
Ashish
0b57f0eae6
Add StableLM-2 1.6B (#378)
* init

* stablelm

* add to readme

* bump version

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-26 10:28:00 -08:00
Anchen
854ad8747a
feat(mlx-lm): add de-quant for fuse.py (#365)
* feat(mlx-lm): add de-quant for fuse

* chore: disable quant in to_linear when de-quant is enabled

* chore: add better error handling for adapter file not found
2024-01-25 18:59:32 -08:00
Anchen
f51e98fcf1
chore(mlx-lm): truncate the input sentence to max seq len in lora iterate_batches (#373)
* chore(mlx-lm): pass max seq len to evaluate in training loop

* chore: make sure the batch seq not exceed max len

* chore: update comment

* chore: add warning before truncate input
2024-01-25 12:38:04 -08:00
Anchen
b1dec281b3
feat(mlx-lm): add lora hyperparameters in lora layer (#366)
* feat(mlx-lm): add lora hyperparameters in lora layer

* chore: address comments
2024-01-24 08:11:25 -08:00
Anchen
5fc8668a53
fix(mlx-lm): handle legacy quant models (#369) 2024-01-24 07:44:05 -08:00
Anchen
ab91ac1075
chore(mlx-lm): add load model with adapter and fix bug in sample (#360)
* chore: add load model with adapter support and fix bug in sample

* chore: ignore temp when calculating prob in sample
2024-01-23 19:47:39 -08:00
Juarez Bochi
f5b80c95fb
Example reading directly from gguf file (#222)
* Draft of tiny llama from gguf

* Transpose all

* No transposition with new layout

* Read config from gguf

* Create tokenizer from gguf

* move gguf and update to be similar to hf_llm

* change model to HF style + updates to README

* nits in README

* nit readme

* only use mlx for metadata

* fix eos/bos tokenizer

* fix tokenization

* quantization runs

* 8-bit works

* tokenizer fix

* bump mlx version

---------

Co-authored-by: Juarez Bochi <juarez.bochi@grammarly.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 15:41:54 -08:00
iLoveBug
40b61c1719
fix the Chinese character generation, same as in PR #321 (#342)
* fix the Chinese character generation, same as in PR #321

* reuse the generate logic in utils.py

* format

* verbose default

* fix conflicts with colorize and character check

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 12:44:23 -08:00
Awni Hannun
21aa8038fb
MLX LM version bump (#358)
* version bump

* include new package
2024-01-23 09:05:57 -08:00
Anchen
362e88a744
feat: move lora into mlx-lm (#337)
* feat: Add lora and qlora training to mlx-lm


---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 08:44:37 -08:00
Shunta Saito
85c1ff8fd6
Add PLaMo-13B model as an LLM example (#303)
* Convert HF weights of PLaMo and load it to a plamo model in mlx

* Fix model inference part

* Add bos at the beginning of the prompt

* Fix convert.py to copy tokenizer.model into the converted dir

* Use the required instruction format in generate.py when the "--instruct" option is specified

* Change filenames and update existing scripts

* Add README

* Add requirements.txt

* Fix plamo.py to stop generation when EOS appears

* Add quantization to convert.py

* Use mlx>=0.0.9 for mx.core.outer() in PLaMo model

* Update acknowledgements.md

* Fix card text in upload_to_hub()

* Do not use the prompt template when --instruct is not specified

* Ask if you trust_remote_code for loading tokenizer of PLaMo

* Check the user trusts the remote code when converting

* Remove plamo directory

* Update README

* Add PLaMo model file

* Fix the handling of cache in PLaMo and update README

* Ask if trust_remote_code only when the model is PLaMo

* Remove resolve_trust_remote_code from convert.py and use the latest transformers

* Remove code not to add EOS

* Update README to fix an example not to use noncommercial version of the model

* Remove unused imports

* Remove unnecessary description about the instruct model of PLaMo from README

* format, nits in README

* typo

---------

Co-authored-by: Shunta Saito <shunta@mitmul-mbp.local>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 07:17:24 -08:00
Ivan Fioravanti
c45c2311bd
Add colorized output option to generate script (#347)
* Add colorized output option to generate script

Two new functions were added to the script that allow output to be colorized based on the T[0] probability. Changes were made to the `generate_step` function in utils.py to permit colorization. Additionally, an argument for colorization was introduced to the command-line parser.

* Rename 'colorize' parameter to 'return_probability' in generate_step
2024-01-23 05:25:44 -08:00
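
One plausible way to colorize output by token probability with ANSI escape codes; the thresholds and colors below are invented for illustration.

```python
# Hypothetical probability-to-color mapping; thresholds are arbitrary.
def colorize(token_text: str, prob: float) -> str:
    if prob > 0.9:
        code = "32"  # green: high confidence
    elif prob > 0.5:
        code = "33"  # yellow: medium confidence
    else:
        code = "31"  # red: low confidence
    return f"\033[{code}m{token_text}\033[0m"

print(colorize("hello", 0.95) + colorize(" world", 0.30))
```
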
Sugato Ray
a445ac2895
Update docs with conda install option (#354) 2024-01-22 21:14:48 -08:00
Baptiste Canton
42672f5446
add an option to apply the tokenizer chat template (#338)
* add an option to apply the tokenizer chat template

* fix the option to apply the tokenizer chat template

* better error messages for chat template issues

* apply the chat template by default when possible

* nit in comment

* rebase

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-22 19:52:42 -08:00
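
Applying a tokenizer's chat template, when one is available, looks roughly like this; the model name and fallback are assumptions.

```python
# Sketch: use the chat template when the tokenizer ships one, else fall back to raw text.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [{"role": "user", "content": "Write a haiku about the ocean."}]

if getattr(tokenizer, "chat_template", None) is not None:
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
else:
    prompt = messages[-1]["content"]
print(prompt)
```
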
Anchen
30be4c4734
refactor(qwen): moving qwen into mlx-lm (#312)
* refactor(qwen): moving qwen into mlx-lm

* chore: update doc

* chore: fix type hint

* add qwen model support in convert

* chore: fix doc

* chore: only load model in quantize_model

* chore: make the convert script only copy tokenizer files instead of loading and saving them

* chore: update docstring

* chore: remove unnecessary try catch

* chore: clean up the tokenizer handling and update transformers to 4.37

* nits in README

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-22 15:00:07 -08:00
Anchen
527cea4027
chore: fix the convert.py script for weights that are not sanitized and support quant for non-32 dimensions (#340)
* chore: fix convert script for weights not sanitized and support quant for non-32 dims

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* chore: fix typo

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-01-19 21:07:21 -08:00
bojanbabic
61297f547b
Missing requirements needed for convert script (#320)
* fix requirements and add eos parameter

* fix black

* address comment

* address comments - remove new arg
2024-01-18 19:04:24 -08:00
Awni Hannun
bcc9fc3581
two minor fixes (#335) 2024-01-18 14:18:13 -08:00
someone
2287294723
fix mlx_lm generator for chinese (#321)
* fix generator for chinese

* add REPLACEMENT_CHAR

---------

Co-authored-by: cg <cg@qq.com>
2024-01-16 07:13:33 -08:00
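
The underlying issue is that multi-byte UTF-8 characters (common in Chinese text) can be split across tokens, so decoding an incomplete sequence yields U+FFFD; a hedged sketch of the guard:

```python
# Sketch of the streaming-decode guard: only emit text once it decodes cleanly.
REPLACEMENT_CHAR = "\ufffd"

def emit_new_text(tokenizer, tokens: list, already_printed: str) -> str:
    text = tokenizer.decode(tokens)
    if REPLACEMENT_CHAR in text:
        return already_printed  # wait for more tokens to complete the character
    print(text[len(already_printed):], end="", flush=True)
    return text
```
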
Awni Hannun
b0870ed679
fix response + bump version (#319) 2024-01-15 11:51:21 -08:00
Anchen
195bec2fa3
feat(mlx_lm): add mixtral support in mlx_lm (#318)
* feat: add mixtral support in mlx_lm

* chore: update doc
2024-01-15 07:18:14 -08:00
Marcel Bischoff
cd3cff0858
Phixtral (#290)
* initial

* file

* remove debug

* Adding README

* typo

* simplify readme

* nits in readmes

---------

Co-authored-by: Marcel Bischoff <marcel.bischoff@awarehq.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-13 08:35:03 -08:00
Anchen
a39b735c3b
chore(mlx-lm): update phi2 model args to sync with hf config format. (#311)
* chore(mlx-lm): update phi2 model args to sync with hf config format

* chore: fix type hint
2024-01-13 07:51:45 -08:00
Pedro Cuenca
ef93979973
Update model card uploaded with converted models (#309) 2024-01-12 13:03:52 -08:00
Angelos Katharopoulos
1fa40067fe
Change tuple type definitions to use Tuple (#308) 2024-01-12 11:15:09 -08:00
Awni Hannun
c6440416a2
Mlx llm package (#301)
* fix converter

* add recursive files

* remove gitignore

* remove gitignore

* add packages properly

* read me update

* remove dup readme

* relative

* fix convert

* fix community name

* fix url

* version
2024-01-12 10:25:56 -08:00
Anchen
6217d7acd0
Delete llms/hf_llm/models/.gitignore (#300) 2024-01-11 16:56:50 -08:00
Anchen
a2402116ae
refactor(hf_llm): moving phi2 example into hf_llm (#293)
* refactor: moving phi2 example into hf_llm

* chore: clean up

* chore: update phi2 model args so it can load args from config

* fix phi2 + nits + readme

* allow any HF repo, update README

* fix bug in llama

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-11 12:29:12 -08:00
Anchen
7380ebfb0d
fix: undefined hf_path (#292) 2024-01-11 05:53:52 -08:00
Konstantin Kerekovski
047d4650c4
Add --local flag to llms/hf_llm/convert.py for reading source HF models from filesystem. (#260)
* Add --local flag for reading models from filesystem and related code for doing so
* Disable uploading to huggingface if --local flag is set

* Remove code related to .bin files and merge fetch_from_local and fetch_from_hub into one function.

* Update llms/hf_llm/convert.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* format / nits

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-10 19:53:01 -08:00