Commit Graph

29 Commits

Author SHA1 Message Date
Y4hL
ea92f623d6
Prevent llms/mlx_lm from serving the local directory as a webserver (#498)
* Don't serve local directory

BaseHTTPRequestHandler serves the current directory by default. Definitely not intended behaviour. Remove the "do_HEAD" and "do_GET" methods.

* Fix typo in method name

I assume hanlde_stream was intended to be called handle_stream

* Fix outdated typehint

load_model returns nn.Module, however fetch_from_hub was not updated to reflect the change

* Add some more type hints

* Add warnings for using in prod

Add a warning to README and runtime, discouraging use in production. The warning is the same as on the python docs for HTTPServer https://docs.python.org/3/library/http.server.html

* format

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-02-27 19:40:42 -08:00
Awni Hannun
95f82e67a2
Fix import warning (#479)
* fix import warning
* fix version import
* remove api, move convert to utils
* also update circle to run external PRs
2024-02-27 08:47:56 -08:00
peterjc123
ccb278bcbd
Add top-p sampling for text generation (#486) 2024-02-26 06:18:11 -08:00
Angelos Katharopoulos
dc4f2e0a6b
Lazy loading models for faster convert and merge (#462) 2024-02-20 13:36:55 -08:00
vishal-14069
21e19b5b5a
Add Repetitive penalty to LLM inference - mlx-lm (#399)
* feat: add repetition penalty

* fix: generate function argument fix

* typo fixes

* update repetitive penalty

* update generate_step and generate

* resolve conflicts in generate

* merge latest oull origin master

* update generate

* update generate and generate_step

* update repetition list - rename variable

* refactor token count

* update generate step and generate

* move repetition_context in generate_step

* update generate step

* update generate_step
2024-02-16 21:58:17 -08:00
Awni Hannun
d4666615bb
Lazy import + refactor Lora layer addition (#426)
* lazy model import in mlx_lm

* change lora loading

* fix olmo lora

* remove a bunch of unused stuff from plamo

* move phixtral to mlx-lm and out of llms/
2024-02-12 10:51:02 -08:00
Anchen
8b77677c05
chore(mlx-lm): add model weight index in save_weights (#413)
* chore(mlx-lm): add model weight index in save_weights

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* chore: save total siZe as param size isntead of file size

* chore: clean up format

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-06 05:32:15 -08:00
Awni Hannun
aa7447efa2
Olmo in MLX LM (#415)
* run olmo

* format
2024-02-05 21:13:49 -08:00
Junyang Lin
9d0dd34403
add qwen2 (#411) 2024-02-04 08:31:38 -08:00
Madroid Ma
ba3a9355d1
LoRA: Remove unnecessary model type judgments (#388)
* LoRA: Remove unnecessary model type judgments

1. Supported models are already checked in the load_model function in utils, no need to repeat the check in lora
2. The checks in lora are not synchronized with those in utils

* LoRA: add LoRA supported models in mlx_lm utils
2024-01-31 11:55:27 -08:00
David Koski
f8fadf7a17
Fix token count computation to fix tps measurements (#392) 2024-01-30 11:24:16 -08:00
Ashish
20b969b412
Replace time.time() with time.perf_counter() as it is more suited for benchmarking (#380) 2024-01-26 14:11:38 -08:00
Awni Hannun
5aa652d3c2
remove simplify (#379) 2024-01-26 13:54:49 -08:00
Ashish
0b57f0eae6
Add StableLM-2 1.6B (#378)
* init

* stablelm

* add to readme

* bump version

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-26 10:28:00 -08:00
Anchen
854ad8747a
feat(mlx-lm): add de-quant for fuse.py (#365)
* feat(mlx-lm): add de-quant for fuse

* chore: disable quant in to linear when de-quant enabled

* chore: add better error handling for adapter file not found
2024-01-25 18:59:32 -08:00
Anchen
b1dec281b3
feat(mlx-lm): add lora hypeparameters in lora layer (#366)
* feat(mlx-lm): add lora hypeparameters in lora layer

* chore: address comments
2024-01-24 08:11:25 -08:00
Anchen
5fc8668a53
fix(mlx-lm): handle legacy quant models (#369) 2024-01-24 07:44:05 -08:00
Anchen
ab91ac1075
chore(mlx-lm): add load model with adapter and fix bug in sample (#360)
* chore: add load model with adapter support and fix bug in sample

* chore: ignore temp during calculating prob in sample
2024-01-23 19:47:39 -08:00
iLoveBug
40b61c1719
fix the chinese character generation as same as PR #321 (#342)
* fix the chinese character generation as same as PR #321

* reuse the generate logic to utils.py

* format

* verbose defualt

* fix conflicst with colorize and character check

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 12:44:23 -08:00
Anchen
362e88a744
feat: move lora into mlx-lm (#337)
* feat: Add lora and qlora training to mlx-lm


---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 08:44:37 -08:00
Shunta Saito
85c1ff8fd6
Add PLaMo-13B model as an LLM example (#303)
* Convert HF weights of PLaMo and load it to a plamo model in mlx

* Fix model inference part

* Add bos at the beginning of the prompt

* Fix convert.py to copy tokenizer.model into the converted dir

* Use the required insturction format in generate.py when "--instruct" option is specified

* Change filenames and update existing scripts

* Add README

* Add requirements.txt

* Fix plamo.py to stop generation when EOS appears

* Add quantization to convert.py

* Use mlx>=0.0.9 for mx.core.outer() in PLaMo model

* Update acknowledgements.md

* Fix card text in upload_to_hub()

* Not use prompt template when --instruct is not specified

* Ask if you trust_remote_code for loading tokenizer of PLaMo

* Check the user trusts the remote code when converting

* Remove plamo directory

* Update README

* Add PLaMo model file

* Fix the handling of cache in PLaMo and update README

* Ask if trust_remote_code only when the model is PLaMo

* Remove resolve_trust_remote_code from convert.py and use the latest transformers

* Remove code not to add EOS

* Update README to fix an example not to use noncommercial version of the model

* Remove unused imports

* Remove unnecessary description about the instruct model of PLaMo from README

* format, nits in README

* typo

---------

Co-authored-by: Shunta Saito <shunta@mitmul-mbp.local>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 07:17:24 -08:00
Ivan Fioravanti
c45c2311bd
Add colorized output option to generate script (#347)
* Add colorized output option to generate script

Two new functions were added to the script that allow output to be colorized based on the T[0] probability. Changes were made to the `generate_step` function in utils.py to permit colorization. Additionally, an argument for colorization was introduced to the command-line parser.

* Rename 'colorize' parameter with 'return_probability' in generate_step
2024-01-23 05:25:44 -08:00
Anchen
30be4c4734
refactor(qwen): moving qwen into mlx-lm (#312)
* refactor(qwen): moving qwen into mlx-lm

* chore: update doc

* chore: fix type hint

* add qwen model support in convert

* chore: fix doc

* chore: only load model in quantize_model

* chore: make the convert script only copy tokenizer files instead of load it and save

* chore: update docstring

* chore: remove unnecessary try catch

* chore: clean up for tokenizer and update  transformers 4.37

* nits in README

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-22 15:00:07 -08:00
Anchen
527cea4027
chore: fix the convert.py script for weights are not sanitized and support quant for non-32 dimensions (#340)
* chore: fix convert script for weights not sanitized and suport quant for non 32 dim

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* chore: fix typo

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-01-19 21:07:21 -08:00
Awni Hannun
bcc9fc3581
two minor fixes (#335) 2024-01-18 14:18:13 -08:00
someone
2287294723
fix mlx_lm generator for chinese (#321)
* fix generator for chinese

* add REPLACEMENT_CHAR

---------

Co-authored-by: cg <cg@qq.com>
2024-01-16 07:13:33 -08:00
Awni Hannun
b0870ed679
fix response + bump version (#319) 2024-01-15 11:51:21 -08:00
Anchen
195bec2fa3
feat(mlx_lm): add mixtral support in mlx_lm (#318)
* feat: add mixtral support in mlx_lm

* chore: update doc
2024-01-15 07:18:14 -08:00
Awni Hannun
c6440416a2
Mlx llm package (#301)
* fix converter

* add recursive files

* remove gitignore

* remove gitignore

* add packages properly

* read me update

* remove dup readme

* relative

* fix convert

* fix community name

* fix url

* version
2024-01-12 10:25:56 -08:00