Commit Graph

409 Commits

Author SHA1 Message Date
Awni Hannun
ecbc6ff1e3
one more quant fix (#708) 2024-04-22 18:12:52 -07:00
Aaron Ng
8d5cf5b0c8
use logging in mlx server (#705) 2024-04-22 07:50:06 -07:00
AlexandrosChrtn
f20e68fcc0
Load fused model with transformers (#703)
* save format for transformers compatibility

* save format for transformers compatibility arg

* hardcode mlx

* hardcode mlx format
2024-04-21 09:04:44 -07:00
Anchen
749cabf299
fix: unicode decoding (#702) 2024-04-21 08:58:23 -07:00
Karim Elmaaroufi
1484598de1
Add support for logit bias (#697) 2024-04-21 06:53:56 -07:00
Awni Hannun
6abdbe3be8
Fix quant in gguf (#698)
* fix quant in gguf

* fix whisper
2024-04-19 20:07:11 -07:00
Awni Hannun
574ad7f6fe
fix dequantization (#693) 2024-04-19 10:46:59 -07:00
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API (#680)
* more async eval

* quantize embedding / update quantize api

* more updates for quantize

* update for quantize embeddings

* update sd quant API

* update sdxl quants

* error for datasets < batch_size

* async

* fix config loading

* fix quant

* fix tests

* fix req

* remove lm head if tie weights is true

* fix test
2024-04-18 18:16:10 -07:00
Anchen
f5f189e48a
fix(mlx-lm): broken server.py (#690)
* fix server.py

* fix var referenced before assignment

* add test

* clean up
2024-04-18 14:26:18 -07:00
Phúc H. Lê Khắc
35206806ac
Create executables for generate, lora, server, merge, convert (#682)
* feat: create executables mlx_lm.<cmd>

* nits in docs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-16 16:08:49 -07:00
dmdaksh
7d7e236061
- Removed unused Python imports (#683)
- bert/model.py:10: tree_unflatten
  - bert/model.py:2: dataclass
  - bert/model.py:8: numpy
  - cifar/resnet.py:6: Any
  - clip/model.py:15: tree_flatten
  - clip/model.py:9: Union
  - gcn/main.py:8: download_cora
  - gcn/main.py:9: cross_entropy
  - llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
  - llms/gguf_llm/models.py:9: numpy
  - llms/mixtral/mixtral.py:12: tree_map
  - llms/mlx_lm/models/dbrx.py:2: Dict, Union
  - llms/mlx_lm/tuner/trainer.py:5: partial
  - llms/speculative_decoding/decoder.py:1: dataclass, field
  - llms/speculative_decoding/decoder.py:2: Optional
  - llms/speculative_decoding/decoder.py:5: mlx.nn
  - llms/speculative_decoding/decoder.py:6: numpy
  - llms/speculative_decoding/main.py:2: glob
  - llms/speculative_decoding/main.py:3: json
  - llms/speculative_decoding/main.py:5: Path
  - llms/speculative_decoding/main.py:8: mlx.nn
  - llms/speculative_decoding/model.py:6: tree_unflatten
  - llms/speculative_decoding/model.py:7: AutoTokenizer
  - llms/tests/test_lora.py:13: yaml_loader
  - lora/lora.py:14: tree_unflatten
  - lora/models.py:11: numpy
  - lora/models.py:3: glob
  - speechcommands/kwt.py:1: Any
  - speechcommands/main.py:7: mlx.data
  - stable_diffusion/stable_diffusion/model_io.py:4: partial
  - whisper/benchmark.py:5: sys
  - whisper/test.py:5: subprocess
  - whisper/whisper/audio.py:6: Optional
  - whisper/whisper/decoding.py:8: mlx.nn
2024-04-16 07:50:32 -07:00
Angelos Katharopoulos
e55a9e8cb4
Add an SPM detokenizer that doesn't trim initial space (#681) 2024-04-15 14:15:25 -07:00
Awni Hannun
d3f8e4aee9
Fix argpartition call in Mixtral and other MOES (#676)
* Update mixtral.py

* fix all moes

---------

Co-authored-by: yuhai-china <yuhai.china@gmail.com>
2024-04-12 11:00:56 -07:00
Awni Hannun
9c5554d8ee
Use async eval (#670)
* Use async eval

* bump

* bump

* remove workaround for bfloat cumsum
2024-04-11 13:18:23 -07:00
Nripesh Niketan
0250f6f38e
feat: Update black-pre-commit-mirror to version 24.3.0 (#675) 2024-04-11 07:28:26 -07:00
devonthomas35
9f472dc985
Update transformers for ⌘-R+ (#668) 2024-04-11 07:28:12 -07:00
da-z
5a4cad34ef
Always resume downloads (#674)
* Always resume downloads

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 06:52:32 -07:00
Angelos Katharopoulos
eff6690952
Fix CFG for SDXL (#667) 2024-04-09 06:06:41 -07:00
Angelos Katharopoulos
1278994b56
Add streaming detokenizers (#651) 2024-04-08 22:36:01 -07:00
Awni Hannun
c68aa3c7c3
Stable lm 2 (#666)
* stable lm 2

* test and lora

* version bump

* merge stable models
2024-04-08 14:18:55 -07:00
Awni Hannun
1e2f7f50b6
fix for empty initial string (#665) 2024-04-08 10:40:05 -07:00
Awni Hannun
c386dd5f5a
Fix for cohere plus (#650)
* fix for cohere plus

* version bump
2024-04-05 14:11:24 -07:00
Awni Hannun
2bd64b78cf
Save lora config (#636)
* lora config

* comments

* version bump
2024-04-02 13:52:53 -07:00
Prince Canuma
d661440dbb
Add support for qwen2moe (#640)
* add sparsemoe block and update decoder logic

* update file name to match HF

* update name

* Code formatting

* update gates calculation

* add support for Qwen2MoE.

* fix pytest

* code formatting and fix missing comma in utils

* Remove decoder sparse step.

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>

* remove gate layer anti-quantisation

* remove unused argument

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-04-02 11:33:29 -07:00
Awni Hannun
78c431dc25
cleanup whisper a little (#639) 2024-03-30 13:13:58 -07:00
Chime Ogbuji
f6283ef7ce
Configurable LR schedulers (#604)
* Initial config handler and test

* Added means to run from CLI

* Update lora config loading and tests

* Constrain scheduler config (warmup and minimum LR) for each kind

* Update reference to moved schedule_config module

* Minor fix

* Fix typos

* Moved build_schedule and tests

* nits in schedule config

* flake

* fix path

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-29 13:41:10 -07:00
Awni Hannun
b80adbcc3e
DBRX (#628)
* dbrx

* format

* format

* comments

* change scores slightly

* remove inadvertant import
2024-03-28 21:03:53 -07:00
Anchen
297a908e3d
fix(mlx-lm): type hints in gguf.py (#621) 2024-03-26 07:56:01 -07:00
Anchen
0ab01b4626
fix(mlx-lm): sorted probs in top_p implementation. (#610)
* fix(mlx-lm): the top p imp

* chore: address comment
2024-03-25 15:07:55 -07:00
Awni Hannun
bbfcc103d7
cast around lora adapters (#613) 2024-03-24 19:34:51 -07:00
Awni Hannun
5a52899405
Partially stream de-tokenization (#609)
* partially stream de-tokenization

* don't break full response
2024-03-23 15:32:33 -07:00
Anchen
494cdf8e96
chore: fix loar for moe model (#608) 2024-03-23 07:22:11 -07:00
Awni Hannun
b8a348c1b8
Switch to fast RMS/LN Norm (#603)
* use nn.RMSNorm, use sdpa, cleanup

* bump mlx versions

* minor update

* use fast layer norm

* version bump

* update requirement for whisper

* update requirement for gguf
2024-03-23 07:13:51 -07:00
Anchen
fbed720d6f
chore(mlx-lm): fix the top_p implementation. (#602)
* chore(mlx-lm): clean up the top p imp

* chore: clean up

* chore: add test

* chore: address comments

* chore: clean up docs string

* chore: clean up test
2024-03-21 12:18:23 -07:00
Anchen
fe96ef342f
feat(mlx-lm): export the GGUF (fp16) format model weights from fuse.py (#555)
* wip

* wip

* feat: convert mlx model to gguf f16

* chore: conver norm layer to float32 to avoid overflow issue

* chore: add support for mixtral

* chore: clean up

* chore: remove unused import statement

* chore: clean up weight name mapping

* version and readme

* actual version bump

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-21 10:34:11 -07:00
Anchen
8f906c859a
chore(mlx-lm): enable to apply default chat template (#577)
* chore(mlx-lm): enable to apply default chat template

* Add option to use default chat template

* chore: rename the flag to use default chat template
2024-03-20 21:39:39 -07:00
Ivan Fioravanti
d2a99172a6
Add dropout parameter to lora configuration (#599)
* Add dropout parameter to lora configuration

A dropout parameter has been added to the lora configuration settings in lora_config.yaml. The LoRALinear class in utils.py has been updated to take this new parameter. Additionally, a AttributeError: 'types.SimpleNamespace' object has no attribute 'prompt' related to `args.prompt` has been removed from lora.py.

* Update lora_config.yaml

Set dropout to 0.0 in the sample config file

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-20 08:44:40 -07:00
Anchen
949f63f309
chore(mlx-lm): fix print_trainable_parameters for quant models (#581)
* chore(mlx-lm): fix print_trainable_parameters for quant models

* chore: clean up

* refactor: use layer type to check quant bits

* chore: address comment
2024-03-20 08:41:03 -07:00
Matt Wronkiewicz
373dd6f2a2
Set finish_reason in response (#592) 2024-03-19 20:21:26 -07:00
Alwin Arrasyid
6c3d4c8ba2
add dequantize option to mlx_lm/convert.py (#547) 2024-03-19 19:50:08 -07:00
Chime Ogbuji
6f2fd5daea
Add mlx-lm version information to HF model card (#596)
* Add mlx-lm version informatiohn to HF model card

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* Reverted indentation

* Pre-commit formatting

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-19 19:42:03 -07:00
madroid
39d5ca6427
LoRA: report last train info (#595) 2024-03-19 17:29:50 -07:00
yzimmermann
4680ef4413
Enable more BERT models (#580)
* Update convert.py

* Update model.py

* Update test.py

* Update model.py

* Update convert.py

* Add files via upload

* Update convert.py

* format

* nit

* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 17:21:33 -07:00
madroid
b0bcd86a40
Support for OpenAI’s fine-tuning dataset format (#548)
* LoRA: move load_dataset to tuner/datasets.py file

* LoRA: support OpenAI chat format datasets

see https://platform.openai.com/docs/guides/fine-tuning/example-format

* LoRA: support OpenAI completion format datasets

* LoRA: formatting dataset timing to reduce memory footprint

* Refactor dataset item access in PromptCompletionDataset

* Update mlx_lm/LORA.md

* Update mlx_lm/LORA.md

* check Unsupported data format

* add tests, fine-tune doc

* add tests, fine-tune doc

* add jinja2 for chat template

* nits in readme

* nits in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 16:45:46 -07:00
Abdul Fatir
e05e502c34
Fix scaling when embeddings are tied (#591) 2024-03-18 13:41:07 -07:00
Awni Hannun
e4b19bb9e1
Make attention faster for a some models (#574)
* make attention faster for a couple models

* remove unused generation flags

* add comment on lora

* include text files as well
2024-03-14 21:35:54 -07:00
Angelos Katharopoulos
3f3741d229
Fix requirements and image2image strength/steps mismatch (#585) 2024-03-14 12:22:54 -07:00
sweetcard
e2205beb66
Update server.py to add --trust-remote-code to server (#578)
* Update server.py

Add --trust-remote-code to server

* format code by running pre-commit

---------

Co-authored-by: flymonk <zhou.feng@gsafer.com>
2024-03-14 07:05:19 -07:00
Sugato Ray
2cd793dd69
feat: add update_config functionality (#531)
* feat: add `update_config` finctionality

- sorts the config for better readability
- updates "_name_or_path" key in config with upload_repo
- sets indentation of 4 spaces
- allows adding other key-value pairs via kwargs
- reduces code duplication
- standardizes config-update across mlx-lm

* feat: standardize updating config

Impactes:
- fuse.py
- merge.py

* update formatting

* remove commented out code

* update func: update_config to save_config

- drop kwards
- rename func as save_config
- incorporate review suggestions

* update func: save_config

- ensure only config-saving functionality
- function oes not return config as a dict anymore
- added review suggestions

* fixed formatting

* update formatting instruction in contribution guide

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-14 06:36:05 -07:00
madroid
485180ae91
LoRA: some minor optimizations (#573)
* init training_args in training scope

* Add trainable parameters percentage
2024-03-13 20:26:30 -07:00