Prince Canuma
c012eb173f
Add support for OpenELM ( #719 )
* add openELM
* update splitting logic
* update qkv logic, transformer and MLP block
* code formatting and fix args
* fix array slicing and remove unused var :)
* add to tuner
* use mx.split for slicing qkv
* merge with phi3
* remove rope scaling logic
* code formatting
2024-04-25 16:49:28 -07:00
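The "use mx.split for slicing qkv" step above can be sketched as follows. This is an illustrative numpy version (`np.split` takes the same split-index arguments as `mx.split`); the head counts are example values, not OpenELM's actual per-layer configuration.

```python
import numpy as np

# Illustrative grouped-query sizes (not OpenELM's actual per-layer values):
# 8 query heads, 2 key/value heads, head dimension 64.
n_q_heads, n_kv_heads, head_dim = 8, 2, 64

# Output of a fused qkv projection: queries, keys, and values concatenated
# along the feature axis.
qkv = np.zeros((1, 10, (n_q_heads + 2 * n_kv_heads) * head_dim))

# Split at the q/k and k/v boundaries; mx.split accepts the same index list.
queries, keys, values = np.split(
    qkv, [n_q_heads * head_dim, (n_q_heads + n_kv_heads) * head_dim], axis=-1
)
```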
Gökdeniz Gülmez
2c1c9e9024
MiniCPM implementation ( #685 )
* Added support for the MiniCPM architecture
* Updated utils.py and LORA.md
* Update implementation details for MiniCPM architecture
* Cleaning up
* fixed the missing lm.head layer problem
* Refactor Model class to dynamically handle tied and untied word embeddings
* Quick update
* added a dynamic rope scaling base calculation
* quick fix and clean up
* clean up again
* removed the MiniCPMNorm class as its not used
* forgot something, sorry
* format
* version bump
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-25 15:29:28 -07:00
Awni Hannun
685012c2ad
Couple fixes for LoRA ( #711 )
* don't overwrite in test only mode
* only load model specific safetensors
2024-04-25 14:16:13 -07:00
Kristian Muñiz
109ee2f2f8
Use CORS headers for streaming for MLX Server ( #716 )
2024-04-25 07:26:04 -07:00
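The CORS change amounts to attaching cross-origin headers to streaming responses so browsers on another origin can consume them; a minimal sketch of such a header set (the exact allowed methods and headers here are assumptions, not the server's verbatim values):

```python
def cors_headers(origin: str = "*") -> dict:
    # Headers a streaming endpoint can attach so a browser on another origin
    # may read the response; allowed methods/headers below are illustrative.
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type, Authorization",
    }
```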
Kevin Wang
8a265f0d54
Fix incorrect type annotation ( #720 )
A `Tuple` is missing in this type annotation.
2024-04-24 15:52:43 -07:00
Prince Canuma
abcd891851
Add support for phi-3 ( #712 )
* Add phi-3 modelling
* fix rope scaling warning
* add tests and update tuner utils
* update name and remove sanitize
* fix lora
2024-04-23 09:20:00 -07:00
Aaron Ng
8d5cf5b0c8
use logging in mlx server ( #705 )
2024-04-22 07:50:06 -07:00
Anchen
749cabf299
fix: unicode decoding ( #702 )
2024-04-21 08:58:23 -07:00
Karim Elmaaroufi
1484598de1
Add support for logit bias ( #697 )
2024-04-21 06:53:56 -07:00
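Logit bias in the OpenAI API sense adds a fixed offset to selected token logits before sampling; a plain-Python sketch of the idea (the repo's code operates on mx arrays):

```python
def apply_logit_bias(logits, logit_bias):
    # logits: list of floats indexed by token id.
    # logit_bias: {token_id: bias}; large positive or negative values
    # effectively force or ban a token before softmax/sampling.
    out = list(logits)
    for token_id, bias in logit_bias.items():
        out[token_id] += bias
    return out
```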
Awni Hannun
6abdbe3be8
Fix quant in gguf ( #698 )
* fix quant in gguf
* fix whisper
2024-04-19 20:07:11 -07:00
Awni Hannun
574ad7f6fe
fix dequantization ( #693 )
2024-04-19 10:46:59 -07:00
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API ( #680 )
* more async eval
* quantize embedding / update quantize api
* more updates for quantize
* update for quantize embeddings
* update sd quant API
* update sdxl quants
* error for datasets < batch_size
* async
* fix config loading
* fix quant
* fix tests
* fix req
* remove lm head if tie weights is true
* fix test
2024-04-18 18:16:10 -07:00
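The "remove lm head if tie weights is true" bullet reflects that with tied word embeddings the output projection reuses the embedding matrix, so no separate lm_head is stored or quantized; a numpy sketch of the dispatch (function and argument names are assumptions for illustration):

```python
import numpy as np

def lm_logits(hidden, embedding, lm_head=None):
    # With untied weights a dedicated lm_head projects hidden states to
    # vocab logits; with tied weights the transposed embedding is reused.
    if lm_head is not None:
        return hidden @ lm_head.T
    return hidden @ embedding.T

vocab, dim = 6, 4
embedding = np.ones((vocab, dim))
hidden = np.ones((1, dim))
```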
Anchen
f5f189e48a
fix(mlx-lm): broken server.py ( #690 )
* fix server.py
* fix var referenced before assignment
* add test
* clean up
2024-04-18 14:26:18 -07:00
Phúc H. Lê Khắc
35206806ac
Create executables for generate, lora, server, merge, convert ( #682 )
* feat: create executables mlx_lm.<cmd>
* nits in docs
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-16 16:08:49 -07:00
dmdaksh
7d7e236061
- Removed unused Python imports ( #683 )
- bert/model.py:10: tree_unflatten
- bert/model.py:2: dataclass
- bert/model.py:8: numpy
- cifar/resnet.py:6: Any
- clip/model.py:15: tree_flatten
- clip/model.py:9: Union
- gcn/main.py:8: download_cora
- gcn/main.py:9: cross_entropy
- llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
- llms/gguf_llm/models.py:9: numpy
- llms/mixtral/mixtral.py:12: tree_map
- llms/mlx_lm/models/dbrx.py:2: Dict, Union
- llms/mlx_lm/tuner/trainer.py:5: partial
- llms/speculative_decoding/decoder.py:1: dataclass, field
- llms/speculative_decoding/decoder.py:2: Optional
- llms/speculative_decoding/decoder.py:5: mlx.nn
- llms/speculative_decoding/decoder.py:6: numpy
- llms/speculative_decoding/main.py:2: glob
- llms/speculative_decoding/main.py:3: json
- llms/speculative_decoding/main.py:5: Path
- llms/speculative_decoding/main.py:8: mlx.nn
- llms/speculative_decoding/model.py:6: tree_unflatten
- llms/speculative_decoding/model.py:7: AutoTokenizer
- llms/tests/test_lora.py:13: yaml_loader
- lora/lora.py:14: tree_unflatten
- lora/models.py:11: numpy
- lora/models.py:3: glob
- speechcommands/kwt.py:1: Any
- speechcommands/main.py:7: mlx.data
- stable_diffusion/stable_diffusion/model_io.py:4: partial
- whisper/benchmark.py:5: sys
- whisper/test.py:5: subprocess
- whisper/whisper/audio.py:6: Optional
- whisper/whisper/decoding.py:8: mlx.nn
2024-04-16 07:50:32 -07:00
Angelos Katharopoulos
e55a9e8cb4
Add an SPM detokenizer that doesn't trim initial space ( #681 )
2024-04-15 14:15:25 -07:00
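SentencePiece marks word boundaries with a "▁" piece prefix; a naive join-and-strip drops a meaningful leading space, which matters when streaming continuations. A minimal sketch of a detokenizer that keeps it (illustrative only, not the repo's class):

```python
def spm_detokenize(pieces):
    # SentencePiece pieces mark word starts with '\u2581' (▁). Mapping it to
    # a regular space without stripping preserves a leading space.
    return "".join(pieces).replace("\u2581", " ")
```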
Awni Hannun
d3f8e4aee9
Fix argpartition call in Mixtral and other MOES ( #676 )
* Update mixtral.py
* fix all moes
---------
Co-authored-by: yuhai-china <yuhai.china@gmail.com>
2024-04-12 11:00:56 -07:00
Awni Hannun
9c5554d8ee
Use async eval ( #670 )
* Use async eval
* bump
* bump
* remove workaround for bfloat cumsum
2024-04-11 13:18:23 -07:00
devonthomas35
9f472dc985
Update transformers for ⌘-R+ ( #668 )
2024-04-11 07:28:12 -07:00
da-z
5a4cad34ef
Always resume downloads ( #674 )
* Always resume downloads
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 06:52:32 -07:00
Angelos Katharopoulos
1278994b56
Add streaming detokenizers ( #651 )
2024-04-08 22:36:01 -07:00
Awni Hannun
c68aa3c7c3
Stable lm 2 ( #666 )
* stable lm 2
* test and lora
* version bump
* merge stable models
2024-04-08 14:18:55 -07:00
Awni Hannun
1e2f7f50b6
fix for empty initial string ( #665 )
2024-04-08 10:40:05 -07:00
Awni Hannun
c386dd5f5a
Fix for cohere plus ( #650 )
* fix for cohere plus
* version bump
2024-04-05 14:11:24 -07:00
Awni Hannun
2bd64b78cf
Save lora config ( #636 )
* lora config
* comments
* version bump
2024-04-02 13:52:53 -07:00
Prince Canuma
d661440dbb
Add support for qwen2moe ( #640 )
* add sparsemoe block and update decoder logic
* update file name to match HF
* update name
* Code formatting
* update gates calculation
* add support for Qwen2MoE.
* fix pytest
* code formatting and fix missing comma in utils
* Remove decoder sparse step.
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
* remove gate layer anti-quantisation
* remove unused argument
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-04-02 11:33:29 -07:00
Awni Hannun
78c431dc25
cleanup whisper a little ( #639 )
2024-03-30 13:13:58 -07:00
Chime Ogbuji
f6283ef7ce
Configurable LR schedulers ( #604 )
* Initial config handler and test
* Added means to run from CLI
* Update lora config loading and tests
* Constrain scheduler config (warmup and minimum LR) for each kind
* Update reference to moved schedule_config module
* Minor fix
* Fix typos
* Moved build_schedule and tests
* nits in schedule config
* flake
* fix path
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-29 13:41:10 -07:00
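A representative shape such a schedule config can describe is linear warmup into cosine decay with a minimum-LR floor; a self-contained sketch (parameter names here are assumptions, not the config's exact keys):

```python
import math

def warmup_cosine(step, warmup=100, total=1000, base_lr=1e-3, min_lr=1e-5):
    # Linear warmup for `warmup` steps, then cosine decay toward `min_lr`.
    if step < warmup:
        return base_lr * (step + 1) / warmup
    t = min((step - warmup) / max(1, total - warmup), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```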
Awni Hannun
b80adbcc3e
DBRX ( #628 )
* dbrx
* format
* format
* comments
* change scores slightly
* remove inadvertent import
2024-03-28 21:03:53 -07:00
Anchen
297a908e3d
fix(mlx-lm): type hints in gguf.py ( #621 )
2024-03-26 07:56:01 -07:00
Anchen
0ab01b4626
fix(mlx-lm): sorted probs in top_p implementation. ( #610 )
* fix(mlx-lm): the top p imp
* chore: address comment
2024-03-25 15:07:55 -07:00
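Top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalizes; the fix above concerned the sorting step. A plain-Python sketch of the intended semantics (the real implementation uses mx array ops):

```python
def top_p_filter(probs, top_p=0.9):
    # Sort token ids by probability, descending, and accumulate mass until
    # the nucleus reaches top_p; renormalize the surviving tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}
```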
Awni Hannun
bbfcc103d7
cast around lora adapters ( #613 )
2024-03-24 19:34:51 -07:00
Awni Hannun
5a52899405
Partially stream de-tokenization ( #609 )
* partially stream de-tokenization
* don't break full response
2024-03-23 15:32:33 -07:00
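The idea behind partial streaming is to emit decoded text only up to a safe boundary so tokens that merge into one visible word are never sent half-formed, while still flushing everything at the end ("don't break full response"). An illustrative sketch using the last space as the boundary (not the repo's tokenizer-specific classes):

```python
class PartialDetokenizer:
    # Stream only up to the last space; hold back the tail until it is
    # final. An illustrative sketch of the streaming policy.
    def __init__(self):
        self.text = ""
        self.offset = 0

    def add(self, decoded_so_far):
        # `decoded_so_far` is the full decode of all tokens generated so far.
        self.text = decoded_so_far
        safe = self.text.rfind(" ")
        if safe > self.offset:
            chunk = self.text[self.offset:safe]
            self.offset = safe
            return chunk
        return ""

    def finalize(self):
        # Flush the held-back tail once generation ends.
        chunk = self.text[self.offset:]
        self.offset = len(self.text)
        return chunk
```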
Anchen
494cdf8e96
chore: fix lora for moe model ( #608 )
2024-03-23 07:22:11 -07:00
Awni Hannun
b8a348c1b8
Switch to fast RMS/LN Norm ( #603 )
* use nn.RMSNorm, use sdpa, cleanup
* bump mlx versions
* minor update
* use fast layer norm
* version bump
* update requirement for whisper
* update requirement for gguf
2024-03-23 07:13:51 -07:00
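The fast norm in question computes RMS normalization: each vector is scaled by the reciprocal root-mean-square of its last axis, then by a learned weight. A numpy sketch consistent with what nn.RMSNorm computes:

```python
import numpy as np

def rms_norm(x, weight=1.0, eps=1e-5):
    # Normalize by the root-mean-square over the last axis, then apply a
    # per-feature scale (`weight`).
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return weight * (x / rms)
```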
Anchen
fbed720d6f
chore(mlx-lm): fix the top_p implementation. ( #602 )
* chore(mlx-lm): clean up the top p imp
* chore: clean up
* chore: add test
* chore: address comments
* chore: clean up docs string
* chore: clean up test
2024-03-21 12:18:23 -07:00
Anchen
fe96ef342f
feat(mlx-lm): export the GGUF (fp16) format model weights from fuse.py ( #555 )
* wip
* wip
* feat: convert mlx model to gguf f16
* chore: convert norm layer to float32 to avoid overflow issue
* chore: add support for mixtral
* chore: clean up
* chore: remove unused import statement
* chore: clean up weight name mapping
* version and readme
* actual version bump
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-21 10:34:11 -07:00
Anchen
8f906c859a
chore(mlx-lm): enable applying the default chat template ( #577 )
* chore(mlx-lm): enable applying the default chat template
* Add option to use default chat template
* chore: rename the flag to use default chat template
2024-03-20 21:39:39 -07:00
Ivan Fioravanti
d2a99172a6
Add dropout parameter to lora configuration ( #599 )
* Add dropout parameter to lora configuration
A dropout parameter has been added to the LoRA configuration settings in lora_config.yaml, and the LoRALinear class in utils.py has been updated to accept it. Additionally, code in lora.py that referenced `args.prompt` and raised `AttributeError: 'types.SimpleNamespace' object has no attribute 'prompt'` has been removed.
* Update lora_config.yaml
Set dropout to 0.0 in the sample config file
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-20 08:44:40 -07:00
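The dropout is applied on the adapter input, leaving the frozen base projection untouched: y = xWᵀ + scale · (dropout(x)Aᵀ)Bᵀ. A numpy sketch of that forward pass (shapes and names are illustrative, not the LoRALinear signature):

```python
import numpy as np

def lora_forward(x, W, A, B, scale=1.0, dropout=0.0, rng=None):
    # Frozen base path plus the low-rank update; dropout (with inverted
    # scaling) only touches the adapter input.
    h = x
    if dropout > 0.0 and rng is not None:
        mask = (rng.random(x.shape) >= dropout) / (1.0 - dropout)
        h = x * mask
    return x @ W.T + scale * (h @ A.T) @ B.T

x = np.ones((1, 4))
W = np.eye(4)         # frozen base weight
A = np.zeros((2, 4))  # rank-2 adapter, initialized to have zero effect
B = np.zeros((4, 2))
```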
Anchen
949f63f309
chore(mlx-lm): fix print_trainable_parameters for quant models ( #581 )
* chore(mlx-lm): fix print_trainable_parameters for quant models
* chore: clean up
* refactor: use layer type to check quant bits
* chore: address comment
2024-03-20 08:41:03 -07:00
Matt Wronkiewicz
373dd6f2a2
Set finish_reason in response ( #592 )
2024-03-19 20:21:26 -07:00
Alwin Arrasyid
6c3d4c8ba2
add dequantize option to mlx_lm/convert.py ( #547 )
2024-03-19 19:50:08 -07:00
Chime Ogbuji
6f2fd5daea
Add mlx-lm version information to HF model card ( #596 )
* Add mlx-lm version information to HF model card
* Update llms/mlx_lm/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Reverted indentation
* Pre-commit formatting
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-19 19:42:03 -07:00
madroid
39d5ca6427
LoRA: report last train info ( #595 )
2024-03-19 17:29:50 -07:00
madroid
b0bcd86a40
Support for OpenAI’s fine-tuning dataset format ( #548 )
* LoRA: move load_dataset to tuner/datasets.py file
* LoRA: support OpenAI chat format datasets
see https://platform.openai.com/docs/guides/fine-tuning/example-format
* LoRA: support OpenAI completion format datasets
* LoRA: formatting dataset timing to reduce memory footprint
* Refactor dataset item access in PromptCompletionDataset
* Update mlx_lm/LORA.md
* Update mlx_lm/LORA.md
* check Unsupported data format
* add tests, fine-tune doc
* add tests, fine-tune doc
* add jinja2 for chat template
* nits in readme
* nits in readme
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 16:45:46 -07:00
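The supported record layouts (OpenAI chat "messages", prompt/completion pairs, and plain "text") can be told apart by their keys; a sketch of the detection, with the error case matching the "check Unsupported data format" bullet (the exact dispatch in tuner/datasets.py may differ):

```python
def dataset_format(sample: dict) -> str:
    # Distinguish supported JSONL record layouts by their keys.
    if "messages" in sample:
        return "chat"        # OpenAI chat fine-tuning format
    if "prompt" in sample and "completion" in sample:
        return "completion"  # OpenAI prompt/completion format
    if "text" in sample:
        return "text"        # raw text records
    raise ValueError("Unsupported data format")
```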
Awni Hannun
e4b19bb9e1
Make attention faster for some models ( #574 )
* make attention faster for a couple models
* remove unused generation flags
* add comment on lora
* include text files as well
2024-03-14 21:35:54 -07:00
sweetcard
e2205beb66
Update server.py to add --trust-remote-code to server ( #578 )
* Update server.py
Add --trust-remote-code to server
* format code by running pre-commit
---------
Co-authored-by: flymonk <zhou.feng@gsafer.com>
2024-03-14 07:05:19 -07:00
Sugato Ray
2cd793dd69
feat: add update_config functionality ( #531 )
* feat: add `update_config` functionality
- sorts the config for better readability
- updates "_name_or_path" key in config with upload_repo
- sets indentation of 4 spaces
- allows adding other key-value pairs via kwargs
- reduces code duplication
- standardizes config-update across mlx-lm
* feat: standardize updating config
Impacts:
- fuse.py
- merge.py
* update formatting
* remove commented out code
* update func: update_config to save_config
- drop kwargs
- rename func as save_config
- incorporate review suggestions
* update func: save_config
- ensure only config-saving functionality
- function does not return config as a dict anymore
- added review suggestions
* fixed formatting
* update formatting instruction in contribution guide
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-14 06:36:05 -07:00
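The conventions the bullets describe (sorted keys, 4-space indentation, save-only behavior with no return value) can be sketched like this; the signature is inferred from the commit messages, not copied from the repo:

```python
import json
import tempfile
from pathlib import Path

def save_config(config: dict, config_path):
    # Sort keys for readability and write with 4-space indentation; the
    # function only saves and does not return the config.
    with open(config_path, "w") as f:
        json.dump(dict(sorted(config.items())), f, indent=4)

path = Path(tempfile.mkdtemp()) / "config.json"
save_config({"vocab_size": 32000, "hidden_size": 4096}, path)
```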
madroid
485180ae91
LoRA: some minor optimizations ( #573 )
* init training_args in training scope
* Add trainable parameters percentage
2024-03-13 20:26:30 -07:00
madroid
d4e1de1d5b
add peak_memory info to training callback ( #572 )
2024-03-13 20:17:10 -07:00