Commit Graph

386 Commits

Author SHA1 Message Date
Prince Canuma
d661440dbb
Add support for qwen2moe (#640)
* add sparsemoe block and update decoder logic

* update file name to match HF

* update name

* Code formatting

* update gates calculation

* add support for Qwen2MoE.

* fix pytest

* code formatting and fix missing comma in utils

* Remove decoder sparse step.

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>

* remove gate layer anti-quantisation

* remove unused argument

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-04-02 11:33:29 -07:00
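A sparse MoE block like the one this PR describes routes each token through a small top-k subset of expert MLPs, weighted by a learned gate. A minimal sketch in MLX follows; the layer names, expert shapes, and the naive per-token loop are illustrative, not the actual Qwen2MoE port:

```python
import mlx.core as mx
import mlx.nn as nn

class SparseMoeBlock(nn.Module):
    # Illustrative top-k mixture-of-experts block (not the mlx-lm code).
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = [nn.Linear(dim, dim) for _ in range(num_experts)]

    def __call__(self, x: mx.array) -> mx.array:
        B, L, D = x.shape
        flat = x.reshape(-1, D)
        gates = mx.softmax(self.gate(flat), axis=-1)  # (tokens, experts)
        # Indices of the top-k experts per token.
        inds = mx.argpartition(-gates, self.top_k - 1, axis=-1)[:, : self.top_k]
        outputs = []
        for t in range(flat.shape[0]):  # naive routing loop, for clarity only
            y = mx.zeros((D,))
            for e in inds[t].tolist():
                y = y + gates[t, e] * self.experts[e](flat[t])
            outputs.append(y)
        return mx.stack(outputs).reshape(B, L, D)
```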
Awni Hannun
78c431dc25
cleanup whisper a little (#639) 2024-03-30 13:13:58 -07:00
Chime Ogbuji
f6283ef7ce
Configurable LR schedulers (#604)
* Initial config handler and test

* Added means to run from CLI

* Update lora config loading and tests

* Constrain scheduler config (warmup and minimum LR) for each kind

* Update reference to moved schedule_config module

* Minor fix

* Fix typos

* Moved build_schedule and tests

* nits in schedule config

* flake

* fix path

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-29 13:41:10 -07:00
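The warmup-plus-decay shape these scheduler configs describe can be assembled from MLX's schedule helpers. A rough sketch, with step counts and rates invented for illustration:

```python
import mlx.optimizers as optim

# Linear warmup for 100 steps, then cosine decay: the kind of schedule
# the warmup/minimum-LR constraints above apply to. Values are made up.
warmup = optim.linear_schedule(0.0, 1e-5, 100)
decay = optim.cosine_decay(1e-5, 900)
lr_schedule = optim.join_schedules([warmup, decay], [100])

opt = optim.Adam(learning_rate=lr_schedule)  # schedules pass as callables
```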
Awni Hannun
b80adbcc3e
DBRX (#628)
* dbrx

* format

* format

* comments

* change scores slightly

* remove inadvertent import
2024-03-28 21:03:53 -07:00
Anchen
297a908e3d
fix(mlx-lm): type hints in gguf.py (#621) 2024-03-26 07:56:01 -07:00
Anchen
0ab01b4626
fix(mlx-lm): sorted probs in top_p implementation. (#610)
* fix(mlx-lm): the top_p implementation

* chore: address comment
2024-03-25 15:07:55 -07:00
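For reference, nucleus sampling over explicitly sorted probabilities (the fix this commit makes) looks roughly like the sketch below; this is an illustration of the technique, not the mlx-lm code verbatim:

```python
import mlx.core as mx

def top_p_sample(logits: mx.array, top_p: float, temperature: float = 1.0) -> mx.array:
    probs = mx.softmax(logits / temperature, axis=-1)
    # Sort descending so the cumulative sum walks the nucleus from the top.
    order = mx.argsort(-probs, axis=-1)
    sorted_probs = mx.take_along_axis(probs, order, axis=-1)
    cum = mx.cumsum(sorted_probs, axis=-1)
    # Keep tokens until cumulative mass reaches top_p (always at least one).
    kept = mx.where(cum - sorted_probs < top_p, sorted_probs, 0.0)
    choice = mx.random.categorical(mx.log(kept))
    return mx.take_along_axis(order, choice[..., None], axis=-1)
```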
Awni Hannun
bbfcc103d7
cast around lora adapters (#613) 2024-03-24 19:34:51 -07:00
Awni Hannun
5a52899405
Partially stream de-tokenization (#609)
* partially stream de-tokenization

* don't break full response
2024-03-23 15:32:33 -07:00
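The idea behind partial de-tokenization: BPE tokenizers cannot safely be decoded one token at a time, so the response is decoded from the running id list and only the stable prefix is emitted. A rough sketch, assuming an HF-style tokenizer and using a last-space heuristic for the stable boundary:

```python
def stream_decode(token_gen, tokenizer):
    # token_gen yields token ids; tokenizer is any HF-style tokenizer.
    tokens, offset = [], 0
    for tok in token_gen:
        tokens.append(tok)
        text = tokenizer.decode(tokens)
        cut = text.rfind(" ")  # hold back the trailing, possibly-unstable piece
        if cut > offset:
            print(text[offset:cut], end="", flush=True)
            offset = cut
    print(tokenizer.decode(tokens)[offset:], flush=True)  # flush the remainder
```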
Anchen
494cdf8e96
chore: fix lora for moe model (#608) 2024-03-23 07:22:11 -07:00
Awni Hannun
b8a348c1b8
Switch to fast RMS/LN Norm (#603)
* use nn.RMSNorm, use sdpa, cleanup

* bump mlx versions

* minor update

* use fast layer norm

* version bump

* update requirement for whisper

* update requirement for gguf
2024-03-23 07:13:51 -07:00
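The fast ops this commit switches to are fused kernels with the same semantics as the plain-op formula. A small check, assuming the current mx.fast signatures:

```python
import mlx.core as mx

x = mx.random.normal((1, 8, 512))
w = mx.ones((512,))

y_fast = mx.fast.rms_norm(x, w, 1e-5)  # fused kernel
# Reference RMSNorm written out in plain ops.
y_ref = w * x * mx.rsqrt(mx.mean(x * x, axis=-1, keepdims=True) + 1e-5)
print(mx.allclose(y_fast, y_ref, atol=1e-4))
```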
Anchen
fbed720d6f
chore(mlx-lm): fix the top_p implementation. (#602)
* chore(mlx-lm): clean up the top_p implementation

* chore: clean up

* chore: add test

* chore: address comments

* chore: clean up docs string

* chore: clean up test
2024-03-21 12:18:23 -07:00
Anchen
fe96ef342f
feat(mlx-lm): export the GGUF (fp16) format model weights from fuse.py (#555)
* wip

* wip

* feat: convert mlx model to gguf f16

* chore: convert norm layer to float32 to avoid overflow issue

* chore: add support for mixtral

* chore: clean up

* chore: remove unused import statement

* chore: clean up weight name mapping

* version and readme

* actual version bump

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-21 10:34:11 -07:00
Anchen
8f906c859a
chore(mlx-lm): enable applying the default chat template (#577)
* chore(mlx-lm): enable applying the default chat template

* Add option to use default chat template

* chore: rename the flag to use default chat template
2024-03-20 21:39:39 -07:00
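Applying a chat template turns a message list into the model's prompt string; when the model ships no template of its own, transformers falls back to a default one, which is what the new flag opts into. Illustrative usage (the repo name is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some-org/some-model")  # placeholder
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```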
Ivan Fioravanti
d2a99172a6
Add dropout parameter to lora configuration (#599)
* Add dropout parameter to lora configuration

A dropout parameter has been added to the LoRA configuration settings in lora_config.yaml. The LoRALinear class in utils.py has been updated to take this new parameter. Additionally, a reference to `args.prompt` that raised AttributeError: 'types.SimpleNamespace' object has no attribute 'prompt' has been removed from lora.py.

* Update lora_config.yaml

Set dropout to 0.0 in the sample config file

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-20 08:44:40 -07:00
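Roughly, the dropout sits on the adapter path only, so the frozen base projection is untouched. A sketch of such a layer (not the exact mlx-lm LoRALinear):

```python
import mlx.core as mx
import mlx.nn as nn

class LoRALinear(nn.Module):
    # Illustrative LoRA layer with the new dropout knob on the adapter path.
    def __init__(self, linear: nn.Linear, r: int = 8, alpha: float = 16.0,
                 dropout: float = 0.0):
        super().__init__()
        self.linear = linear  # frozen base projection
        out_dims, in_dims = linear.weight.shape
        self.dropout = nn.Dropout(p=dropout)  # the new configuration knob
        scale = 1 / (in_dims ** 0.5)
        self.lora_a = mx.random.uniform(-scale, scale, (in_dims, r))
        self.lora_b = mx.zeros((r, out_dims))
        self.scale = alpha / r

    def __call__(self, x: mx.array) -> mx.array:
        z = (self.dropout(x) @ self.lora_a) @ self.lora_b
        return self.linear(x) + self.scale * z
```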
Anchen
949f63f309
chore(mlx-lm): fix print_trainable_parameters for quant models (#581)
* chore(mlx-lm): fix print_trainable_parameters for quant models

* chore: clean up

* refactor: use layer type to check quant bits

* chore: address comment
2024-03-20 08:41:03 -07:00
Matt Wronkiewicz
373dd6f2a2
Set finish_reason in response (#592) 2024-03-19 20:21:26 -07:00
Alwin Arrasyid
6c3d4c8ba2
add dequantize option to mlx_lm/convert.py (#547) 2024-03-19 19:50:08 -07:00
Chime Ogbuji
6f2fd5daea
Add mlx-lm version information to HF model card (#596)
* Add mlx-lm version information to HF model card

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* Reverted indentation

* Pre-commit formatting

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-19 19:42:03 -07:00
madroid
39d5ca6427
LoRA: report last train info (#595) 2024-03-19 17:29:50 -07:00
yzimmermann
4680ef4413
Enable more BERT models (#580)
* Update convert.py

* Update model.py

* Update test.py

* Update model.py

* Update convert.py

* Add files via upload

* Update convert.py

* format

* nit

* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 17:21:33 -07:00
madroid
b0bcd86a40
Support for OpenAI’s fine-tuning dataset format (#548)
* LoRA: move load_dataset to tuner/datasets.py file

* LoRA: support OpenAI chat format datasets

see https://platform.openai.com/docs/guides/fine-tuning/example-format

* LoRA: support OpenAI completion format datasets

* LoRA: formatting dataset timing to reduce memory footprint

* Refactor dataset item access in PromptCompletionDataset

* Update mlx_lm/LORA.md

* Update mlx_lm/LORA.md

* check for unsupported data formats

* add tests, fine-tune doc

* add tests, fine-tune doc

* add jinja2 for chat template

* nits in readme

* nits in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 16:45:46 -07:00
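One record of the OpenAI chat fine-tuning format referenced above, written as JSONL from Python (the conversation contents are a made-up example):

```python
import json

record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is MLX?"},
        {"role": "assistant", "content": "An array framework for Apple silicon."},
    ]
}
# Each line of train.jsonl is one such conversation.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```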
Abdul Fatir
e05e502c34
Fix scaling when embeddings are tied (#591) 2024-03-18 13:41:07 -07:00
Awni Hannun
e4b19bb9e1
Make attention faster for some models (#574)
* make attention faster for a couple models

* remove unused generation flags

* add comment on lora

* include text files as well
2024-03-14 21:35:54 -07:00
Angelos Katharopoulos
3f3741d229
Fix requirements and image2image strength/steps mismatch (#585) 2024-03-14 12:22:54 -07:00
sweetcard
e2205beb66
Update server.py to add --trust-remote-code to server (#578)
* Update server.py

Add --trust-remote-code to server

* format code by running pre-commit

---------

Co-authored-by: flymonk <zhou.feng@gsafer.com>
2024-03-14 07:05:19 -07:00
Sugato Ray
2cd793dd69
feat: add update_config functionality (#531)
* feat: add `update_config` functionality

- sorts the config for better readability
- updates "_name_or_path" key in config with upload_repo
- sets indentation of 4 spaces
- allows adding other key-value pairs via kwargs
- reduces code duplication
- standardizes config-update across mlx-lm

* feat: standardize updating config

Impacts:
- fuse.py
- merge.py

* update formatting

* remove commented out code

* update func: update_config to save_config

- drop kwargs
- rename func as save_config
- incorporate review suggestions

* update func: save_config

- ensure only config-saving functionality
- function does not return config as a dict anymore
- added review suggestions

* fixed formatting

* update formatting instruction in contribution guide

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-14 06:36:05 -07:00
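Per the bullets above, the final function only saves: it sorts the config for readability and writes it with 4-space indentation. A sketch consistent with that description (not the verbatim mlx-lm code):

```python
import json
from pathlib import Path

def save_config(config: dict, config_path: Path) -> None:
    # Sort keys for readability and write with 4-space indentation;
    # returns nothing, per the "only config-saving functionality" bullet.
    config = dict(sorted(config.items()))
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
```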
madroid
485180ae91
LoRA: some minor optimizations (#573)
* init training_args in training scope

* Add trainable parameters percentage
2024-03-13 20:26:30 -07:00
madroid
d4e1de1d5b
add peak_memory info to training callback (#572) 2024-03-13 20:17:10 -07:00
Race
376bb9cc44
bert encoder inherits from nn.Module now (#571) 2024-03-13 10:24:21 -07:00
Awni Hannun
14fe868825
version (#570) 2024-03-13 10:09:36 -07:00
Prince Canuma
76c3244cc5
Add support for Cohere's Command-R (#565)
* initial commit for command-R

* update mlp, layernorm, lm_head and model args

* add custom layernorm

* add default to tie_word_embeddings

* add layernorm weight type and refactor

* update layernorm (bias conditional) in model/layers

* fix layer norm; use traditional rope

* add test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-13 07:03:36 -07:00
Anchen
3535408c99
chore(mlx-lm): fix tie_word_embeddings for qwen2 (#566)
* chore: fix tie_word_embeddings for qwen2

* chore: default tie_word_embeddings to True
2024-03-12 21:34:32 -07:00
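With tied word embeddings the output head reuses the input embedding matrix rather than a separate lm_head; in MLX that is typically expressed with Embedding.as_linear. A small illustration:

```python
import mlx.core as mx
import mlx.nn as nn

embed = nn.Embedding(32000, 512)
hidden = mx.random.normal((1, 4, 512))

# as_linear applies the embedding weights as a (dim -> vocab) projection,
# so no separate lm_head weights are needed when embeddings are tied.
logits = embed.as_linear(hidden)
print(logits.shape)  # (1, 4, 32000)
```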
Awni Hannun
39084e81c2
Some improvements to LoRA (#528)
* set cache_limit

* remove set cache_limit

* cleanup

* add gradient checkpointing

* fix sort

* monkey patch call for checkpoint

* fix example config
2024-03-12 20:02:03 -07:00
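The "monkey patch call for checkpoint" bullet refers to wrapping a layer class's __call__ in mx.checkpoint, so activations are recomputed during the backward pass instead of stored. A sketch of that pattern, close in spirit but not guaranteed verbatim:

```python
import mlx.core as mx

def grad_checkpoint(layer):
    # Patch the layer *class* so every instance recomputes activations
    # during backprop, trading compute for peak memory.
    fn = type(layer).__call__

    def checkpointed_fn(model, *args, **kwargs):
        def inner_fn(params, *args, **kwargs):
            model.update(params)
            return fn(model, *args, **kwargs)
        return mx.checkpoint(inner_fn)(model.trainable_parameters(), *args, **kwargs)

    type(layer).__call__ = checkpointed_fn
```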
Chime Ogbuji
e56d9015ef
LoRA on all linear transformer block layers (#546)
* Add --lora-all-linear option to apply LoRA to all linear transformer block layers

* Moved to YAML config and added specification of rank & alpha

* nits in config, more tests

* nit

* run tests for prs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-12 07:37:40 -07:00
devonthomas35
fe5edee360
Fix image2image for SDXL (#563)
---------

Co-authored-by: Angelos Katharopoulos <katharas@gmail.com>
2024-03-11 12:18:47 -07:00
zweifisch
d0fa6cfcae
feat: stable-diffusion t2i add --seed (#558) 2024-03-10 06:12:54 -07:00
Awni Hannun
ad3cf5ed98
dropout 0 as default (#549) 2024-03-08 13:07:10 -08:00
Angelos Katharopoulos
3a9e6c3f70
Stable diffusion XL (#516) 2024-03-08 10:24:19 -08:00
Chime Ogbuji
8c2cf665ed
YAML configuration for mlx_lm.lora (#503)
* Convert mlx_lm.lora to use YAML configuration

* pre-commit run fixes

* Fix loading of config file

* Remove invalid YAML from doc

* Update command-line options and YAML parameter overriding, per feedback in #503

* Minor wording change

* Positional argument

* Moved config to a (-c/--config) flag

* Removed CLI option defaults (since CLI options take precedence and their defaults are in CONFIG_DEFAULTS)

* pre-commit format updates

* Fix handling of CLI option defaults

* Prevent None values of unspecified CLI options from overwriting values from CONFIG_DEFAULTS

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-08 07:57:52 -08:00
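The precedence rules worked out in this thread are: built-in defaults first, then the YAML file, then explicitly passed CLI flags, with unspecified (None) flags never clobbering config values. A condensed sketch, with the option set trimmed to two for illustration:

```python
import argparse
import yaml

CONFIG_DEFAULTS = {"batch_size": 4, "learning_rate": 1e-5}  # trimmed example

parser = argparse.ArgumentParser()
parser.add_argument("-c", "--config", type=str)
# CLI defaults stay None so only explicitly passed flags take precedence.
parser.add_argument("--batch-size", dest="batch_size", type=int, default=None)
parser.add_argument("--learning-rate", dest="learning_rate", type=float, default=None)
args = parser.parse_args()

config = dict(CONFIG_DEFAULTS)
if args.config:
    with open(args.config) as f:
        config.update(yaml.safe_load(f))
for k, v in vars(args).items():
    if k != "config" and v is not None:  # None means not given on the CLI
        config[k] = v
```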
Awni Hannun
8b05bb6d18
[mlx-lm] Use sdpa in llama / mistral model (#515)
* use sdpa

* update a few more models

* version

* fix stablelm type
2024-03-07 17:41:23 -08:00
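The switch replaces the hand-written softmax(Q K^T / sqrt(d)) V pattern with a single fused kernel. Usage, assuming the current mx.fast signature:

```python
import mlx.core as mx

B, H, L, D = 1, 8, 16, 64
q = mx.random.normal((B, H, L, D))
k = mx.random.normal((B, H, L, D))
v = mx.random.normal((B, H, L, D))

# One fused call instead of matmul + softmax + matmul.
out = mx.fast.scaled_dot_product_attention(q, k, v, scale=D**-0.5, mask=None)
```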
Awni Hannun
7cdd1b69ac
Enable unit testing in Circle and start some MLX LM tests (#545)
* add a few tests for mlx lm

* add a few tests for mlx lm

* add a few tests for mlx lm

* more tests / cleanup
2024-03-07 09:31:57 -08:00
amcox886
ef32379bc6
Update README.md (#530)
* Update README.md

The documented default behaviour for where convert.py saves files was wrong. It was also inconsistent with how the later script test.py tries to use them (and the naming convention it assumes).

I don't actually see a quick way to automate this since, as written, the target directory is set directly by an argument. It would probably be best to rewrite it so that the argument is used as an override, with the default behaviour constructing a file path from the set and unset arguments. This is also complicated because "defaults" are assumed in the naming convention as well.

* Update README.md

Created an actual script that'll run and do this correctly.

* Update README.md

Typo fix: mlx-models should have been mlx_models. This conforms with the naming used later in the mlx-examples/whisper code.

* Update README.md

Removed the larger script and changed it back to the simpler script as before.

* nits in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-07 06:23:43 -08:00
Anchen
8a178f8716
chore: enable tie_word_embeddings config for qwen2 (#544) 2024-03-07 06:11:35 -08:00
Y4hL
b8e5eda4fd
Refactoring of mlx_lm example (#501)
* Use NamedTuple from typing for type hints

* Add type hints

* Simplify expression

* Type hint fix

* Improved do_POST logic

Use a map of endpoints to methods to reduce redundancy in code

* Fix format

* Reduce redundancy

Call method dynamically instead of writing out all arguments twice

* Send response instead of returning

* Fix typo

* Revert change

* Make adapter_file as Optional

* Mark formatter as optional

* format

* Create message generator

Store response data that stays static for the duration of the response inside the object:

system_fingerprint
request_id
object_type
requested_model

Created a message generator that dynamically builds messages from the metadata stored inside the object and the data from the model pipeline

* Remove leftover

* Update parameters to reflect new object structure

No longer pass all arguments between functions; use the values stored inside the object

* Parse body before calling request specific methods

* Call super init

* Update server.py

* Fixed outdated documentation parameter name

* Add documentation

* Fix sending headers twice

During testing I found that when using the streaming option, headers were always sent twice. This should fix that

* Simplify streaming code by using guard clauses

Don't wrap wfile writes in try blocks, the server class has its own try block to prevent crashing

* Bug fix

* Use Content-Length header

Let the completion type specific methods finish sending the headers. This allows us to send the Content-Length header as the model returns a completion.

* Update utils.py

* Add top_p documentation

* Type hint model and tokenizer as required

* Use static system fingerprint

System fingerprint now stays the same across requests

* Make type hint more specific

* Bug Fix

Supplying fewer than two models to merge would raise a ValueError and call len on the unbound "models"; it should be "model_paths" instead.

Mark upload_repo as optional

* Move more of the shared code into do_POST

Processing stop_id_sequences is done no matter the request endpoint or type, move it into the shared section. handle_ methods now just return the prompt in mx.array form.

* Store stop_id_sequences as lists instead of np

During testing I found that letting the tokenizer return values as Python lists and converting them to mlx arrays was around 20% faster than having the tokenizer convert them to np, and from np to mlx. This also means numpy no longer needs to be imported.

* Update stop_id_sequences docs

* Turn if check to non-inclusive

Only continue if buffer is smaller

* Documentation fix

* Clearer method names

Instead of handle_stream and generate_completion, we should name it handle_completion.

Instead of handle_completions and handle_chat_completions, we should name it handle_text_completions; since both are completions, calling it text completions is more descriptive

* Make comment clearer

* fix format

* format
2024-03-06 06:24:31 -08:00
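The stop_id_sequences change above keeps stop sequences as plain Python lists, so checking them against the generated tail needs no numpy at all. A minimal sketch of such a check:

```python
def stopping_criteria(tokens: list[int], stop_id_sequences: list[list[int]]) -> bool:
    # True once the generated tokens end with any stop sequence.
    return any(
        len(tokens) >= len(stop) and tokens[-len(stop):] == stop
        for stop in stop_id_sequences
    )

print(stopping_criteria([5, 1, 2, 3], [[2, 3]]))  # True
```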
Madroid Ma
710c552731
add huggingface repo url print (#534) 2024-03-05 21:51:31 -08:00
Muhtasham Oblokulov
5de7c2ac33
Add tips on porting LLMs from HuggingFace (#523)
* Add tips on porting LLMs from HuggingFace

* Add CONTRIBUTING.md to mlx-examples-llms

* Refactor imports and update comment in starcoder2.py

* Update llms/mlx_lm/models/starcoder2.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-05 17:43:15 -08:00
Prince Canuma
3fdf85e79d
Starcoder2: Update config and change GQA to use repeat (#520)
* update config

* change gqa to use repeat instead of concatenate

* contribution
2024-03-03 06:12:03 -08:00
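Grouped-query attention shares each key/value head across several query heads; expanding the kv heads with mx.repeat (instead of concatenating slices) looks like this sketch:

```python
import mlx.core as mx

B, L, D = 1, 16, 64
n_heads, n_kv_heads = 8, 2
queries = mx.random.normal((B, n_heads, L, D))
keys = mx.random.normal((B, n_kv_heads, L, D))

# Each kv head serves n_heads // n_kv_heads query heads.
keys = mx.repeat(keys, n_heads // n_kv_heads, axis=1)
print(keys.shape == queries.shape)  # True
```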
Anchen
1e3daea3bb
chore(mlx-lm): add missing model_type for starcoder2 (#522) 2024-03-03 06:07:45 -08:00
Anchen
3655bfc3bd
chore(mlx-lm): fix broken server.py script (#519) 2024-03-03 06:04:54 -08:00
Muhtasham Oblokulov
81e2a80026
Add Starcoder 2 (#502)
* Add Starcoder2 model and update utils.py

* Refactor model arguments and modules in starcoder2.py

* Refactor FeedForward class to MLP in starcoder2.py

* Fix typo

* pre-commit

* Refactor starcoder2.py: Update model arguments and modules

* Fix LM head and MLP layers

* Rename input layer norm

* Update bias in linear layers

* Refactor token embeddings in Starcoder2Model

* Rename to standard HF attention layer name

* Add LayerNorm

* Add transposed token embeddings (like in Gemma)

* Refactor MLP and TransformerBlock classes

* Add tie_word_embeddings option to ModelArgs and update Model implementation

* Add conditional check for tying word embeddings in Starcoder2Model

* Fix bias in lm_head linear layer

* Remove unused LayerNorm in stablelm

* Update transformers dependency to use GitHub repository

* fix lm head bug, revert transformer req

* Update RoPE initialization in Attention class

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-02 19:39:23 -08:00