mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-12-16 02:08:55 +08:00

Author	SHA1	Message	Date
Shubbair	1e386b5c20	Updating GAN Code...	2024-07-30 02:56:13 +03:00
Shubbair	7438b54ecd	Updating GAN Code...	2024-07-30 02:44:41 +03:00
Shubbair	7fea34d65e	Updating GAN Code...	2024-07-30 02:37:09 +03:00
Shubbair	f505fe6e55	Updating GAN Code...	2024-07-30 02:17:12 +03:00
Shubbair	4e80759b39	Updating GAN Code...	2024-07-30 02:06:52 +03:00
Shubbair	306e53c402	Updating GAN Code...	2024-07-29 19:44:16 +03:00
Shubbair	bacaa9ec0e	Updating GAN Code...	2024-07-29 01:30:08 +03:00
Shubbair	8d27be1442	Updating GAN Code...	2024-07-29 01:24:50 +03:00
Shubbair	4de0583b49	Updating GAN Code...	2024-07-28 19:18:35 +03:00
Shubbair	a07ef6d03b	Updating GAN Code...	2024-07-28 18:11:39 +03:00
Shubbair	c0c8293842	Updating GAN Code...	2024-07-28 17:56:26 +03:00
Shubbair	d17d293df9	Updating GAN Code...	2024-07-28 17:35:36 +03:00
Shubbair	3e63cd93fe	Updating GAN Code...	2024-07-28 17:26:24 +03:00
Shubbair	3716501e8d	Updating GAN Code...	2024-07-28 17:22:40 +03:00
Shubbair	88a20b7276	Updating GAN Code...	2024-07-28 01:10:19 +03:00
Shubbair	8b1713737a	Updating GAN Code...	2024-07-27 01:20:00 +03:00
Shubbair	f8b7094fb8	Updating GAN Code...	2024-07-27 01:19:50 +03:00
Shubbair	147cb3d2bc	Updating GAN Code...	2024-07-27 01:09:51 +03:00
Shubbair	a05608c34d	Updating GAN Code...	2024-07-27 00:22:29 +03:00
Shubbair	f176cce74d	Updating GAN Code...	2024-07-27 00:19:08 +03:00
madroid	85dc76f6e0	Server: support stream_options (#913 ) * Server: support stream_options see https://x.com/OpenAIDevs/status/1787573348496773423 * Server: support stream_options * Server: check None type	2024-07-26 08:58:52 -07:00
Shubbair	959c623908	Updating GAN Code...	2024-07-26 16:38:55 +03:00
Shubbair	591074bea8	Updating GAN Code...	2024-07-26 16:36:29 +03:00
Shubbair	d426586b03	Updating GAN Code...	2024-07-26 16:07:40 +03:00
otriscon	46da74fea2	Unify attention mask in LLMs (#911 ) * Unify attention mask creation in LLMs. Currently, each model implementation in `mlx-examples/llms/models` has ad-hoc code to create a mask for the attention mechanism. This usually takes the form: ``` mask = None if h.shape[1] > 1: mask = nn.MultiHeadAttention.create_additive_causal_mask(h.shape[1]) mask = mask.astype(h.dtype) ``` This correctly creates a mask only if the input consists of more than one token. But this code assumes the multi-token input is at the beginning of inference. If, for example, we are evaluating multiple tokens because of speculative decoding or prompt cache reuse, this mask will not have the correct shape and will cause the raising of an exception in the attention computation. Some of the models correctly implement the mask creation with code like this: ``` mask = None if h.shape[1] > 1: mask = create_additive_causal_mask( h.shape[1], cache[0].offset if cache is not None else 0 ) mask = mask.astype(h.dtype) ``` This commit unifies the attention mask creation for all models with a new function `create_attention_mask`, reducing code duplication and helping all models support inference performance enhancements like those mentioned above. * Allow batches in LLM key-value cache The current implementation of the LLM key-value cache assumes that the input batch is of size 1. Input batching (evaluating multiple alternative inputs at the same time) can be a valuable tool for speculative sampling and other techniques. This change removes the hard-coded batch size from the code that resizes the key-value cache. * Simplify causal mask creation Use the same codepath regardless of whether there's an offset or not. Addresses [this comment](https://github.com/ml-explore/mlx-examples/pull/911#discussion_r1691459717). * Use old-style type annotation to avoid linter error	2024-07-25 16:45:22 -07:00
Anchen	7a3ab1620a	support load model by custom get_model_classes (#899 ) * feature(mlx_lm): support load model by custom get classes * rename the param	2024-07-25 11:01:17 -07:00
Shubbair	5e7ce1048c	Add GAN model 25/7	2024-07-25 21:00:41 +03:00
Alex Cheema	cd8efc7fbc	Add support for Llama-3.1 (#907 ) * add dynamicNTK scaling rope * remove unused var * fix rope base * llama3.1 fixes * TODO for rope eval * vectorise llama3 base freq calculation * removed the arbitrary 2.0 rope_scale default case * fix slow llama3.1 generation by evaluating stateless part of DynamicNTKScalingRoPE in init * nits + format * use mx.pi * fix tests and add test for 3.1 --------- Co-authored-by: Prince Canuma <prince.gdt@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-07-23 13:21:32 -07:00
M. Ali Bayram	47060a8130	refactor: add force_download parameter to get_model_path function (#800 )	2024-07-23 13:10:20 -07:00
Prince Canuma	3f337e0f0a	Add Mistral NeMo (fix) (#895 ) * fix head_dim * Update llms/mlx_lm/models/llama.py * fix kv error * formatting * Delete test.py --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-07-22 06:09:24 -07:00
Prince Canuma	3d365b612a	Add support for InternLM-2.5 (#871 ) * fix internlm-2 * formatting * add dynamic ntk rope * formatting * move dynamic scaling rope to internlm2.py * add default max_position_embeddings	2024-07-17 16:38:22 -07:00
Anchen	561dcf5643	Add support for deepseek coder v2 lite (#882 ) * feat: add support for deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct * fix softmax + some cleanup * more nits * fix rope * fix original_max_position_embeddings in rope * fix original_max_position_embeddings in rope config * add group greedy --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-07-17 07:23:28 -07:00
Awni Hannun	f0c6c6e226	keep the server in a valid state (#889 )	2024-07-15 18:35:36 -07:00
JosefAlbers	bfc1f2763b	longrope (#886 )	2024-07-12 07:19:11 -07:00
Chime Ogbuji	8bf397e450	Pass use_dora parameter to linear_to_lora_layers (#885 )	2024-07-11 14:34:34 -07:00
nicolov	fbe3247772	Add GPT-neox model (#863 )	2024-07-11 06:13:17 -07:00
James A Capozzoli	9717307ff0	Validation with full data set, results in NaN validation score (#879 ) * CLI arguments may set num_batches to -1 The CLI arguments allow you to validate with the entire dataset by passing a value of negative one, but this quickly causes a division by zero, so `NaN` appears as the validation score! * Must properly assemble the mini batches when validating with entire dataset. Tested locally, a validation of a novel took about an hour, with a loss of 0.928. Thanks @awni for the correction! * Set up the pre-commit hooks and run them so that black may format lora.py.	2024-07-10 08:36:11 -07:00
Alex Wozniakowski	63800c8feb	Example of response generation with optional arguments (#853 ) * Generate response with optional arguments * Reference response generation example * Include transformers and sentencepiece * Update example to run Mistral-7B-Instruct-v0.3 * Link to generation example * Style changes from pre-commit	2024-07-09 06:49:59 -07:00
Awni Hannun	68e88d42fb	Fix server for `openai` package (#877 ) * fix * fixes for 9b	2024-07-08 12:34:31 -07:00
Awni Hannun	20e221f7f7	Add recurrent gemma (#856 ) * add recurrent gemma * fix window cache	2024-07-07 12:10:04 -07:00
n8programs	1e05aef344	Add logit soft capping to gemma, and fix precision issues (#857 ) * Add logit soft capping to gemma, and fix precision issues Gemma was babbling nonsense - so I figured out it was due to not having logit softcapping and precision issues causing NaNs (so I implemented the softcapping and added more float32 inference). gemma-27b-it-4bit now works flawlessly (or near-flawlessly, no sliding-window attention). * get rid of comments * get rid of last comments (sry lol) * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-07-02 07:52:39 -07:00
Angelos Katharopoulos	f212b770d8	Server loads the model on demand from the request (#851 )	2024-06-27 11:37:57 -07:00
Awni Hannun	538339b599	gemma2 (#855 )	2024-06-27 10:06:28 -07:00
Awni Hannun	9f10728145	fix yi (#852 )	2024-06-27 06:38:19 -07:00
Volodymyr Kyrylov	7979b84a9e	transformer_lm: add --dataset enwik8 (#838 ) * transformer_lm: add --dataset enwik8 * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-26 11:59:01 -07:00
Chime Ogbuji	df6bc09d74	Configuration-based use of HF hub-hosted datasets for training (#701 ) * Add hf_dataset configuration for using HF hub-hosted datasets for (Q)LoRA training * Pre-commit formatting * Fix YAML config example * Print DS info * Include name * Add hf_dataset parameter default * Remove TextHFDataset and CompletionsHFDataset and use Dataset and CompletionsDataset instead, adding a text_key constructor argument to the former (and changing it to work with a provided data structure instead of just from a JSON file), and prompt_key and completion_key arguments to the latter with defaults for backwards compatibility. * nits * update docs --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-26 10:20:50 -07:00
Chime Ogbuji	1d701a1831	Logprobs info to completion API (#806 ) * Initial implementation * Fix handling of return_step_logits in return * Fixed OpenAI parameter expectations and logprob structure and datatypes * pre-commit black formatting * Remove unused parameter * fix log probs * fix colorize * nits in server * nits in server * Fix top_logprobs structure (a dict) and include tokens in logprobs response * nits * fix types --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-23 10:35:13 -07:00
Yi Wang	a7598e9456	Fix mypy errors with models/{qwen2,qwen2_moe,starcoder2}.py (#835 ) * Fix starcoder.py * Fix qwen2 * Remove unnecessary assert not None	2024-06-14 09:44:50 -07:00
Awni Hannun	d8b073e3a7	Add eos token to lora fine-tunes (#818 ) * add eos token to lora fine-tunes * Comment	2024-06-12 07:44:21 -07:00
Nada Amin	3cc58e17fb	Tweaks to run dspy-produced calls to the server, with gemma template. (#810 ) * Tweaks to run dspy-produced calls to the server, with gemma template. following comment https://github.com/stanfordnlp/dspy/issues/385#issuecomment-1998939936 can try it out with: ```sh python -m server --model mlx-community/gemma-1.1-7b-it-4bit --port 1143 ``` modulo patching the relative imports in server.py ``` -from .tokenizer_utils import TokenizerWrapper -from .utils import generate_step, load +from mlx_lm.tokenizer_utils import TokenizerWrapper +from mlx_lm.utils import generate_step, load ``` and then, on the dspy side: ```python import dspy lm = dspy.OpenAI(model_type="chat", api_base="http://localhost:11434/v1/", api_key="not_needed", max_tokens=250) lm("hello") ``` * simpler way to validate float or int * remove logic that works around incompatible templates, too gemma specific * tweak messages for common denominator * use generate.py workaround for DBRX * put behind flag * oops * Solution to chat template issue: pass in a custom template! The template should likely adhere to the OpenAI chat model. Here is such a template for Gemma. --chat-template "{{ bos_token }}{% set extra_system = '' %}{% for message in messages %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{% if role == 'system' %}{% set extra_system = extra_system + message['content'] %}{% else %}{% if role == 'user' and extra_system %}{% set message_system = 'System: ' + extra_system %}{% else %}{% set message_system = '' %}{% endif %}{{ '<start_of_turn>' + role + '\n' + message_system + message['content'] | trim + '<end_of_turn>\n' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}" * remove convoluted solution * Tweak for when None is provided explicitly, and must be set to [] too. For example, the outlines library provides None explicitly. * style --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-12 07:17:06 -07:00


707 Commits
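
Note on commit 46da74fea2 (#911): the message describes building a causal attention mask that accounts for tokens already held in the key-value cache. The following is a minimal sketch of that idea, not the repository's code. It uses NumPy instead of MLX so it runs anywhere, and the function name `additive_causal_mask` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def additive_causal_mask(n, offset=0, dtype=np.float32):
    """Return an (n, offset + n) additive mask for n new tokens.

    `offset` is assumed to be the number of tokens already cached.
    Allowed positions hold 0; blocked positions hold a large negative
    value so they vanish after softmax.
    """
    # Keys cover the cached tokens plus the new ones: 0 .. offset+n-1.
    key_pos = np.arange(offset + n)
    # Queries are only the n new tokens: offset .. offset+n-1.
    query_pos = np.arange(offset, offset + n)
    # A query must not attend to any key located after it.
    blocked = query_pos[:, None] < key_pos[None, :]
    return (blocked * -1e9).astype(dtype)


# Two new tokens evaluated on top of three cached tokens.
mask = additive_causal_mask(2, offset=3)
print(mask.shape)
```

With `offset=0` this reduces to the usual square causal mask, which is why a single code path can serve both the initial prompt and later multi-token steps such as speculative decoding or prompt cache reuse.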