Commit Graph

  • 9b83004631 Faster sampling with mx.compile (#937) Awni Hannun 2024-08-15 11:29:09 -07:00
  • 95840f32e2 Fix whisper conversion for safetensors models (#935) Awni Hannun 2024-08-14 10:22:04 -07:00
  • 33905447f9 Whisper updates to allow HF models (#923) Awni Hannun 2024-08-09 11:11:58 -07:00
  • df744c98e6 Predict stop sequence matches during streaming (#541) tidely 2024-08-07 01:24:15 +03:00
  • 8fa12b0058 Adapters loading (#902) Khush Gupta 2024-08-01 16:18:18 -07:00
  • 85dc76f6e0 Server: support stream_options (#913) madroid 2024-07-26 23:58:52 +08:00
  • 46da74fea2 Unify attention mask in LLMs (#911) otriscon 2024-07-25 19:45:22 -04:00
  • 7a3ab1620a support load model by custom get_model_classes (#899) Anchen 2024-07-26 04:01:17 +10:00
  • cd8efc7fbc Add support for Llama-3.1 (#907) Alex Cheema 2024-07-23 13:21:32 -07:00
  • 47060a8130 refactor: add force_download parameter to get_model_path function (#800) M. Ali Bayram 2024-07-23 23:10:20 +03:00
  • 3f337e0f0a Add Mistral NeMo (fix) (#895) Prince Canuma 2024-07-22 15:09:24 +02:00
  • 3d365b612a Add support for InternLM-2.5 (#871) Prince Canuma 2024-07-18 01:38:22 +02:00
  • 561dcf5643 Add support for deepseek coder v2 lite (#882) Anchen 2024-07-18 00:23:28 +10:00
  • f0c6c6e226 keep the server in a valid state (#889) Awni Hannun 2024-07-15 18:35:36 -07:00
  • bfc1f2763b longrope (#886) JosefAlbers 2024-07-12 23:19:11 +09:00
  • 8bf397e450 Pass use_dora parameter to linear_to_lora_layers (#885) Chime Ogbuji 2024-07-11 17:34:34 -04:00
  • fbe3247772 Add GPT-neox model (#863) nicolov 2024-07-11 15:13:17 +02:00
  • 9717307ff0 Validation with full data set, results in NaN validation score (#879) James A Capozzoli 2024-07-10 11:36:11 -04:00
  • 63800c8feb Example of response generation with optional arguments (#853) Alex Wozniakowski 2024-07-09 06:49:59 -07:00
  • 68e88d42fb Fix server for openai package (#877) Awni Hannun 2024-07-08 12:34:31 -07:00
  • 20e221f7f7 Add recurrent gemma (#856) Awni Hannun 2024-07-07 12:10:04 -07:00
  • 1e05aef344 Add logit soft capping to gemma, and fix precision issues (#857) n8programs 2024-07-02 10:52:39 -04:00
  • f212b770d8 Server loads the model on demand from the request (#851) Angelos Katharopoulos 2024-06-27 11:37:57 -07:00
  • 538339b599 gemma2 (#855) Awni Hannun 2024-06-27 10:06:28 -07:00
  • 9f10728145 fix yi (#852) Awni Hannun 2024-06-27 06:38:19 -07:00
  • 7979b84a9e transformer_lm: add --dataset enwik8 (#838) Volodymyr Kyrylov 2024-06-26 20:59:01 +02:00
  • df6bc09d74 Configuration-based use of HF hub-hosted datasets for training (#701) Chime Ogbuji 2024-06-26 13:20:50 -04:00
  • 1d701a1831 Logprobs info to completion API (#806) Chime Ogbuji 2024-06-23 13:35:13 -04:00
  • a7598e9456 Fix mypy errors with models/{qwen2,qwen2_moe,starcoder2}.py (#835) Yi Wang 2024-06-14 09:44:50 -07:00
  • 97939cc86e nits openlm Awni Hannun 2024-06-13 07:47:56 -07:00
  • 7c6ced183d openlm Awni Hannun 2024-06-13 07:47:16 -07:00
  • d8b073e3a7 Add eos token to lora fine-tunes (#818) Awni Hannun 2024-06-12 07:44:21 -07:00
  • 3cc58e17fb Tweaks to run dspy-produced calls to the server, with gemma template. (#810) Nada Amin 2024-06-12 10:17:06 -04:00
  • 6da07fb1b0 make models/phi3.py and models/phi3small.py compatible with mypy (#833) Yi Wang 2024-06-12 06:53:55 -07:00
  • fda41545a6 Su-RoPE(Rotary Position Embedding) for Phi-3 (#813) JosefAlbers 2024-06-11 22:20:04 +09:00
  • a54dfd698e Correct the type annotation of cache in llama.py (#828) Yi Wang 2024-06-10 15:18:34 -07:00
  • bb8227f181 Correct type annotation of llama.ModelArgs.num_key_value_heads (#827) Yi Wang 2024-06-10 14:47:31 -07:00
  • c5da302fc4 gpu featurization (#824) Awni Hannun 2024-06-07 08:59:44 -07:00
  • 4872727f14 Fixing "NameError: name 'resume_adapter_file' is not defined" (#817) Robin Glauser 2024-06-05 19:07:31 +02:00
  • 43d6deb3c1 mlx_lm: Add Streaming Capability to Generate Function (#807) Michał Kurc 2024-06-03 18:04:39 +02:00
  • 8353bbbf93 Segment Anything Model (#552) Shiyu 2024-06-03 07:45:51 +08:00
  • 89b0b75250 GPT2 Support (#798) Derek Lewis 2024-06-02 16:33:20 -07:00
  • c457a3f88b LoRA: Extract small function (#614) madroid 2024-06-02 21:38:42 +08:00
  • 81318ad4a8 Port of phi3small (#794) Awni Hannun 2024-05-31 12:54:14 -07:00
  • 09aaeac72c fix moe conversion (#802) Awni Hannun 2024-05-31 12:36:05 -07:00
  • f49c5f2829 fixed the requirements (#803) Behnam Moh 2024-05-29 09:14:19 -04:00
  • aac98ca6f4 support internlm2 (#797) Chen Xin 2024-05-27 21:22:21 +08:00
  • ca7ce60c91 Rename block sparse to gather (#793) Awni Hannun 2024-05-23 19:47:35 -07:00
  • 69700d8431 Add support for Phi-3 Medium (#790) Prince Canuma 2024-05-23 01:47:06 +02:00
  • b044ce2acf Add support for ibm granite (#758) Prince Canuma 2024-05-22 05:16:31 +02:00
  • 9fc6efbd90 version bump + some fixes (#792) Awni Hannun 2024-05-21 20:09:35 -07:00
  • 9f671228cd Block sparse MM MoEs (#782) Angelos Katharopoulos 2024-05-21 15:58:08 -07:00
  • 199df9e110 fix: Added dedicated error handling to load and get_model_path (#775) AtakanTekparmak 2024-05-20 15:39:05 +02:00
  • e92de216fd rid warning (#789) Awni Hannun 2024-05-20 06:05:33 -07:00
  • 42458914c8 support dora finetune in mlx-examples/llms/mlx_lm (#779) alexC-nonsense4k 2024-05-16 23:21:26 +08:00
  • 69181e0058 Support non incremental kv cache growth (#766) Awni Hannun 2024-05-15 12:56:24 -07:00
  • 1a86d985d9 Support --add_eos_token argument within Lora training (#760) Jinwu Zhan 2024-05-14 08:17:42 +08:00
  • 10853b57d9 Add model_config parameter to load() and load_model() (#770) JosefAlbers 2024-05-11 02:13:34 +09:00
  • 6f0a69e682 fix lora for openelm (#773) Awni Hannun 2024-05-10 09:51:41 -07:00
  • fad9598372 Fix llama cache check (#763) Awni Hannun 2024-05-08 08:35:54 -07:00
  • ee60e2a9d5 Kv cache (#643) Awni Hannun 2024-05-08 08:18:13 -07:00
  • bfbc0e434a Add optional EOS token for llava example (#753) Albert Avetisian 2024-05-08 09:04:36 -04:00
  • c0019c4908 Pad mask with zeros for non-square attention matrices (#715) Kevin Wang 2024-05-04 19:32:25 -04:00
  • f30413b63c chore(mlx-lm): fix the number of validation batches configuration. (#752) Anchen 2024-05-04 23:52:42 +10:00
  • 2bf11c4633 Use stable url for MNIST (#749) Awni Hannun 2024-05-03 17:13:05 -07:00
  • d1c35fa684 Add MLX Cache Limit setting for mlx_lm.generate and mlx_lm.server CLI (#744) Konstantin Kerekovski 2024-05-03 15:42:48 -04:00
  • b468091f7f Add model management functionality for local caches (#736) Ivan Fioravanti 2024-05-03 21:20:13 +02:00
  • 92430df0a0 Fix lora for qwen moe (#743) Awni Hannun 2024-05-02 21:55:09 -07:00
  • 5079af62db Update model card describe (#654) madroid 2024-05-03 12:22:04 +08:00
  • 6775d6cb3f Whisper: Add pip distribution configuration to support pip installations. (#739) madroid 2024-05-02 00:00:02 +08:00
  • 4bf2eb17f2 Validate server params & fix logit bias bug (#731) Karim Elmaaroufi 2024-04-30 07:27:40 -07:00
  • 7c0962f4e2 Add Supported Quantized Phi-3-mini-4k-instruct gguf Weight (#717) Jaward Sesay 2024-04-30 11:11:32 +08:00
  • 5513c4e57d Fixes Typo in Starcoder2 (#740) Thomas Lazarus 2024-04-29 15:14:45 -05:00
  • 510d2bde49 Force multi_commits when uploading to HF (#729) Javier de la Rosa 2024-04-29 04:07:17 +02:00
  • 699de35b03 Update lora_config.yaml (#735) 锦此 2024-04-29 01:24:34 +08:00
  • c012eb173f Add support for OpenELM (#719) Prince Canuma 2024-04-26 01:49:28 +02:00
  • 2c1c9e9024 MiniCPM implementation (#685) Gökdeniz Gülmez 2024-04-26 00:29:28 +02:00
  • 685012c2ad Couple fixes for LoRA (#711) Awni Hannun 2024-04-25 14:16:13 -07:00
  • 109ee2f2f8 Use CORS headers for streaming for MLX Server (#716) Kristian Muñiz 2024-04-25 10:26:04 -04:00
  • 8a265f0d54 Fix incorrect type annotation (#720) Kevin Wang 2024-04-24 18:52:43 -04:00
  • abcd891851 Add support for phi-3 (#712) Prince Canuma 2024-04-23 18:20:00 +02:00
  • ecbc6ff1e3 one more quant fix (#708) Awni Hannun 2024-04-22 18:12:52 -07:00
  • 8d5cf5b0c8 use logging in mlx server (#705) Aaron Ng 2024-04-22 07:50:06 -07:00
  • f20e68fcc0 Load fused model with transformers (#703) AlexandrosChrtn 2024-04-21 19:04:44 +03:00
  • 749cabf299 fix: unicode decoding (#702) Anchen 2024-04-22 01:58:23 +10:00
  • 1484598de1 Add support for logit bias (#697) Karim Elmaaroufi 2024-04-21 06:53:56 -07:00
  • 6abdbe3be8 Fix quant in gguf (#698) Awni Hannun 2024-04-19 20:07:11 -07:00
  • 574ad7f6fe fix dequantization (#693) Awni Hannun 2024-04-19 10:46:59 -07:00
  • 2146bcd7ee Quantize embedding / Update quantize API (#680) Awni Hannun 2024-04-18 18:16:10 -07:00
  • f5f189e48a fix(mlx-lm): broken server.py (#690) Anchen 2024-04-19 07:26:18 +10:00
  • 35206806ac Create executables for generate, lora, server, merge, convert (#682) Phúc H. Lê Khắc 2024-04-17 00:08:49 +01:00
  • 7d7e236061 - Removed unused Python imports (#683) dmdaksh 2024-04-16 10:50:32 -04:00
  • e55a9e8cb4 Add an SPM detokenizer that doesn't trim initial space (#681) Angelos Katharopoulos 2024-04-15 14:15:25 -07:00
  • d3f8e4aee9 Fix argpartition call in Mixtral and other MOES (#676) Awni Hannun 2024-04-12 11:00:56 -07:00
  • 9c5554d8ee Use async eval (#670) Awni Hannun 2024-04-11 13:18:23 -07:00
  • 0250f6f38e feat: Update black-pre-commit-mirror to version 24.3.0 (#675) Nripesh Niketan 2024-04-11 18:28:26 +04:00
  • 9f472dc985 Update transformers for ⌘-R+ (#668) devonthomas35 2024-04-11 07:28:12 -07:00
  • 5a4cad34ef Always resume downloads (#674) da-z 2024-04-11 15:52:32 +02:00
  • eff6690952 Fix CFG for SDXL (#667) Angelos Katharopoulos 2024-04-09 06:06:41 -07:00
  • 1278994b56 Add streaming detokenizers (#651) Angelos Katharopoulos 2024-04-08 22:36:01 -07:00