Commit Graph

  • 2ba0e36683 [mlx-lm] Use top p in server (#1144) Awni Hannun 2024-12-12 11:12:21 -08:00
  • 19abf3dcaa Replace unicode errors instead of raising exception (#1146) Angelos Katharopoulos 2024-12-12 11:10:41 -08:00
  • 06af3c9b0e Add finish_reason in GenerationResponse (#1153) madroid 2024-12-13 02:37:40 +08:00
  • 77b42b7c8b fix llava (#1149) Awni Hannun 2024-12-12 10:37:26 -08:00
  • 135c5818c1 Fix max_tokens (#1148) Alex Barron 2024-12-10 11:26:04 -08:00
  • 12083c4b7e Support for multiple EOS tokens (#1141) madroid 2024-12-10 00:53:58 +08:00
  • 5687d5b99b Adds EXAONE architecture. (#1145) n8programs 2024-12-09 10:58:25 -05:00
  • 893b3f085e Change Flux default max_shift to 1.15 to match the official one (#1137) hehua2008 2024-12-09 15:29:48 +08:00
  • ed91bbc4dc Fix final message at end of flux training (#1143) Peter Sibley 2024-12-09 02:01:53 -05:00
  • 1fd6aae871 Fix flux training with batch size (#1135) hehua2008 2024-12-09 14:09:04 +08:00
  • 2211b27388 Mixed Quantizations (#1132) Alex Barron 2024-12-08 14:21:50 -08:00
  • cd8cf28c39 mlx_lm.evaluate (#1140) Alex Barron 2024-12-08 12:20:10 -08:00
  • 64ceb62674 load q4_k_m inefficiently load-gguf Alex Barron 2024-12-03 19:54:57 -08:00
  • 1727959a27 Add mentions of MLX-my-repo. (#1129) vb 2024-12-04 04:21:39 +01:00
  • 1963df8565 Allow prompt callback to generate_step (#1133) Awni Hannun 2024-12-03 16:17:14 -08:00
  • 0ca162cfb2 Fix data_iter in prepare_dataset from speechcommands example (#1113) sakares saengkaew 2024-12-03 14:56:07 +07:00
  • eb9277f574 Allow loading from diffusers ckpt (#1117) Angelos Katharopoulos 2024-12-02 13:15:50 -08:00
  • 2a9294a5f0 Fix bug in FluxSampler.timesteps method (#1131) hehua2008 2024-12-03 05:15:19 +08:00
  • 8801beb66f Add olmo2 (#1128) Awni Hannun 2024-12-02 11:42:58 -08:00
  • cefe793ae0 Accept mx.array type for prompt argument for stream_generate (#1125) Neil Mehta 2024-11-26 19:51:55 -05:00
  • cfc29c29f4 Put prompt processing in same stream (#1122) Awni Hannun 2024-11-25 09:47:00 -08:00
  • a5e173802e docs: update stream_generate return type annotation (#1121) madroid 2024-11-26 00:10:14 +08:00
  • adaab81029 Allow converting models from local directories (#1118) Remixer Dec 2024-11-25 04:41:06 +04:00
  • 0ffdb6dd20 Fix object property value in mlx_lm.server chat completions response to match OpenAI spec (#1119) Kevin Conner 2024-11-24 16:37:37 -08:00
  • 0f135396ae Generation refactor: part 2 (#1099) Awni Hannun 2024-11-23 11:47:06 -08:00
  • 004eb4cc9d Tencent HunYuan MOE model (#1100) Awni Hannun 2024-11-23 11:06:26 -08:00
  • 042280ce50 Fix format (#1115) Angelos Katharopoulos 2024-11-20 16:15:53 -08:00
  • 60c7b80350 Pass seed to sd img2img (#1114) Valentin Roussellet 2024-11-20 15:21:52 -08:00
  • bd6d910ca3 [MLX LM] Fix f-string formatting in memory warning message (#1105) Alban Lecocq 2024-11-13 15:14:03 +01:00
  • 1e07660184 FLUX: save train config (#1049) madroid 2024-11-09 09:15:19 +08:00
  • 657b4cc0aa [MLX LM] Sampler refactor + a few improvements (#1094) Awni Hannun 2024-11-07 16:15:24 -08:00
  • ed9e81dd58 Fix rotating kv cache size (#1093) Angelos Katharopoulos 2024-11-05 10:24:24 -08:00
  • 6fd1f70f73 fix spm decoder multi-byte (#1092) Awni Hannun 2024-11-05 06:06:26 -08:00
  • 4394633ce0 mlx_whisper: add support for audio input from stdin (#1012) Anthony Wu 2024-11-04 14:02:13 -08:00
  • 3b526f0aa1 Add support for falcon-mamba (#1074) ilyasch2 2024-11-05 00:23:30 +04:00
  • 82e3338987 chore(mlx-lm): add max token arg for mlx_lm.chat (#1089) Anchen 2024-11-04 22:06:34 +08:00
  • 331148d8ec Enable distributed LoRA training (#821) Angelos Katharopoulos 2024-11-02 18:02:31 -07:00
  • 29c954f4cb fix (#1082) Awni Hannun 2024-11-02 13:51:38 -07:00
  • 0f799947d0 fix (#1079) Awni Hannun 2024-11-01 16:30:32 -07:00
  • e510987870 Clear cache every now and then (#1081) Awni Hannun 2024-11-01 14:15:32 -07:00
  • 8160e0c4e5 Whisper improvements (#1080) Awni Hannun 2024-11-01 10:52:28 -07:00
  • 85ffd2c96a Quantized KV Cache (#1075) Alex Barron 2024-10-31 16:59:52 -07:00
  • 9f34fdbda4 Wire models in MLX LM (#1069) Awni Hannun 2024-10-31 08:17:14 -07:00
  • 8fe9539af7 Fix detokenizer space match for quote (#1072) Awni Hannun 2024-10-27 15:06:07 -07:00
  • ab4bf05c6e Update lora_config.yaml with new param: num_layers (#1068) hschaeufler 2024-10-26 19:34:46 +03:00
  • 67607a8e13 Start memory-efficient flux finetuning branch flux-qlora Angelos Katharopoulos 2024-10-25 09:46:47 -07:00
  • 4971462bf0 feat(clip): add linear probe evaluation script (#960) Saurav Maheshkar 2024-10-25 05:56:17 +01:00
  • 9000e280ae fix mamba models conversion (#1065) Awni Hannun 2024-10-22 15:44:08 -07:00
  • d1d480867b LoRA: update tools datasets docs (#1063) madroid 2024-10-23 03:19:11 +08:00
  • 66e7bcb886 override dtype with quant (#1062) Awni Hannun 2024-10-22 09:56:45 -07:00
  • 743763bc2e Handle empty string case in maybe_trim_space (#1055) aronson 2024-10-20 22:46:43 -05:00
  • f491d473a3 FLUX: Optimize dataset loading logic (#1038) madroid 2024-10-16 01:37:45 +08:00
  • 3d62b058a4 fix: typo on flux model preloading (#1050) Zak B. Elep 2024-10-16 00:13:01 +08:00
  • bbd2003047 FLUX: update README.md (#1036) madroid 2024-10-15 02:21:41 +08:00
  • 605c4854f1 Prompt caching in mlx_lm.server (#1026) Awni Hannun 2024-10-14 10:57:22 -07:00
  • 8dca1a2f60 Tokenizer updates + tests (#1024) Awni Hannun 2024-10-14 10:48:46 -07:00
  • 6c368f2124 bump mac tests to use py39 (#1047) Awni Hannun 2024-10-14 10:40:36 -07:00
  • c799133998 Make llm async eval less brittle (#1040) Awni Hannun 2024-10-14 10:25:24 -07:00
  • 1e0cda68c6 Update README.md (#1045) Seitaro Sugawara 2024-10-14 22:21:25 +09:00
  • 7612c646f3 Fix PLaMo model to support Grouped Query Attention (#1037) Shunta Saito 2024-10-13 07:26:50 +09:00
  • d8611dd69f Small typo fixed in flux README.md (#1035) Ivan Fioravanti 2024-10-12 15:14:01 +02:00
  • a5f2bab070 Add FLUX finetuning (#1028) Angelos Katharopoulos 2024-10-11 21:17:41 -07:00
  • d72fdeb4ee MusicGen (#1020) Alex Barron 2024-10-11 10:16:20 -07:00
  • 4360e7ccec clear cache during prompt processing (#1027) Awni Hannun 2024-10-09 16:48:32 -07:00
  • b7373cb44f fix long prompt generations (#1023) Awni Hannun 2024-10-09 11:09:36 -07:00
  • fca087be49 More cache improvements (#1015) Awni Hannun 2024-10-07 20:45:51 -07:00
  • 9bc53fc210 convert (#1006) Awni Hannun 2024-10-02 13:13:33 -07:00
  • 36c1d8e8dc Server: support function calling (#1003) madroid 2024-10-03 03:36:07 +08:00
  • 0866e23a67 repetiton_penalty and logits_bias just using logits_processors (#1004) nathan 2024-09-30 17:49:03 +02:00
  • 418d9a5511 Feature: QDoRA (#891) Zai Thottakath 2024-09-30 10:01:11 -05:00
  • aa1c8abdc6 LoRA: Support HuggingFace dataset via data parameter (#996) madroid 2024-09-30 22:36:21 +08:00
  • 50e5ca81a8 Adding full finetuning (#903) Gökdeniz Gülmez 2024-09-30 02:12:47 +02:00
  • 7ec2021bb9 LoRA: support tools(function calling) format datasets (#995) madroid 2024-09-29 01:41:36 +08:00
  • ace2bb5890 Add logits_processor option to generate_step function (#983) nathan 2024-09-28 19:08:49 +02:00
  • d812516d3d Add /v1/models endpoint to mlx_lm.server (#984) jamesm131 2024-09-29 00:21:11 +10:00
  • 76710f61af Adding support for mamba (#940) Gökdeniz Gülmez 2024-09-28 16:02:53 +02:00
  • e776c970f7 Fix llava model when using text-only prompt (#998) Cheng 2024-09-25 23:19:41 +09:00
  • 9bb2dd62f3 Encodec (#991) Awni Hannun 2024-09-23 11:39:25 -07:00
  • 796d5e40e4 Fix export to gguf (#993) Angelos Katharopoulos 2024-09-20 13:33:45 -07:00
  • f530f56df2 don't use internal exception (#990) Awni Hannun 2024-09-17 16:22:48 -07:00
  • 6c2369e4b9 Fix bug in upload + docs nit (#981) Awni Hannun 2024-09-07 14:46:57 -07:00
  • c3e3411756 Update LLM generation docs to use chat template (#973) Awni Hannun 2024-09-07 06:06:15 -07:00
  • 324184d670 Fix the cache_prompt (#979) Angelos Katharopoulos 2024-09-06 20:19:27 -07:00
  • bd29aec299 Support HuggingFace model tree (#957) madroid 2024-09-04 21:19:32 +08:00
  • 83a209e200 Add prompt piping (#962) Chime Ogbuji 2024-09-03 16:29:10 -04:00
  • bf921afcbe Make sure to import the correct "version" module when installing mlx_whisper and mlx_lm from local source code. (#969) James Zhao 2024-09-03 23:16:21 +03:00
  • 3c6e8b11af fix (#965) Awni Hannun 2024-08-30 05:56:27 -07:00
  • fc93c55723 feat(mlx_lm): Nemotron (#949) L 2024-08-29 21:08:57 -07:00
  • b1186e2a81 Docs on prompt scaling (#963) Awni Hannun 2024-08-29 15:05:17 -07:00
  • 1003a8b2dd Add the ability to load the KV cache from a file (#956) Angelos Katharopoulos 2024-08-28 22:11:45 -07:00
  • 7f8c961287 Fix setattr for the TokenizerWrapper (#961) Angelos Katharopoulos 2024-08-28 14:47:33 -07:00
  • bf21789b17 chore: update black pre-commit hooks to latest versions (#955) Nripesh Niketan 2024-08-26 20:24:23 +05:30
  • b5e18ef1e3 Add Phi-3.5-MoE (#946) Prince Canuma 2024-08-24 15:52:33 +02:00
  • 6731254e76 Use fast rope (#945) Awni Hannun 2024-08-23 13:18:51 -07:00
  • 58591a1b41 fine tune deepseek (#932) Awni Hannun 2024-08-22 10:41:21 -07:00
  • 0164d2058b feat: DeepSeek MoE v1 (#942) L 2024-08-17 23:18:09 +09:00
  • 7be292c0c9 Handle longer prompt/generation (#931) Awni Hannun 2024-08-16 15:28:39 -07:00
  • e196fa3208 Whisper: Support command line (#746) madroid 2024-08-17 01:35:44 +08:00
  • 4e01700816 Allow the entire model to be targed for LoRA and DoRA fine tuning: LoRA and DoRA embeddings with small DoRALinear bug fix (#914) Zai Thottakath 2024-08-16 09:38:36 -05:00
  • c50971e860 Min P implementation (#926) Chime Ogbuji 2024-08-15 18:45:02 -04:00
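Several entries above concern sampling (top-p in the server in #1144, the sampler refactor in #1094, and the Min P implementation in #926). As a rough illustration of the min-p rule those commits relate to, here is a minimal sketch in plain Python/NumPy; the function name and shape are hypothetical and this is not the mlx-lm code itself:

```python
import numpy as np

def min_p_filter(logits, min_p=0.1):
    """Zero out tokens whose probability falls below min_p times
    the probability of the most likely token, then renormalize."""
    # Stable softmax: convert logits to probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Min-p rule: the cutoff scales with the top token's probability,
    # so the filter is strict when the model is confident and
    # permissive when the distribution is flat.
    threshold = min_p * probs.max()
    probs = np.where(probs >= threshold, probs, 0.0)
    return probs / probs.sum()

logits = np.array([3.0, 2.0, 0.0, -2.0])
filtered = min_p_filter(logits, min_p=0.2)
# Only the two most likely tokens survive; the rest are zeroed.
```

In actual use, a sampler would draw the next token from the filtered distribution rather than from the raw softmax.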