Commit Graph

6 Commits

Author SHA1 Message Date
Awni Hannun
0f240a4c7e
Use max tokens from options in mlx_lm evaluate (#1302) 2025-02-26 15:46:16 -08:00
Awni Hannun
e879ea70e1
fix generation evaluations (#1277) 2025-02-11 16:10:30 -08:00
Chime Ogbuji
f2619f507c
Add support for fewshot and apply chat template lm_eval functionality (#1180)
* Add support for multiturn fewshot examples and chat templates

Added two new arguments to the evaluation script: `--fewshot-as-multiturn` and `--apply-chat-template` which correspond to lm_eval options of similar names and are very often used to ensure apples-to-apples comparisons of lm_evaluation results

* Add HF overrides for methods needed by added options

* don't add duplicate bos

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-06 07:58:43 -08:00
Awni Hannun
c4833a2f55
fix encoding with special tokens + chat template (#1189) 2025-01-03 10:50:59 -08:00
Ikko Eltociear Ashimine
fc0674d2d8
chore: update evaluate.py (#1159)
occurence -> occurrence
2024-12-15 06:06:29 -08:00
Alex Barron
cd8cf28c39
mlx_lm.evaluate (#1140)
* Add evaluation script

* only write top level results

* add lm eval version

* typo

* create output dir

* relative import

* comment

---------

Co-authored-by: David Grangier <dgrangier@users.noreply.github.com>
2024-12-08 12:20:10 -08:00