Add support for fewshot and apply chat template lm_eval functionality (#1180)

* Add support for multiturn fewshot examples and chat templates Added two new arguments to the evaluation script: `--fewshot-as-multiturn` and `--apply-chat-template` which correspond to lm_eval options of similar names and are very often used to ensure apples-to-apples comparisons of lm_evaluation results * Add HF overrides for methods needed by added options * don't add duplicate bos --------- Co-authored-by: Awni Hannun <awni@apple.com>
2025-12-15 17:58:54 +08:00 · 2025-01-06 10:58:43 -05:00
parent 25ec2d8c44
commit f2619f507c
3 changed files with 45 additions and 20 deletions
--- a/llms/setup.py
+++ b/llms/setup.py
@@ -27,8 +27,8 @@ setup(
    packages=["mlx_lm", "mlx_lm.models", "mlx_lm.tuner"],
    python_requires=">=3.8",
    extras_require={
-        "testing": ["datasets"],
-        "evaluation": ["lm-eval"],
+        "test": ["datasets"],
+        "evaluate": ["lm-eval", "tqdm"],
    },
    entry_points={
        "console_scripts": [