Commit Graph

11 Commits

Author SHA1 Message Date
Lllvvuu
3082db0143
WIP: most of the merge, minus llms/mlx_lm/utils.py
TODO: Re-implement `batch_generate`
TODO: Update all `generate_step` callsites

NOTE: `generate_step` taking `(bs, seq_len)` instead of `(seq_len,)` is
a breaking change. In particular, `sampler` and `logits_processors` will
need to handle logits of shape `(bs, vocab_size)` instead of `(vocab_size,)`.
2024-12-27 01:17:20 -08:00
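
The shape change described in the note above can be illustrated with a minimal sketch. It is not the code from this commit: plain Python lists stand in for arrays, and the helper names `greedy` and `batched` are hypothetical. It only shows one way an existing per-sequence sampler written for `(vocab_size,)` logits might be lifted to accept `(bs, vocab_size)` logits.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]    # logits for one sequence: (vocab_size,)
Batch = Sequence[Row]    # logits for a batch: (bs, vocab_size)


def greedy(logits: Row) -> int:
    """Hypothetical per-sequence sampler: index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def batched(sampler: Callable[[Row], int]) -> Callable[[Batch], List[int]]:
    """Lift a (vocab_size,) sampler so it accepts (bs, vocab_size) logits."""
    def wrapper(logits: Batch) -> List[int]:
        # Apply the original sampler independently to each row of the batch.
        return [sampler(row) for row in logits]
    return wrapper


batch_greedy = batched(greedy)
tokens = batch_greedy([[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]])  # one token per row
```

A real sampler would more likely operate on the whole batch with vectorized array operations along the last axis; the row-wise loop here is only meant to make the shape contract explicit.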
Awni Hannun
db109184b7
Fix no template prompt + top_k sampling (#1166)
* fix no template prompt

* add top_k sampling

* fix chinese
2024-12-18 18:46:50 -08:00
Awni Hannun
2ba0e36683
[mlx-lm] Use top p in server (#1144)
* use top p in server

* couple other fixes
2024-12-12 11:12:21 -08:00
Awni Hannun
0f135396ae
Generation refactor: part 2 (#1099)
* unify with stream_generate

* fixes

* nit

* some cleanup, warnings, tests

* fix test + faster min p + test

* version
2024-11-23 11:47:06 -08:00
Awni Hannun
657b4cc0aa
[MLX LM] Sampler refactor + a few improvements (#1094)
* starting

* refactor sampler/processor and a few improvements

* fix stream

* fix stream generate

* fix eos handling in stream generate
2024-11-07 16:15:24 -08:00
Lllvvuu
280b3784d4
feat: support batch input in generate()
The `prompt` argument can now be either a `str` or `list[str]`.

The change to `generate()` is backwards-compatible.

The changes to `generate_step()`, `top_p_sampling()`, and
`min_p_sampling()` are backwards-incompatible in order to unify shapes;
this could be changed by adding a few if-statements, if preferred.
2024-08-29 05:13:14 -07:00
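
One way a `str` or `list[str]` argument can be accepted without breaking existing callers is sketched below. This is an assumption about the general pattern, not the commit's implementation: `normalize_prompts` and `generate_sketch` are hypothetical names, and upper-casing stands in for actual model generation.

```python
from typing import List, Tuple, Union


def normalize_prompts(prompt: Union[str, List[str]]) -> Tuple[List[str], bool]:
    """Return the prompts as a list, plus whether the caller passed a batch."""
    is_batch = not isinstance(prompt, str)
    prompts = list(prompt) if is_batch else [prompt]
    return prompts, is_batch


def generate_sketch(prompt: Union[str, List[str]]) -> Union[str, List[str]]:
    """Hypothetical stand-in for generate(): output type mirrors input type."""
    prompts, is_batch = normalize_prompts(prompt)
    outputs = [p.upper() for p in prompts]  # placeholder for model generation
    # A single string in yields a single string out, so old callers still work.
    return outputs if is_batch else outputs[0]
```

For example, `generate_sketch("hi")` returns a single string, while `generate_sketch(["a", "b"])` returns a list with one output per prompt.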
Chime Ogbuji
c50971e860
Min P implementation (#926)
* Min P implementation

* Change default to 0 (no min_p)

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-15 15:45:02 -07:00
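
For readers unfamiliar with min-p sampling, a minimal sketch of the filtering step follows. It assumes the usual definition (keep tokens whose probability is at least `min_p` times the probability of the most likely token) and uses plain Python lists; it is not the code from this commit, and `softmax` and `min_p_filter` are hypothetical helper names.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(logits: List[float], min_p: float) -> List[float]:
    """Mask logits whose probability is below min_p * (top probability)."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    # Masked tokens get -inf so they receive zero probability when sampling.
    return [x if p >= threshold else -math.inf for x, p in zip(logits, probs)]


filtered = min_p_filter([2.0, 1.0, -3.0], min_p=0.1)
```

With `min_p=0` the threshold is zero and every token is kept, which matches the "no min_p" default mentioned in the commit message.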
Awni Hannun
9b83004631
Faster sampling with mx.compile (#937)
* faster sampling with compile

* fix test
2024-08-15 11:29:09 -07:00
Awni Hannun
9c5554d8ee
Use async eval (#670)
* Use async eval

* bump

* bump

* remove workaround for bfloat cumsum
2024-04-11 13:18:23 -07:00
Anchen
0ab01b4626
fix(mlx-lm): sorted probs in top_p implementation. (#610)
* fix(mlx-lm): the top p imp

* chore: address comment
2024-03-25 15:07:55 -07:00
Anchen
fbed720d6f
chore(mlx-lm): fix the top_p implementation. (#602)
* chore(mlx-lm): clean up the top p imp

* chore: clean up

* chore: add test

* chore: address comments

* chore: clean up docs string

* chore: clean up test
2024-03-21 12:18:23 -07:00