mlx-examples/llms/tests
tidely df744c98e6
Predict stop sequence matches during streaming (#541)
* Predict stop sequence matches during streaming

Check whether the tokens array overlaps a stop sequence, which signals a potential match once more tokens get generated. Keep generating tokens until we can confirm that the stop sequence is not met.

* fix typo

* Change sequence_overlap logic

* range isn't inclusive, add 1 to max_overlap

* Add test_server.py

Added a test for the sequence_overlap method

* nits

* eos sequence

* finalize
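
The overlap check described in this commit message can be sketched roughly as follows. This is a minimal illustration, assuming the check compares a suffix of the generated tokens with a prefix of a stop sequence. Only the name `sequence_overlap` comes from the message; the signature, body, and token values are assumptions and may differ from the code in the commit.

```python
from typing import Sequence


def sequence_overlap(s1: Sequence, s2: Sequence) -> bool:
    """Return True if some non-empty suffix of s1 equals a prefix of s2."""
    max_overlap = min(len(s1), len(s2))
    # range() excludes its upper bound, so add 1 to include max_overlap itself.
    return any(s1[-i:] == s2[:i] for i in range(1, max_overlap + 1))


# Hypothetical token ids: the last generated token (7) could begin the stop sequence.
tokens = [5, 6, 7]
stop_sequence = [7, 8]
print(sequence_overlap(tokens, stop_sequence))  # True
print(sequence_overlap(tokens, [9, 8]))  # False
```

Under this sketch, a True result means a stop sequence may still complete, so a streaming server would hold the overlapping tokens back until further generation confirms or rules out the match.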

---------

Co-authored-by: Y4hL <43219534+Y4hL@users.noreply.github.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-06 15:24:15 -07:00
..
test_datsets.py Configuration-based use of HF hub-hosted datasets for training (#701) 2024-06-26 10:20:50 -07:00
test_gguf.py fix(mlx-lm): type hints in gguf.py (#621) 2024-03-26 07:56:01 -07:00
test_lora.py Add GPT-neox model (#863) 2024-07-11 06:13:17 -07:00
test_models.py Add support for Llama-3.1 (#907) 2024-07-23 13:21:32 -07:00
test_sample_utils.py fix(mlx-lm): type hints in gguf.py (#621) 2024-03-26 07:56:01 -07:00
test_server.py Predict stop sequence matches during streaming (#541) 2024-08-06 15:24:15 -07:00
test_tuner_utils.py LoRA: Extract small function (#614) 2024-06-02 06:38:42 -07:00
test_utils_load_model.py support load model by custom get_model_classes (#899) 2024-07-25 11:01:17 -07:00
test_utils.py LoRA on all linear transformer block layers (#546) 2024-03-12 07:37:40 -07:00