* Add pfnet/plamo-2-1b
* Fix cache.py to support non-top level layers
* Use mlx's BaseModelArgs
* Fix model
* Use sanitize()
* Remove unnecessary changes
* Add plamo2.py
* Apply formatter
* Fix some part
* Allow an externally defined cache object
* Convert channel-first weights to channel-last for correct use of MLX's conv1d
* Remove unused code part
* Pass all inputs on the first call of the model
* Fix import
* Include .jsonl files in downloads from the Hugging Face Hub
* Fix reference to layers
* Remove unnecessary code and add a test for plamo2
* Do not pass mask to prepare_inputs_for_generation
* Fix to use repeat instead of tile
* Add state property to PlamoCache
* Add __iter__ and __next__ methods to PlamoCache
* cleanup
* cleanup
* fix
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
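The conv1d weight-layout fix above can be sketched as follows: MLX's `conv1d` expects channel-last weights `(out_channels, kernel_size, in_channels)`, while PyTorch checkpoints store channel-first `(out_channels, in_channels, kernel_size)`. The helper name `to_channel_last` and the use of NumPy here are illustrative, not the actual `sanitize()` code:

```python
import numpy as np

def to_channel_last(weights):
    """Convert PyTorch-style conv1d weights (O, C, K) to MLX's
    channel-last layout (O, K, C). Hypothetical helper, for illustration."""
    out = {}
    for name, w in weights.items():
        if "conv1d.weight" in name and w.ndim == 3:
            # Swap the channel and kernel axes: (O, C, K) -> (O, K, C)
            w = np.transpose(w, (0, 2, 1))
        out[name] = w
    return out
```

In a model's `sanitize()` this kind of transpose is typically applied once at load time, so the rest of the model can assume MLX's native layout.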
* deepseekv3
* use upload_large_file instead of deprecated multi commit
* add pipeline generation and example
* comment
* get fp16 working
* use mlx==0.22
* Support for multiple EOS tokens
* Change _eos_token_ids type from list to set
* Remove model_config & add eos_token_id
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
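A minimal sketch of the multiple-EOS-token change: storing the stop ids as a set (rather than a list) makes the per-token membership check O(1) and deduplicates repeated ids. The `StopTokens` class name here is hypothetical:

```python
class StopTokens:
    """Track several EOS token ids; a set gives O(1) stop checks."""

    def __init__(self, eos_token_ids):
        self._eos_token_ids = set(eos_token_ids)

    def is_stop(self, token_id):
        # Stop generation when any configured EOS token appears
        return token_id in self._eos_token_ids
```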
Improve documentation clarity by:
1. Fixing the return type annotation to correctly reflect GenerationResponse
2. Simplifying the docstring by referencing the GenerationResponse class
3. Removing redundant field descriptions
* fix rotating kv cache for chat use case
* reorg + fixes to caching; unify prompt caching across cache types and use cases, e.g. caching during a chat
* nit in chat
* fix tests
* fix tests
* fix tests
* docs
* chat command
* comments + docs
* Define meta_state on all Cache implementations
* fixes + trim_prompt_cache api
* fix default model
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
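To illustrate the trim_prompt_cache idea from the commits above: trimming recent tokens off each per-layer cache lets a chat rewind or reuse a cached prompt prefix instead of re-encoding it. This toy `KVCacheSketch` only tracks tokens and is not mlx-lm's actual cache implementation:

```python
class KVCacheSketch:
    """Toy stand-in for one layer's KV cache (illustrative only)."""

    def __init__(self):
        self.tokens = []

    def update(self, new_tokens):
        # Append newly processed tokens to the cache
        self.tokens.extend(new_tokens)

    def trim(self, n):
        # Drop the last n cached tokens, e.g. to rewind a chat turn;
        # return how many were actually dropped
        n = min(n, len(self.tokens))
        del self.tokens[len(self.tokens) - n:]
        return n

def trim_prompt_cache(cache, num_tokens):
    # Mirrors the shape of a trim-style API: apply the trim to every
    # layer's cache and report the trimmed counts
    return [c.trim(num_tokens) for c in cache]
```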
* Adding full model weights finetuning
* Updating the LORA.md and ACKNOWLEDGMENTS.md files.
* removing --use-dora and --full-training and adding --fine-tune-type
* some clean up
* reformatting and fixing dora training
* updated CONFIG_DEFAULTS
* update config example
* update in the config example file
* Update LORA.md
* merge and commit
* adding argument for dora linear layer
* clean up
* clean up in the example yaml file
* fix
* final fix before sending
* small addition to the README file
* fix for loading the fully trained model by saving all the files and configs correctly
* clean up
* removing the unnecessary files
* changing lora layers back to 16
* removed max file size
* nits
* resolve merge
* some consistency changes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
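The flag consolidation above (replacing the separate DoRA and full-training booleans with a single `--fine-tune-type` option) might look like this argparse sketch; the default and help text are illustrative, not the project's exact CLI:

```python
import argparse

def build_parser():
    # Hypothetical sketch: one enum-style option instead of two
    # mutually exclusive boolean flags
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--fine-tune-type",
        choices=["lora", "dora", "full"],
        default="lora",
        help="Type of fine-tuning to perform.",
    )
    return parser
```

An enum-style choice also fails fast on typos (`argparse` rejects unknown values) and leaves room for future fine-tuning types without adding more flags.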
* Add logits_processor option for generation, as in the Hugging Face transformers library
* concatenation correction
* Rename the tokens variable for clarity
* remove the logit_bias argument from generate_step method
* fix the variable name
* nits + test
* test
* add back logit bias + test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
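In the style of the Hugging Face transformers interface referenced above, a logits processor is a callable mapping the tokens generated so far plus the current logits to new logits, and the restored logit-bias behavior can be expressed as one such processor. This NumPy sketch uses hypothetical helper names:

```python
import numpy as np

def apply_logits_processors(processors, tokens, logits):
    # Chain each processor: (generated_tokens, logits) -> logits
    for proc in processors:
        logits = proc(tokens, logits)
    return logits

def logit_bias_processor(bias):
    # Hypothetical example: add a fixed bias to selected token ids,
    # matching the shape of a logit_bias={token_id: bias} argument
    def proc(tokens, logits):
        out = logits.copy()
        for token_id, b in bias.items():
            out[token_id] += b
        return out
    return proc
```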
* Initial implementation
* Fix handling of return_step_logits in return
* Fixed OpenAI parameter expectations and logprob structure and datatypes
* pre-commit black formatting
* Remove unused parameter
* fix log probs
* fix colorize
* nits in server
* nits in server
* Fix top_logprobs structure (a dict) and include tokens in logprobs response
* nits
* fix types
---------
Co-authored-by: Awni Hannun <awni@apple.com>
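The top_logprobs fix above refers to the OpenAI completions logprobs shape, where `top_logprobs` is a list with one dict per generated token, mapping candidate tokens to their log probabilities, alongside parallel `tokens` and `token_logprobs` lists. A sketch of building one such entry (not the actual server code):

```python
def make_logprobs_entry(token, logprob, top_candidates):
    """Build an OpenAI-style logprobs payload for a single generated
    token. `top_candidates` maps candidate token -> log probability.
    Illustrative helper, not the server implementation."""
    return {
        "tokens": [token],
        "token_logprobs": [logprob],
        # One dict per position; here there is a single position
        "top_logprobs": [dict(top_candidates)],
    }
```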