Awni Hannun | 09aaeac72c | 2024-05-31 12:36:05 -07:00
fix moe conversion (#802)
Angelos Katharopoulos | 9f671228cd | 2024-05-21 15:58:08 -07:00
Block sparse MM MoEs (#782)
- Adds SwitchLinear
- Adds QuantizedSwitchLinear
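The SwitchLinear idea referenced above replaces a Python list of per-expert linear layers with a single stacked weight tensor, so the per-token expert matmuls can be gathered and batched. The following is a minimal numpy sketch of that concept, not the actual mlx-lm implementation (the class name is borrowed from the commit; the shapes and gather strategy here are illustrative assumptions):

```python
import numpy as np

class SwitchLinear:
    """Sketch of a stacked per-expert linear layer.

    Instead of num_experts separate (out, in) weights, one
    (num_experts, out, in) tensor is kept, so the selected experts'
    weights can be gathered per token and applied in one batched matmul.
    """

    def __init__(self, input_dims, output_dims, num_experts, rng=None):
        rng = rng or np.random.default_rng(0)
        scale = 1.0 / np.sqrt(input_dims)
        # One stacked weight instead of a list of expert layers.
        self.weight = rng.uniform(
            -scale, scale, (num_experts, output_dims, input_dims)
        )

    def __call__(self, x, indices):
        # x: (tokens, input_dims); indices: (tokens, k) expert ids per token.
        w = self.weight[indices]                # gather: (tokens, k, out, in)
        return np.einsum("tkoi,ti->tko", w, x)  # (tokens, k, output_dims)

# Route 4 tokens, each to 2 of 8 experts.
layer = SwitchLinear(16, 32, num_experts=8)
x = np.ones((4, 16))
idx = np.array([[0, 3], [1, 2], [7, 0], [4, 4]])
out = layer(x, idx)
print(out.shape)  # (4, 2, 32)
```

Selecting the same expert twice for a token (as in the last row above) yields identical outputs for both slots, which is a quick sanity check that the gather indexes the intended weights.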
Awni Hannun | ee60e2a9d5 | 2024-05-08 08:18:13 -07:00
Kv cache (#643)
* in place kv_cache
* fix
* fix kv cache size
* partially fix kv cache dtype
* step kv cache
* multiple of step size
* more tests + kv cache
* more kv cache
* update all models to use kv cache
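The bullets above describe a cache that grows in multiples of a step size and writes new keys/values in place, rather than concatenating (and reallocating) on every decode step. A minimal numpy sketch of that scheme, assuming illustrative shapes and method names rather than the actual mlx-lm API:

```python
import numpy as np

class KVCache:
    """Sketch of a step-growing, in-place key/value cache."""

    def __init__(self, head_dim, n_heads, step=256):
        self.step = step          # capacity is always a multiple of this
        self.n_heads = n_heads
        self.head_dim = head_dim
        self.keys = None
        self.values = None
        self.offset = 0           # number of valid cached positions

    def update_and_fetch(self, keys, values):
        # keys/values: (n_heads, seq_len, head_dim)
        needed = self.offset + keys.shape[1]
        if self.keys is None or needed > self.keys.shape[1]:
            # Grow capacity to the next multiple of the step size.
            n_steps = (needed + self.step - 1) // self.step
            cap = n_steps * self.step
            new_k = np.zeros((self.n_heads, cap, self.head_dim), keys.dtype)
            new_v = np.zeros_like(new_k)
            if self.keys is not None:
                new_k[:, : self.offset] = self.keys[:, : self.offset]
                new_v[:, : self.offset] = self.values[:, : self.offset]
            self.keys, self.values = new_k, new_v
        # Write the new entries in place.
        self.keys[:, self.offset : needed] = keys
        self.values[:, self.offset : needed] = values
        self.offset = needed
        return self.keys[:, : self.offset], self.values[:, : self.offset]

cache = KVCache(head_dim=8, n_heads=2, step=4)
k, v = cache.update_and_fetch(np.ones((2, 3, 8)), np.ones((2, 3, 8)))
print(k.shape, cache.keys.shape)  # (2, 3, 8) (2, 4, 8)
k, v = cache.update_and_fetch(np.ones((2, 3, 8)), np.ones((2, 3, 8)))
print(k.shape, cache.keys.shape)  # (2, 6, 8) (2, 8, 8)
```

Only views up to `offset` are returned, so attention never sees the unwritten tail of the preallocated buffer.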
Awni Hannun | 92430df0a0 | 2024-05-02 21:55:09 -07:00
Fix lora for qwen moe (#743)
* fix lora for qwen moe
* use max seq length in test as well
Prince Canuma | d661440dbb | 2024-04-02 11:33:29 -07:00
Add support for qwen2moe (#640)
* add sparsemoe block and update decoder logic
* update file name to match HF
* update name
* Code formatting
* update gates calculation
* add support for Qwen2MoE.
* fix pytest
* code formatting and fix missing comma in utils
* Remove decoder sparse step.
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
* remove gate layer anti-quantisation
* remove unused argument
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
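The "update gates calculation" bullet refers to the router of a Qwen2-MoE-style sparse block: a linear gate scores every expert, softmax probabilities are taken, and the top-k experts per token are kept as mixing weights. A hedged numpy sketch of that gate computation (function name, `norm_topk_prob` flag, and shapes are illustrative assumptions, not the mlx-lm code):

```python
import numpy as np

def moe_gates(x, gate_weight, top_k=2, norm_topk_prob=False):
    """Sketch of top-k softmax gating for a sparse MoE block.

    x:           (tokens, hidden)       token activations
    gate_weight: (num_experts, hidden)  router weight
    Returns per-token expert indices and mixing scores, each (tokens, k).
    """
    logits = x @ gate_weight.T                    # (tokens, num_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)         # softmax over experts
    # Keep the k highest-probability experts per token.
    idx = np.argsort(probs, axis=-1)[:, ::-1][:, :top_k]
    scores = np.take_along_axis(probs, idx, axis=-1)
    if norm_topk_prob:
        # Optionally renormalize so the kept scores sum to 1.
        scores = scores / scores.sum(-1, keepdims=True)
    return idx, scores

rng = np.random.default_rng(0)
idx, scores = moe_gates(rng.normal(size=(4, 16)), rng.normal(size=(8, 16)))
print(idx.shape, scores.shape)  # (4, 2) (4, 2)
```

Each token's hidden state is then sent only to its `top_k` experts and the expert outputs are summed, weighted by `scores`.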