mlx-examples

zhangyiss/mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-06-25 01:41:19 +08:00

Commit Graph

Author

SHA1

Message

Date

Prince Canuma

3fdf85e79d

Starcoder2: Update config and change GQA to use repeat (#520)

* update config

* change gqa to use repeat instead of concatenate

* contribution

2024-03-03 06:12:03 -08:00

Anchen

1e3daea3bb

chore(mlx-lm): add missing model_type for starcoder2 (#522)

2024-03-03 06:07:45 -08:00

Muhtasham Oblokulov

81e2a80026

Add Starcoder 2 (#502)

* Add Starcoder2 model and update utils.py

* Refactor model arguments and modules in starcoder2.py

* Refactor FeedForward class to MLP in starcoder2.py

* Fix typo

* pre-commit

* Refactor starcoder2.py: Update model arguments and modules

* Fix LM head and MLP layers

* Rename input layer norm

* Update bias in linear layers

* Refactor token embeddings in Starcoder2Model

* Rename to standard HF attention layer name

* Add LayerNorm

* Add transposed token embeddings (like in Gemma)

* Refactor MLP and TransformerBlock classes

* Add tie_word_embeddings option to ModelArgs and update Model implementation

* Add conditional check for tying word embeddings in Starcoder2Model

* Fix bias in lm_head linear layer

* Remove unused LayerNorm in stablelm

* Update transformers dependency to use GitHub repository

* fix lm head bug, revert transformer req

* Update RoPE initialization in Attention class

---------

Co-authored-by: Awni Hannun <awni@apple.com>

2024-03-02 19:39:23 -08:00

3 Commits
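
Commit 3fdf85e79d mentions switching grouped-query attention (GQA) from concatenation to repeat. The diff itself is not shown on this page, so the following is only a minimal sketch of the general idea, not the code from that commit. It uses NumPy as a stand-in for MLX, and the function names `expand_kv_concat` and `expand_kv_repeat`, as well as the `(batch, n_kv_heads, seq, head_dim)` layout, are assumptions made for illustration.

```python
import numpy as np


def expand_kv_concat(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand KV heads by concatenating copies (hypothetical 'before' variant).

    kv has the assumed shape (batch, n_kv_heads, seq, head_dim).
    """
    b, h, s, d = kv.shape
    # Add a repeat axis, concatenate n_rep copies along it, then fold it
    # into the head axis so each KV head appears n_rep times in a row.
    stacked = np.concatenate([kv[:, :, None]] * n_rep, axis=2)  # (b, h, n_rep, s, d)
    return stacked.reshape(b, h * n_rep, s, d)


def expand_kv_repeat(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand KV heads with a single repeat along the head axis."""
    return np.repeat(kv, n_rep, axis=1)


if __name__ == "__main__":
    # 2 KV heads shared by 6 query heads -> each KV head is repeated 3 times.
    kv = np.arange(2 * 2 * 3 * 4, dtype=np.float32).reshape(2, 2, 3, 4)
    a = expand_kv_concat(kv, n_rep=3)
    b = expand_kv_repeat(kv, n_rep=3)
    print(a.shape, np.array_equal(a, b))
```

Both variants give every group of query heads the same key/value head; the repeat form simply expresses this in one call instead of building and reshaping a list of copies.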