Gabrijel Boduljak
|
94358219cf
|
CLIP (ViT) (#315)
* probably approximately correct CLIPTextEncoder
* implemented CLIPEncoderLayer as built-in nn.TransformerEncoderLayer
* replaced embedding layer with simple matrix
* implemented ViT
* added ViT tests
* fixed tests
* added pooler_output for text
* implemented complete CLIPModel
* implemented init
* implemented convert.py and from_pretrained
* fixed some minor bugs and added the README.md
* removed tokenizer unused comments
* removed unused deps
* updated ACKNOWLEDGEMENTS.md
* Feat: Image Processor for CLIP (#1)
@nkasmanoff:
* clip image processor
* added example usage
* refactored image preprocessing
* deleted unused image_config.py
* removed preprocessing port
* added dependency to mlx-data
* fixed attribution and moved photos to assets
* implemented a simple port of CLIPImageProcessor
* review changes
* PR review changes
* renamed overly verbose arg
* updated README.md
* nits in readme / conversion
* simplify some stuff, remove unneeded inits
* remove more init stuff
* more simplification
* make test a unit test
* update main readme
* readme nits
---------
Co-authored-by: Noah Kasmanoff <nkasmanoff@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
|
2024-01-31 14:19:53 -08:00 |
|