Mirror of https://github.com/ml-explore/mlx-examples.git (synced 2025-12-16 02:08:55 +08:00)
CLIP (ViT) (#315)
* probably approximately correct CLIPTextEncoder
* implemented CLIPEncoderLayer as built-in nn.TransformerEncoderLayer
* replaced embedding layer with simple matrix
* implemented ViT
* added ViT tests
* fixed tests
* added pooler_output for text
* implemented complete CLIPModel
* implemented init
* implemented convert.py and from_pretrained
* fixed some minor bugs and added the README.md
* removed tokenizer unused comments
* removed unused deps
* updated ACKNOWLEDGEMENTS.md
* Feat: Image Processor for CLIP (#1)

  @nkasmanoff:
  * clip image processor
  * added example usage

* refactored image preprocessing
* deleted unused image_config.py
* removed preprocessing port
* added dependency to mlx-data
* fixed attribution and moved photos to assets
* implemented a simple port of CLIPImageProcessor
* review changes
* PR review changes
* renamed too verbose arg
* updated README.md
* nits in readme / conversion
* simplify some stuff, remove unneeded inits
* remove more init stuff
* more simplify
* make test a unit test
* update main readme
* readme nits

---------

Co-authored-by: Noah Kasmanoff <nkasmanoff@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
Committed by: GitHub
Parent: ba3a9355d1
Commit: 94358219cf
@@ -26,9 +26,15 @@ Some more useful examples are listed below.

- Speech recognition with [OpenAI's Whisper](whisper).

### Multimodal models

- Joint text and image embeddings with [CLIP](clip).

### Other Models

- Semi-supervised learning on graph-structured data with [GCN](gcn).
- Real NVP [normalizing flow](normalizing_flow) for density estimation and
  sampling.

### Hugging Face
