MLX Examples
This repo contains a variety of standalone examples using the MLX framework.
The MNIST example is a good starting point to learn how to use MLX.
Some more useful examples are listed below.
Text Models
- Transformer language model training.
- Large-scale text generation with LLaMA, Mistral, Phi-2, and more in the LLMs directory.
- A mixture-of-experts (MoE) language model with Mixtral 8x7B.
- Parameter-efficient fine-tuning with LoRA or QLoRA.
- Text-to-text multi-task Transformers with T5.
- Bidirectional language understanding with BERT.
Image Models
- Image classification using ResNets on CIFAR-10.
- Generating images with Stable Diffusion.
- Convolutional variational autoencoder (CVAE) on MNIST.
Audio Models
- Speech recognition with OpenAI's Whisper.
Multimodal Models
Other Models
- Semi-supervised learning on graph-structured data with GCN.
- Real NVP normalizing flow for density estimation and sampling.
Hugging Face
Note: You can now directly download a few converted checkpoints from the MLX Community organization on Hugging Face. We encourage you to join the community and contribute new models.
Contributing
We are grateful for all of our contributors. If you contribute to MLX Examples and wish to be acknowledged, please add your name to the list in ACKNOWLEDGMENTS.md as part of your pull request.
Citing MLX Examples
The MLX software suite was initially developed with equal contribution by Awni Hannun, Jagrit Digani, Angelos Katharopoulos, and Ronan Collobert. If you find MLX Examples useful in your research and wish to cite it, please use the following BibTeX entry:
@software{mlx2023,
  author = {Awni Hannun and Jagrit Digani and Angelos Katharopoulos and Ronan Collobert},
  title = {{MLX}: Efficient and flexible machine learning on Apple silicon},
  url = {https://github.com/ml-explore},
  version = {0.0},
  year = {2023},
}