* Add 'models' endpoint to server
* Add test for new 'models' server endpoint
* Check hf_cache for mlx models
* update tests to check hf_cache for models
* simplify test
* doc
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Added functionality to load in adapters through post-requests so you do not need to restart the server
* ran pre-commit
* nits
* fix test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Initial implementation
* Fix handling of return_step_logits in return
* Fixed OpenAI parameter expectations and logprob structure and datatypes
* pre-commit black formatting
* Remove unused parameter
* fix log probs
* fix colorize
* nits in server
* nits in server
* Fix top_logprobs structure (a dict) and include tokens in logprobs response
* nits
* fix types
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Don't serve local directory
BaseHTTPRequestHandler serves the current directory by default. Definitely not intended behaviour. Remove the "do_HEAD" and "do_GET" methods.
* Fix typo in method name
I assume hanlde_stream was intended to be called handle_stream
* Fix outdated typehint
load_model returns nn.Module, however fetch_from_hub was not updated to reflect the change
* Add some more type hints
* Add warnings for using in prod
Add a warning to README and runtime, discouraging use in production. The warning is the same as on the python docs for HTTPServer https://docs.python.org/3/library/http.server.html
* format
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>