* support disable metal buffer cache, due to large unused memory buffered when llm generated long context tokens
* Run format and add "cache_enabled" feature tests
* Organize and collect metal subroutine templates and elements in `metal/kernels/steel/`
* Update gemm elements for better performance
* Add split-K specialization for gemm
* Add `addmm` primitive, op and bindings for fused matmul and bias addition
* Update tests and benchmarks as needed
* resolved extension build issues and added test to ci
* missing gguflib
* rebased
* force mlx install from fix branch
* linux build issue
* point to git install and comment out ci tests
* Allow arbitrary first dim on qmm_t and qmv
* Allow arbitrary first dim on qmm and qvm
* Specialized aligned vs unaligned case
* Add more checks for valid quantizations
* feat: add logicalAnd and logicalOR
* run pre-commit
* Refactor logical_and and logical_or functions
* Add acknowledgement
* Add logical AND and logical OR operators
* Refactor logical_and and logical_or functions
* Add support for logical operators on bool arrays
* Update mlx/ops.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update mlx/ops.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Add logical AND and OR operators for arrays and scalars
* Refactor vjp and jvp methods in primitives.cpp
* Add overloaded operators for logical AND and OR
* format
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
* Added GLU activation function and gated activation function
* Ran pre-commit
* Ran pre commit
* Removed old sigmoid implementation to match with main
* Removed gated activation from __init__.py
* Removed unused test cases
* Removed unused imports
* format / docstring
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Add note to install instructions for building from source to ensure native arm64 environment and tools.
* Add troubleshooting info.
* remove cmake bits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* inner / outer impl
* python tests
* ops list and ack
* updated descriptions
* use test helper
* removed dtype check and flatten outer to 1-D
* updated docs
* just use the reshape to flatten