* compile works for big arrays when contiguous
* style
* nits in docs
* a bunch more stuff
* update jit
* update jit
* use constant for shapes and strides and remove elem_to_loc overload
* use kernel instantiation
* docs nits
* update binary and ternary
* comments
* Add `zero_outputs` and `atomic_outputs` options to `metal_kernel`
* add grid sample to docs
* zero_outputs -> init_value
* add missing header for linux
* start
* simple kernels working
* restructure
* inverse example working
* docs + fixes
* missing file
* fix imports
* address comments
* add docs + fix test
* Review comments + refactor to a single function
* update docs
* remove hashing
* fix contig bug in test
* back to a class
* trailing whitespace
* fix tests
* match c++ and python apis
* add link + make args kw_only
* Add fast affine dequantize
* add full quantize kernel
* fused kernel with scale/bias computation
* fix docstring
* fix no jit error
* fix test
* test fix
* reduce fast api to only affine_quantize
* initial attempt, working with wrong types
* not compiling; mx.float16 and mx.bfloat16 tests added
* fix nan to num
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* einsum initial
* fix comma break
* sum axis was wrong
* small cleanups
* python binding
* changed bindings to resemble numpy
* remove todo comment
* comment changes
* add count of operands/inputs
* fail fast if operands list is empty
* ignore comma if no output
* einsum path matching numpy
* getting somewhere with path
* remove print
* it passes the first test
* moved einsum tests to seperate file
* seperated einsum path
* moved einsum naive
* remove space from equation
* fast fail if no operands passed
* update tests and remove printf
* small cleanup
* some more cleanups
* removed python helper file
* ack
* utilize std for finding min in vector
* duplicate def
* remove the tuple as it was unreadable
* moved einsum_naive back to ops
* remaining isn't needed
* avoid creating another set
* cleanup
* greedy path, start of naive einsum
* more einsum
* fix some bugs
* some more fixes, tests pass
* benchmark
* some simplify
* fix einsum and test
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* add a bunch more tests and fix a bunch more bugs
* some docs nits
---------
Co-authored-by: dc-dc-dc <dgcruz983@gmail.com>
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* Working hadamard for powers of 2
* working for m*2^k
* add scale and check contiguity
* add size check
* clean up
* fix test
* add grads + vmap
* gpu only
* skip on linux
* test typo
* add cpu impl
* remove gpu only tests
* fix linux build + add is_equivalent
* Metal shaders for efficient self attention on large sequences
Updated fast attention: GEMM-ified with Steel primitives
Uses flash attention 1 for scale correction
* more compiler silencing
* Address rebase issues
* Templatize kernel instantiation, revise cpu bindings
* Safer writes to output
* Permit batch size > 1
* Numerical fixes for sdpa self attention
* Re-enable test, remove unused variable
* add benchmarking script
* Disable sdpa prior to perf tuning, and simplify tests for per-patch CI
* Not sure if this is correct
* Format
* Edit tests
* Add negative test
* Format
* add one more test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* add synchronize function
* fix linux
* fix linux
* fix and fix docs
* fix test
* try synchronize in stream destroy
* synchronize works for both cpu and gpu
* more async eval
* fix rebase
* try correct async eval
* fix async
* more tests for async eval
* use shared events for synchronization
* comment + cleanup
* with autorelease pool
* fix no metal build
* fix compile
* fix patch
* don't eval if asyn evale'd
* don't use is_evaled
* comments
* more multi stream tests
* try and cleanup use of is_evaled
* use a status flag
* std and expm1
* actually add expm1
* fix linux
* fix vjp
* relax tol for linux test
* Add it to the compilable primitives
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* add implicit conversion of list to array for equality constraint
* add tests for array equality
* add test for tuple and array equality
* return False if __eq__ arg is list or tuple
* write tests for equality
* update the rule of comparison for __ge__/__gt__/__lt__/__le__
* add a helper function for detecting mlx.core.array
* return true in case fo inequality
* debug minor issue regarding detecting mlx array
* add tests for inequality comparisons
* add name for contribution
* reformat files using pre-commit
* update tests for float
* update tests for inequality
* raise exception in case of invalid comparisons
* use isinstance instead of string comparison
* replace "is_convirtable_to_array" with previous logic
* remove throwing exceptions for other operations
* just a comment
* minor changes for efficiency
* optimize a utils function
* change the function name
* Update ACKNOWLEDGMENTS.md
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* add numeric type hierarchy and issubdtype as well as a set_dtype method to nn.Module with predicate
numeric type hierarchy and issubtype is compatible to the [numpy hierarchy](220f0ab2c5/numpy/_core/numerictypes.py (L42)).
Closes#285.
* nits in docs
* unify type category checking
* nits in docs
* nits in docs
* more docs nits
* fix callable type
---------
Co-authored-by: Awni Hannun <awni@apple.com>