* more async eval
* fix rebase
* try correct async eval
* fix async
* more tests for async eval
* use shared events for synchronization
* comment + cleanup
* with autorelease pool
* fix no-Metal build
* fix compile
* fix patch
* don't eval if async eval'd
* don't use is_evaled
* comments
* more multi stream tests
* try to clean up use of is_evaled
* use a status flag
* Simple kernel generation
* Remove the generate kernel from graph_utils
* fix multi-output with compile
* fuse with stopgrad
* v1 input, output capture in compile
* cleanup tree update with visitor update
* nit
* remove todo
* state for model, optional explicit init and more pure optimizer steps
* move learning rate to state
* add lr to opt state, some fixes in capture
* fix optim
* update tuple of containers as well
* fix stream for compiled output
* rng state for compile
* nit
* updates and comments
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>