* Add linear warmup to schedules for use with existing schedules
* Changed parameters for simplicity of most common case (0 initial value)
* Added ScheduleJoiner and updated documentation
* ScheduleJoiner -> join_schedules (ala optax #)
* black compliance
* Different evaluation of schedules
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Add a few LR schedulers
* Move parents's constructor call to the top
* Fix docstring
* refactor optimizers into two files
* add docs
* nit
* Fix Callable type annotation for python 3.8
---------
Co-authored-by: Awni Hannun <awni@apple.com>
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* Simple kernel generation
* Remove the generate kernel from graph_utils
* fix multi-output with compile
* fuse with stopgrad
* v1 input, output capture in compile
* cleanup tree update with visitor update
* nit
* remove todo
* state for model, optional explicit init and more pure optimizer steps
* move learning rate to state
* add lr to opt state, some fixes in capture
* fix optim
* update tuple of containers as well
* fix stream for compiled output
* rng state for compile
* nit
* updates and comments
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* Added adafactor
* Added Adafactor and ran pre-commit
* modified operations
* Added docstrings
* Switched two ops to fix a bug
* added underscore for internal functions and removed the plus sign in the last return statment
* Removed parameter rms from the optimizer state because its not needed
* Added simple MNIST test for Adafactor and temporary training log
* remove test files
* nits in docs
* comment nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>