mlx.optimizers.Adam

class mlx.optimizers.Adam(learning_rate: float, betas: List[float] = [0.9, 0.999], eps: float = 1e-08)

Implementation of the Adam optimizer [1].

Our Adam implementation follows the original paper, except that it omits the bias correction in the first and second moment estimates. In detail,

\[\begin{split}m_{t+1} &= \beta_1 m_t + (1 - \beta_1) g_t \\ v_{t+1} &= \beta_2 v_t + (1 - \beta_2) g_t^2 \\ w_{t+1} &= w_t - \lambda \frac{m_{t+1}}{\sqrt{v_{t+1}} + \epsilon}\end{split}\]

[1]: Kingma, D.P. and Ba, J., 2015. Adam: A method for stochastic optimization. ICLR 2015.
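
A minimal usage sketch follows; the tiny linear model, random data, and loss function are hypothetical and serve only to show where the optimizer fits in a training step:

    import mlx.core as mx
    import mlx.nn as nn
    import mlx.optimizers as optim

    # Hypothetical toy setup: a small linear model and random data.
    model = nn.Linear(4, 1)
    x = mx.random.normal((32, 4))
    y = mx.random.normal((32, 1))

    def loss_fn(model, x, y):
        return nn.losses.mse_loss(model(x), y)

    # Adam with the default betas and eps from the signature above.
    optimizer = optim.Adam(learning_rate=1e-3, betas=[0.9, 0.999], eps=1e-8)

    loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
    for _ in range(10):
        loss, grads = loss_and_grad_fn(model, x, y)
        # Applies the Adam update rule above to every model parameter.
        optimizer.update(model, grads)
        # Force evaluation of the lazily computed parameters and state.
        mx.eval(model.parameters(), optimizer.state)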

Methods

__init__(learning_rate[, betas, eps])

apply_single(gradient, parameter, state)

    Performs the Adam parameter update and stores \(v\) and \(m\) in the optimizer state.
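
For reference, here is a minimal sketch of that update written with mlx.core and mirroring the equations above; the function name, keyword defaults, and the state keys "m" and "v" are assumptions for illustration, not the library's internals:

    import mlx.core as mx

    def adam_update_sketch(gradient, parameter, state,
                           lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # First and second moment estimates, without bias correction.
        m = b1 * state["m"] + (1 - b1) * gradient
        v = b2 * state["v"] + (1 - b2) * mx.square(gradient)
        state["m"], state["v"] = m, v
        # Parameter step: w_{t+1} = w_t - lr * m_{t+1} / (sqrt(v_{t+1}) + eps)
        return parameter - lr * m / (mx.sqrt(v) + eps)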