mlx.optimizers.SGD#

class mlx.optimizers.SGD(learning_rate: float, momentum: float = 0.0, weight_decay: float = 0.0, dampening: float = 0.0, nesterov: bool = False)#

Stochastic gradient descent optimizer.

Updates a parameter \(w\) with a gradient \(g\) as follows

\[\begin{split}v_{t+1} &= \mu v_t + (1 - \tau) g_t \\ w_{t+1} &= w_t - \lambda v_{t+1}\end{split}\]

Parameters:
  • learning_rate (float) – The learning rate \(\lambda\) for the update

  • momentum (float, optional) – The momentum strength \(\mu\) (default: 0)

  • weight_decay (float, optional) – The weight decay (L2 penalty) (default: 0)

  • dampening (float, optional) – Dampening for momentum \(\tau\) (default: 0)

  • nesterov (bool, optional) – Enables Nesterov momentum (default: False)
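
The sketch below spells out one update step for a single parameter, following the equations and parameters above. It is an illustration of the math on plain mlx.core arrays, not the optimizer's internal implementation (which also handles Nesterov momentum and per-parameter state).

    import mlx.core as mx

    def sgd_step(w, g, v, lr, momentum=0.0, weight_decay=0.0, dampening=0.0):
        # Weight decay is an L2 penalty folded into the gradient.
        if weight_decay != 0.0:
            g = g + weight_decay * w
        # v_{t+1} = mu * v_t + (1 - tau) * g_t
        v = momentum * v + (1 - dampening) * g
        # w_{t+1} = w_t - lambda * v_{t+1}
        return w - lr * v, v

    w = mx.array([1.0, -2.0])
    g = mx.array([0.5, 0.5])
    v = mx.zeros_like(w)
    w, v = sgd_step(w, g, v, lr=0.1, momentum=0.9)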

Methods

__init__(learning_rate[, momentum, ...])

apply_single(gradient, parameter, state)

Performs the SGD parameter update and stores \(v\) in the optimizer state.
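
A minimal usage sketch follows. The model, loss, and data are illustrative assumptions; only the optimizer construction and the update call belong to this class and the mlx.optimizers API.

    import mlx.core as mx
    import mlx.nn as nn
    import mlx.optimizers as optim

    model = nn.Linear(4, 1)  # toy model for illustration

    def loss_fn(model, x, y):
        return nn.losses.mse_loss(model(x), y)

    loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
    optimizer = optim.SGD(learning_rate=0.1, momentum=0.9)

    x = mx.random.normal((8, 4))
    y = mx.random.normal((8, 1))

    loss, grads = loss_and_grad_fn(model, x, y)
    optimizer.update(model, grads)                # apply the SGD update to all parameters
    mx.eval(model.parameters(), optimizer.state)  # evaluate the lazy computation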