# CIFAR and ResNets

An example of training a ResNet on CIFAR-10 with MLX. Several ResNet
configurations in accordance with the original
[paper](https://arxiv.org/abs/1512.03385) are available. The example also
illustrates how to use [MLX Data](https://github.com/ml-explore/mlx-data) to
load the dataset.
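
For CIFAR-10, the original paper builds its networks from three stages of `n` residual blocks each, which gives a depth of `6n + 2` weighted layers. The sketch below only illustrates that naming rule. The helper `resnet_depth` is a hypothetical name used for illustration and is not part of this example's code.

```python
def resnet_depth(n: int) -> int:
    """Depth of a CIFAR ResNet with n residual blocks per stage (6n + 2 rule)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # 3 stages * n blocks * 2 conv layers per block,
    # plus the initial conv layer and the final classifier layer.
    return 6 * n + 2


# Map a few block counts from the paper to the usual architecture names.
for n in (3, 5, 7, 9):
    print(f"n={n} -> resnet{resnet_depth(n)}")
```

With `n = 3` this yields the 20-layer network that the example uses by default.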

## Pre-requisites

Install the dependencies:

```
pip install -r requirements.txt
```

## Running the example

Run the example with:

```
python main.py
```

By default, the example runs on the GPU. To run on the CPU, use:

```
python main.py --cpu
```

For all available options, run:

```
python main.py --help
```

## Results

After training with the default `resnet20` architecture for 30 epochs, you
should see the following results:

```
Epoch: 29 | avg. Train loss 0.294 | avg. Train acc 0.897 | Throughput: 270.81 images/sec
Epoch: 29 | Test acc 0.841
```

Note that this was run on an M1 MacBook Pro with 16 GB of RAM.
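
As a rough sanity check on the reported throughput, the sketch below estimates the wall-clock time per training epoch. It assumes the standard CIFAR-10 training split of 50,000 images and a constant throughput, and it ignores evaluation time. `epoch_seconds` is a hypothetical helper, not part of `main.py`.

```python
CIFAR10_TRAIN_IMAGES = 50_000  # size of the standard CIFAR-10 training split


def epoch_seconds(images_per_sec: float,
                  num_images: int = CIFAR10_TRAIN_IMAGES) -> float:
    """Estimated seconds for one pass over num_images at a fixed throughput."""
    if images_per_sec <= 0:
        raise ValueError("images_per_sec must be positive")
    return num_images / images_per_sec


# Throughput taken from the training log above.
per_epoch = epoch_seconds(270.81)
print(f"~{per_epoch:.0f} s per epoch, ~{30 * per_epoch / 60:.0f} min for 30 epochs")
```

Under these assumptions, the numbers above work out to roughly three minutes per epoch.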

At the time of writing, `mlx` doesn't have built-in learning rate schedules.
We intend to update this example once these features are added.
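
Until then, a schedule can be computed by hand. The following is a minimal sketch of a step-decay schedule in plain Python. The function name `step_decay`, the milestone epochs, and the decay factor are all assumptions chosen for illustration, not values used by this example. Passing the result to the optimizer is left out because it depends on the `mlx` version in use.

```python
def step_decay(epoch: int, base_lr: float,
               milestones=(15, 25), gamma: float = 0.1) -> float:
    """Return base_lr scaled by gamma once for every milestone already reached."""
    # Count how many milestone epochs are at or before the current epoch.
    num_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** num_decays)


# Print the learning rate at a few epochs of a hypothetical 30-epoch run.
for epoch in (0, 14, 15, 25, 29):
    print(f"epoch {epoch:2d}: lr = {step_decay(epoch, base_lr=0.1):.4f}")
```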

## Distributed training

The example also supports distributed data parallel training. You can launch a
distributed training run as follows:

```shell
$ cat >hostfile.json
[
    {"ssh": "host-to-ssh-to", "ips": ["ip-to-bind-to"]},
    {"ssh": "host-to-ssh-to", "ips": ["ip-to-bind-to"]}
]
$ mlx.launch --verbose --hostfile hostfile.json main.py --batch 256 --epochs 5 --arch resnet20
```
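
For more than a couple of machines, the hostfile can be generated rather than typed by hand. The sketch below builds the same JSON structure shown above, using only the `ssh` and `ips` keys from that example. The helper names `build_hostfile` and `write_hostfile` are hypothetical, and the host and IP strings are placeholders to replace with your own values.

```python
import json


def build_hostfile(hosts):
    """Turn (ssh_target, ips) pairs into a list of hostfile entries."""
    return [{"ssh": ssh, "ips": list(ips)} for ssh, ips in hosts]


def write_hostfile(path, hosts):
    """Write the hostfile entries to path as indented JSON."""
    with open(path, "w") as f:
        json.dump(build_hostfile(hosts), f, indent=4)
        f.write("\n")


# Placeholder values mirroring the example above.
entries = build_hostfile([
    ("host-to-ssh-to", ["ip-to-bind-to"]),
    ("host-to-ssh-to", ["ip-to-bind-to"]),
])
print(json.dumps(entries, indent=4))
```

Calling `write_hostfile("hostfile.json", ...)` with the same list of pairs would produce a file that can be passed to `--hostfile`.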