mirror of
https://github.com/ml-explore/mlx-examples.git
synced 2025-09-01 12:49:50 +08:00
Make attention faster for some models (#574)
* make attention faster for a couple models
* remove unused generation flags
* add comment on lora
* include text files as well
@@ -167,6 +167,12 @@ of memory. Here are some tips to reduce memory use should you need to do so:
you can do is break your examples into smaller
sequences when making the `{train, valid, test}.jsonl` files.

5. Gradient checkpointing lets you trade off memory use (less) for computation
(more) by recomputing instead of storing intermediate values needed by the
backward pass. You can use gradient checkpointing by passing the
`--grad-checkpoint` flag. Gradient checkpointing will be more helpful for
larger batch sizes or sequence lengths with smaller or quantized models.

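To make the trade-off in tip 5 concrete, here is a minimal pure-Python sketch. It is not the code behind `--grad-checkpoint`; the toy `tanh` layers and the names `grad_store_all`, `grad_checkpointed`, and `segment` are hypothetical and chosen only for illustration. Both functions compute the same gradient of a chain of layers. The second keeps only the input to each segment and recomputes the rest during the backward pass.

```python
import math


def layer(w, x):
    # Toy layer: y = tanh(w * x).
    return math.tanh(w * x)


def layer_grad(w, x):
    # Local derivative dy/dx of the toy layer at input x.
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def grad_store_all(weights, x):
    """Backward pass that keeps every layer input from the forward pass."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, x_in)
    return g, len(inputs)


def grad_checkpointed(weights, x, segment=4):
    """Keep only each segment's input; recompute the other inputs on demand."""
    checkpoints = {}
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(w, x)

    g = 1.0
    peak = len(checkpoints)  # most values held at any one time
    for start in sorted(checkpoints, reverse=True):
        block = weights[start:start + segment]
        # Extra computation: rerun the forward pass for this segment only.
        x_in = checkpoints[start]
        inputs = []
        for w in block:
            inputs.append(x_in)
            x_in = layer(w, x_in)
        peak = max(peak, len(checkpoints) + len(inputs))
        for w, xi in zip(reversed(block), reversed(inputs)):
            g *= layer_grad(w, xi)
    return g, peak


weights = [0.5 + 0.1 * i for i in range(16)]
g_full, held_full = grad_store_all(weights, 0.3)
g_ckpt, held_ckpt = grad_checkpointed(weights, 0.3, segment=4)
print(g_full, held_full)
print(g_ckpt, held_ckpt)
```

With 16 layers and a segment length of 4, the checkpointed version holds at most 8 values instead of 16, at the cost of roughly one extra forward pass. In a real model the saved values are activation tensors whose size grows with batch size and sequence length, which is consistent with the tip that checkpointing helps more in those settings.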
For example, for a machine with 32 GB the following should run reasonably fast:

```