FLUX: Optimize dataset loading logic (#1038)
@@ -21,8 +21,9 @@ The dependencies are minimal, namely:
 - `huggingface-hub` to download the checkpoints.
 - `regex` for the tokenization
-- `tqdm`, `PIL`, and `numpy` for the `txt2image.py` script
+- `tqdm`, `PIL`, and `numpy` for the scripts
 - `sentencepiece` for the T5 tokenizer
+- `datasets` for using an HF dataset directly

 You can install all of the above with the `requirements.txt` as follows:
@@ -118,17 +119,12 @@ Finetuning
 The `dreambooth.py` script supports LoRA finetuning of FLUX-dev (and schnell
-but ymmv) on a provided image dataset. The dataset folder must have an
-`index.json` file with the following format:
+but ymmv) on a provided image dataset. The dataset folder must have a
+`train.jsonl` file with the following format:

-```json
-{
-    "data": [
-        {"image": "path-to-image-relative-to-dataset", "text": "Prompt to use with this image"},
-        {"image": "path-to-image-relative-to-dataset", "text": "Prompt to use with this image"},
-        {"image": "path-to-image-relative-to-dataset", "text": "Prompt to use with this image"},
-        ...
-    ]
-}
+```jsonl
+{"image": "path-to-image-relative-to-dataset", "prompt": "Prompt to use with this image"}
+{"image": "path-to-image-relative-to-dataset", "prompt": "Prompt to use with this image"}
+...
 ```

 The training script by default trains for 600 iterations with a batch size of
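The `train.jsonl` format introduced above can be read with a few lines of standard-library Python. The following is a minimal sketch, not the script's actual loader; the helper name `load_local_dataset` is illustrative:

```python
import json
from pathlib import Path

def load_local_dataset(dataset_path):
    # Illustrative helper, not the actual loader in dreambooth.py.
    root = Path(dataset_path)
    samples = []
    with open(root / "train.jsonl") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)  # one JSON object per line
            # "image" is resolved relative to the dataset folder
            samples.append((root / entry["image"], entry["prompt"]))
    return samples
```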
@@ -150,19 +146,15 @@ The training images are the following 5 images [^2]:
 

-We start by making the following `index.json` file and placing it in the same
+We start by making the following `train.jsonl` file and placing it in the same
 folder as the images.

-```json
-{
-    "data": [
-        {"image": "00.jpg", "text": "A photo of sks dog"},
-        {"image": "01.jpg", "text": "A photo of sks dog"},
-        {"image": "02.jpg", "text": "A photo of sks dog"},
-        {"image": "03.jpg", "text": "A photo of sks dog"},
-        {"image": "04.jpg", "text": "A photo of sks dog"}
-    ]
-}
+```jsonl
+{"image": "00.jpg", "prompt": "A photo of sks dog"}
+{"image": "01.jpg", "prompt": "A photo of sks dog"}
+{"image": "02.jpg", "prompt": "A photo of sks dog"}
+{"image": "03.jpg", "prompt": "A photo of sks dog"}
+{"image": "04.jpg", "prompt": "A photo of sks dog"}
 ```

 Subsequently we finetune FLUX using the following command:
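Rather than typing the file by hand, an equivalent `train.jsonl` can be generated programmatically. A short sketch, assuming the images sit in the placeholder dataset folder used in the command below:

```python
import json
from pathlib import Path

dataset = Path("path/to/dreambooth/dataset/dog6")

# Write one JSON object per line, matching the jsonl format above.
with open(dataset / "train.jsonl", "w") as f:
    for i in range(5):
        record = {"image": f"{i:02d}.jpg", "prompt": "A photo of sks dog"}
        f.write(json.dumps(record) + "\n")
```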
@@ -175,6 +167,17 @@ python dreambooth.py \
 path/to/dreambooth/dataset/dog6
 ```

+Or you can directly use the pre-processed Hugging Face dataset [mlx-community/dreambooth-dog6](https://huggingface.co/datasets/mlx-community/dreambooth-dog6) for fine-tuning.
+
+```shell
+python dreambooth.py \
+  --progress-prompt 'A photo of an sks dog lying on the sand at a beach in Greece' \
+  --progress-every 600 --iterations 1200 --learning-rate 0.0001 \
+  --lora-rank 4 --grad-accumulate 8 \
+  mlx-community/dreambooth-dog6
+```
+
 The training requires approximately 50GB of RAM and on an M2 Ultra it takes a
 bit more than 1 hour.
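Before training against the Hugging Face dataset, it can be inspected locally with the `datasets` package. A minimal sketch, assuming a standard `train` split and that the column names mirror the local `image`/`prompt` fields:

```python
from datasets import load_dataset

# Download (and cache) the pre-processed DreamBooth dog dataset.
ds = load_dataset("mlx-community/dreambooth-dog6", split="train")

print(ds)  # row count and column names
sample = ds[0]
# Assumed to mirror the local jsonl fields:
print(sample["prompt"])
```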