Wan2.1

Quickstart

Installation

Install dependencies:

pip install -r requirements.txt

Model Download

Models	Download Link	Notes
T2V-14B	🤗 Huggingface 🤖 ModelScope	Supports both 480P and 720P
I2V-14B-720P	🤗 Huggingface 🤖 ModelScope	Supports 720P
I2V-14B-480P	🤗 Huggingface 🤖 ModelScope	Supports 480P
T2V-1.3B	🤗 Huggingface 🤖 ModelScope	Supports 480P

💡 Note: The 1.3B model can generate videos at 720P, but it received limited training at that resolution, so results are generally less stable than at 480P. We recommend 480P for this model. The MLX port currently supports only T2V.

Download models using huggingface-cli:

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./Wan2.1-T2V-14B

Download models using modelscope-cli:

pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B --local_dir ./Wan2.1-T2V-14B
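The same download can be scripted from Python. The sketch below is a minimal illustration, not part of this repository: the helper names `repo_id_for`, `local_dir_for`, and `download` are hypothetical, and it assumes every model follows the `Wan-AI/Wan2.1-<model>` naming seen in the commands above (only T2V-14B is confirmed there). It uses `snapshot_download` from `huggingface_hub`.

```python
from pathlib import Path


def repo_id_for(model: str) -> str:
    """Build a Hub repo ID, assuming the 'Wan-AI/Wan2.1-<model>' pattern."""
    return f"Wan-AI/Wan2.1-{model}"


def local_dir_for(model: str, root: str = ".") -> Path:
    """Mirror the './Wan2.1-<model>' directory layout used by the CLI commands."""
    return Path(root) / f"Wan2.1-{model}"


def download(model: str, root: str = ".") -> Path:
    """Download a full model snapshot and return the local directory."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(model, root)
    snapshot_download(repo_id=repo_id_for(model), local_dir=str(target))
    return target
```

Calling `download("T2V-14B")` should be equivalent to the `huggingface-cli` command above; the resulting directory can then be passed as `--ckpt_dir`.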

Run Text-to-Video Generation

This repository currently supports two Text-to-Video models (1.3B and 14B) and two resolutions (480P and 720P). The supported task and resolution combinations are as follows:

Task	480P	720P	Model
t2v-14B	✔️	✔️	Wan2.1-T2V-14B
t2v-1.3B	✔️	❌	Wan2.1-T2V-1.3B
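The support table can be expressed as a small lookup for checking a task and resolution before launching a run. This is an illustrative sketch only: `SUPPORTED` and `is_supported` are hypothetical names, not part of `generate.py`, and the entries simply restate the table above.

```python
# Task -> model directory name and supported resolutions, copied from the table above.
SUPPORTED = {
    "t2v-14B": {"model": "Wan2.1-T2V-14B", "resolutions": {"480P", "720P"}},
    "t2v-1.3B": {"model": "Wan2.1-T2V-1.3B", "resolutions": {"480P"}},
}


def is_supported(task: str, resolution: str) -> bool:
    """Return True if the table lists this resolution for the given task."""
    entry = SUPPORTED.get(task)
    return entry is not None and resolution in entry["resolutions"]
```

For instance, `is_supported("t2v-1.3B", "720P")` returns `False`, matching the ❌ in the table.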

Example:

python generate.py --task t2v-1.3B --size "480*832" --frame_num 16 --sample_steps 25 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --prompt "Lion running under snow in Samarkand" --save_file output_video_mlx.mp4
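When launching several generations from a script, it can help to assemble the command as an argument list rather than a shell string, so prompts with quotes or spaces need no escaping. The sketch below is one way to do that; `build_command` is a hypothetical helper, and its defaults are copied from the example command above. Only flags that appear in that command are used.

```python
import sys


def build_command(
    prompt: str,
    task: str = "t2v-1.3B",
    size: str = "480*832",
    frame_num: int = 16,
    sample_steps: int = 25,
    ckpt_dir: str = "./Wan2.1-T2V-1.3B",
    save_file: str = "output_video_mlx.mp4",
    offload_model: bool = True,
) -> list[str]:
    """Return an argv list for generate.py; every value is passed as a string."""
    return [
        sys.executable, "generate.py",
        "--task", task,
        "--size", size,
        "--frame_num", str(frame_num),
        "--sample_steps", str(sample_steps),
        "--ckpt_dir", ckpt_dir,
        "--offload_model", str(offload_model),  # rendered as "True" / "False"
        "--prompt", prompt,
        "--save_file", save_file,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, where `generate.py` lives.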

Citation

Credits to the Wan Team for the original PyTorch implementation.

@article{wan2.1,
    title   = {Wan: Open and Advanced Large-Scale Video Generative Models},
    author  = {Wan Team},
    journal = {},
    year    = {2025}
}
