Haoyu Wu
(† Corresponding Author)
We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency.
In the Multi-Agent Condition Module (Sec. 3.2), Agent Identity Embedding and Adaptive Action Weighting are employed to achieve multi-agent controllability. In the Global State Encoder (Sec. 3.3), we use a frozen VGGT backbone to extract implicit 3D global environmental information from partial observations, thereby improving multi-view consistency. MultiWorld scales effectively across varying agent counts and camera views, supporting autoregressive inference to generate beyond the training context length (Sec. 3.4).
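The conditioning path can be pictured with a minimal sketch. Everything below (function names, the lookup-table embeddings, the softmax-over-magnitude weighting) is an illustrative assumption in pure Python, not the paper's actual implementation: each agent's action is tagged with an identity embedding, and per-agent contributions are mixed with adaptive weights.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def condition_agents(agent_actions, identity_table, dim):
    """Toy multi-agent conditioning: tag each agent's action with its
    identity embedding, then mix agents with adaptive weights (here a
    softmax over action magnitudes stands in for learned weighting)."""
    norms = [math.sqrt(sum(a * a for a in act)) for _, act in agent_actions]
    weights = softmax(norms)
    fused = [0.0] * dim
    for w, (agent_id, act) in zip(weights, agent_actions):
        emb = identity_table[agent_id]
        for i in range(dim):
            fused[i] += w * (act[i] + emb[i])
    return fused, weights

# Two agents with 2-D actions and fixed (hypothetical) identity embeddings.
table = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
actions = [("p1", [0.5, 0.0]), ("p2", [0.0, 2.0])]
fused, weights = condition_agents(actions, table, dim=2)
```

In the real model these would be learned embeddings and a learned weighting network operating on latent features; the sketch only shows the data flow.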
- [2026/4/21] Paper, code, data, and project page are available. Feel free to try them out.

conda create -n multiworld python=3.13
conda activate multiworld
# install torch
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
--index-url https://download.pytorch.org/whl/cu128
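After installing, it can be worth confirming that the interpreter sees at least the pinned torch version. A small stdlib helper for comparing dotted version strings (both function names are ours, not part of the project):

```python
def version_tuple(v):
    """Parse a dotted version string like '2.7.1' into a tuple of ints,
    ignoring any local build suffix such as '+cu128'."""
    core = v.split("+")[0]
    return tuple(int(part) for part in core.split("."))

def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` under numeric ordering."""
    return version_tuple(installed) >= version_tuple(minimum)

# e.g. compare against the pinned 2.7.1:
# import torch; assert meets_minimum(torch.__version__, "2.7.1")
```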
pip install -r requirements.txt

The MultiWorld release contains two parts: It Takes Two game videos and robotics videos.
All .tar archives are stored flat in the same dataset repository.
modelscope login <YOUR_API_KEY>
modelscope download --dataset HaoyuWuRUC/MultiWorldData \
--local_dir ./data
bash preprocess/untar_chunks.sh

hf auth login
hf download Haoyuwu/MultiWorldData --repo-type dataset \
--local-dir ./data
bash preprocess/untar_chunks.sh

After running preprocess/untar_chunks.sh, the archives are extracted to:
- `data/ittakestwo_release/`: It Takes Two dataset
- `data/robots_release/`: Robotics dataset
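A quick way to confirm extraction succeeded is to check that both release folders exist. This helper is ours, not part of the repo:

```python
from pathlib import Path

def missing_dirs(root, expected):
    """Return the expected subdirectories that are absent under `root`."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]

# After extraction, this should return an empty list:
# missing_dirs("data", ["ittakestwo_release", "robots_release"])
```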
modelscope login <YOUR_API_KEY>
modelscope download --model HaoyuWuRUC/MultiWorldCheckpoint \
multiworld_480p_fulldata.safetensors --local_dir ./checkpoints
modelscope download --model HaoyuWuRUC/MultiWorldCheckpoint \
multiworld_480p_toydata.safetensors --local_dir ./checkpoints
modelscope download --model HaoyuWuRUC/MultiWorldCheckpoint \
multiworld_320p_robots.safetensors --local_dir ./checkpoints

hf auth login
hf download Haoyuwu/MultiWorldCheckpoint multiworld_480p_fulldata.safetensors --local-dir ./checkpoints --repo-type model
hf download Haoyuwu/MultiWorldCheckpoint multiworld_480p_toydata.safetensors --local-dir ./checkpoints --repo-type model
hf download Haoyuwu/MultiWorldCheckpoint multiworld_320p_robots.safetensors --local-dir ./checkpoints --repo-type model

Inference with the checkpoint trained on the full dataset:
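Once downloaded, a checkpoint can be sanity-checked without loading any weights, since the safetensors file layout is public (8 little-endian bytes giving the JSON header length, then the JSON header). The function name below is ours:

```python
import json
import struct

def safetensors_shapes(path):
    """Read tensor names and shapes from a .safetensors file header
    without loading the weights themselves."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {name: entry["shape"]
            for name, entry in header.items() if name != "__metadata__"}

# e.g. safetensors_shapes("checkpoints/multiworld_480p_fulldata.safetensors")
```

Listing shapes this way is a cheap check that a multi-gigabyte download is not truncated or corrupted.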
python -m torch.distributed.run --nproc_per_node=8 \
ittakestwo/parallel_inference.py \
--inference-seed 0 \
--num-inference-steps 50 \
--config-path ittakestwo/configs/inference_480P_full.yaml \
--model-path checkpoints/multiworld_480p_fulldata.safetensors \
--output-dir outputs/eval_480P_full

Inference with the checkpoint trained on the toy dataset:
python -m torch.distributed.run --nproc_per_node=8 \
ittakestwo/parallel_inference.py \
--inference-seed 0 \
--num-inference-steps 35 \
--config-path ittakestwo/configs/inference_480P_toy.yaml \
--model-path checkpoints/multiworld_480p_toydata.safetensors \
--output-dir outputs/eval_480P_toy

Inference on the robotics dataset:
python -m torch.distributed.run --nproc_per_node=8 \
robots/parallel_inference.py \
--config-path robots/configs/inference.yaml \
--model-path checkpoints/multiworld_320p_robots.safetensors \
--output-dir outputs/test_robotics_output

This codebase is built on top of the open-source implementations of DiffSynth-Studio, VGGT, and the Wan2.2 repository.
Discussions about the project and video world models are welcome. You can reach me at wuhaoyu556@connect.hku.hk.
If you find our work useful for your research, please consider citing our paper:
@article{wu2025multiworld,
title={MultiWorld: Scalable Multi-Agent Multi-View Video World Models},
author={Wu, Haoyu and Yu, Jiwen and Zou, Yingtian and Liu, Xihui},
journal={arXiv preprint arXiv:2604.18564},
year={2026}
}
