Haoyu Wu
(† Corresponding Author)
We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency.
In the Multi-Agent Condition Module (Sec. 3.2), Agent Identity Embedding and Adaptive Action Weighting are employed to achieve multi-agent controllability. In the Global State Encoder (Sec. 3.3), we use a frozen VGGT backbone to extract implicit 3D global environmental information from partial observations, thereby improving multi-view consistency. MultiWorld scales effectively across varying agent counts and camera views, supporting autoregressive inference to generate beyond the training context length (Sec. 3.4).
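The conditioning path can be pictured with a minimal sketch. Everything below (function names, the lookup-table embeddings, the softmax-over-magnitude weighting) is an illustrative assumption in pure Python, not the paper's actual implementation: each agent's action is tagged with an identity embedding, and per-agent contributions are mixed with adaptive weights.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def condition_agents(agent_actions, identity_table, dim):
    """Toy multi-agent conditioning: tag each agent's action with its
    identity embedding, then mix agents with adaptive weights (here a
    softmax over action magnitudes stands in for learned weighting)."""
    norms = [math.sqrt(sum(a * a for a in act)) for _, act in agent_actions]
    weights = softmax(norms)
    fused = [0.0] * dim
    for w, (agent_id, act) in zip(weights, agent_actions):
        emb = identity_table[agent_id]
        for i in range(dim):
            fused[i] += w * (act[i] + emb[i])
    return fused, weights

# Two agents with 2-D actions and fixed (hypothetical) identity embeddings.
table = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
actions = [("p1", [0.5, 0.0]), ("p2", [0.0, 2.0])]
fused, weights = condition_agents(actions, table, dim=2)
```

In the real model these would be learned embeddings and a learned weighting network operating on latent features; the sketch only shows the data flow.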
- [2026/4/21] Paper, code, data, and project page are available. Feel free to try them out.

conda create -n multiworld python=3.13
conda activate multiworld
# install torch
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
--index-url https://download.pytorch.org/whl/cu128
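After installing, it can be worth confirming that the interpreter sees at least the pinned torch version. A small stdlib helper for comparing dotted version strings (both function names are ours, not part of the project):

```python
def version_tuple(v):
    """Parse a dotted version string like '2.7.1' into a tuple of ints,
    ignoring any local build suffix such as '+cu128'."""
    core = v.split("+")[0]
    return tuple(int(part) for part in core.split("."))

def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` under numeric ordering."""
    return version_tuple(installed) >= version_tuple(minimum)

# e.g. compare against the pinned 2.7.1:
# import torch; assert meets_minimum(torch.__version__, "2.7.1")
```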
pip install -r requirements.txt

The MultiWorld release contains two parts: It Takes Two game videos and robotics videos.
All .tar archives are stored flat in the same dataset repository.
modelscope login <YOUR_API_KEY>
modelscope download --dataset HaoyuWuRUC/MultiWorldData \
--local_dir ./data
bash preprocess/untar_chunks.sh

hf auth login
hf download Haoyuwu/MultiWorldData --repo-type dataset \
--local-dir ./data
bash preprocess/untar_chunks.sh

After running preprocess/untar_chunks.sh, the archives are extracted to:
- `data/ittakestwo_release/`: It Takes Two dataset
- `data/robots_release/`: Robotics dataset
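A quick way to confirm extraction succeeded is to check that both release folders exist. This helper is ours, not part of the repo:

```python
from pathlib import Path

def missing_dirs(root, expected):
    """Return the expected subdirectories that are absent under `root`."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]

# After extraction, this should return an empty list:
# missing_dirs("data", ["ittakestwo_release", "robots_release"])
```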
modelscope login <YOUR_API_KEY>
modelscope download --model HaoyuWuRUC/MultiWorldCheckpoint \
multiworld_480p_fulldata.safetensors --local_dir ./checkpoints
modelscope download --model HaoyuWuRUC/MultiWorldCheckpoint \
multiworld_480p_toydata.safetensors --local_dir ./checkpoints
modelscope download --model HaoyuWuRUC/MultiWorldCheckpoint \
multiworld_320p_robots.safetensors --local_dir ./checkpoints

hf auth login
hf download Haoyuwu/MultiWorldCheckpoint multiworld_480p_fulldata.safetensors --local-dir ./checkpoints --repo-type model
hf download Haoyuwu/MultiWorldCheckpoint multiworld_480p_toydata.safetensors --local-dir ./checkpoints --repo-type model
hf download Haoyuwu/MultiWorldCheckpoint multiworld_320p_robots.safetensors --local-dir ./checkpoints --repo-type model

Inference with the checkpoint trained on the full dataset:
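Once downloaded, a checkpoint can be sanity-checked without loading any weights, since the safetensors file layout is public (8 little-endian bytes giving the JSON header length, then the JSON header). The function name below is ours:

```python
import json
import struct

def safetensors_shapes(path):
    """Read tensor names and shapes from a .safetensors file header
    without loading the weights themselves."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {name: entry["shape"]
            for name, entry in header.items() if name != "__metadata__"}

# e.g. safetensors_shapes("checkpoints/multiworld_480p_fulldata.safetensors")
```

Listing shapes this way is a cheap check that a multi-gigabyte download is not truncated or corrupted.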
python -m torch.distributed.run --nproc_per_node=8 \
ittakestwo/parallel_inference.py \
--inference-seed 0 \
--num-inference-steps 50 \
--config-path ittakestwo/configs/inference_480P_full.yaml \
--model-path checkpoints/multiworld_480p_fulldata.safetensors \
--output-dir outputs/eval_480P_full

Inference with the checkpoint trained on the toy dataset:
python -m torch.distributed.run --nproc_per_node=8 \
ittakestwo/parallel_inference.py \
--inference-seed 0 \
--num-inference-steps 35 \
--config-path ittakestwo/configs/inference_480P_toy.yaml \
--model-path checkpoints/multiworld_480p_toydata.safetensors \
--output-dir outputs/eval_480P_toy

Inference on the robotics dataset:
python -m torch.distributed.run --nproc_per_node=8 \
robots/parallel_inference.py \
--config-path robots/configs/inference.yaml \
--model-path checkpoints/multiworld_320p_robots.safetensors \
--output-dir outputs/test_robotics_output

This codebase is built on top of the open-source implementations of DiffSynth-Studio, VGGT, and the Wan2.2 repository.
Discussions about the project and video world models are welcome. You can reach me at wuhaoyu556@connect.hku.hk.
If you find our work useful for your research, please consider citing our paper:
@article{wu2025multiworld,
title={MultiWorld: Scalable Multi-Agent Multi-View Video World Models},
author={Wu, Haoyu and Yu, Jiwen and Zou, Yingtian and Liu, Xihui},
journal={arXiv preprint arXiv:2604.18564},
year={2026}
}
