Paper Link: arXiv (https://arxiv.org/abs/2512.02860)
RFOP revisits fusion and orthogonal projection for face-voice association, focusing on the relevant semantic information within the two modalities.
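This README does not spell out the loss itself. As a rough illustration of what an orthogonal projection constraint on embeddings can look like, the NumPy sketch below pushes same-identity embeddings towards cosine similarity 1 and different-identity embeddings towards cosine similarity 0. The function name, the `gamma` weighting, and the toy data are assumptions made for illustration; this is not RFOP's actual implementation.

```python
import numpy as np

def orthogonal_projection_loss(embeddings, labels, gamma=0.5):
    """Illustrative orthogonality-style loss (hypothetical helper, not the repo's code)."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # L2-normalise each row so that dot products become cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    feats = embeddings / np.clip(norms, 1e-12, None)
    sims = feats @ feats.T

    # Pairs with the same identity (excluding self-pairs) and pairs with different identities.
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)
    neg_mask = ~same

    # Assumption: an empty pair set contributes a mean similarity of 0.
    pos_mean = sims[pos_mask].mean() if pos_mask.any() else 0.0
    neg_mean = sims[neg_mask].mean() if neg_mask.any() else 0.0

    # Small when same-identity pairs align and different-identity pairs are orthogonal.
    return (1.0 - pos_mean) + gamma * abs(neg_mean)

# Toy batch: two identities whose embeddings lie on orthogonal axes.
toy_embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
toy_labels = [0, 0, 1, 1]
toy_loss = orthogonal_projection_loss(toy_embeddings, toy_labels)
```

In this toy batch the two identities are already perfectly separated, so the loss is at its minimum; overlapping identities would raise the second term.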
Please follow the instructions here to create the environment and install the required libraries.
Use the following command to train the model:
python main.py --batch_size 64 --epochs 50 --dim_embed 256
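The parser inside `main.py` is not shown in this README. As a minimal sketch of how flags like these could be declared with `argparse`, assuming integer types and defaults that simply mirror the values in the command above (the function name and help strings are hypothetical):

```python
import argparse

def build_train_parser():
    # Hypothetical parser mirroring the flags used in the training command.
    parser = argparse.ArgumentParser(description="Training options (illustrative sketch)")
    parser.add_argument("--batch_size", type=int, default=64, help="samples per mini-batch")
    parser.add_argument("--epochs", type=int, default=50, help="number of training epochs")
    parser.add_argument("--dim_embed", type=int, default=256, help="embedding dimension")
    return parser

# Parse the same arguments that appear in the command line above.
args = build_train_parser().parse_args(
    ["--batch_size", "64", "--epochs", "50", "--dim_embed", "256"]
)
print(args.batch_size, args.epochs, args.dim_embed)
```

Note that the scoring command passes the same `--dim_embed` value (256) as the training command.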
Use the following command to compute the score for the trained model:
python computeScore.py --ckpt <path to checkpoint.pth.tar> --dim_embed 256
The codebase is inspired by the FOP repository. We thank the authors for releasing their valuable codebase.
- FAME: Face-voice Association in Multilingual Environments (FAME Challenge)
- PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association (InterSpeech 2025)
- SBNet: Single-branch Network for Multimodal Training (ICASSP 2023)
- FOP: Fusion and Orthogonal Projection for Improved Face-Voice Association (ICASSP 2022)
@misc{rfop2025,
title={RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association},
author={Abdul Hannan and Furqan Malik and Hina Jabbar and Syed Suleman Sadiq and Mubashir Noman},
year={2025},
eprint={2512.02860},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.02860},
}
