In real-world industrial settings, recommender systems rely heavily on online A/B testing to evaluate performance. However, traditional A/B testing faces significant challenges: high resource consumption, potential degradation of the user experience, and long evaluation latency. Offline user simulators attempt to address this, but existing methods often fail to reflect real human behavior: they typically lack visual perception (ignoring movie posters) and oversimplify interaction (ignoring page-by-page exploration).
To bridge this gap, we introduce ABAgent, a Multimodal User Agent designed to simulate the entire human perception process and interactive behavior trajectories for A/B testing.
Key Features:
- Multimodal Perception: Unlike text-only agents, ABAgent processes both textual metadata and visual information (e.g., movie posters) to mimic how real users perceive items.
- Interactive Sandbox Environment: We developed a movie recommendation sandbox (similar to IMDb) where the agent explores content across varying levels of granularity (Overview Page vs. Detail Page).
- Human-like Behavior: The agent incorporates modules for Profile, Memory (long-term & short-term), Action, and a Fatigue System that prevents unrealistically excessive exploration (see the sketch after this list).
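The actual module interfaces live in `autosim/`; the following minimal sketch only illustrates how Profile, Memory, Action, and Fatigue components could fit together. All class and attribute names here are hypothetical and are not the repository's real API.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    # Static traits generated by create_profile.py (tastes, activity level).
    tastes: list[str]
    activity_level: float  # e.g. expected number of pages browsed per session

@dataclass
class Memory:
    # Long-term memory persists across sessions; short-term memory is per-session.
    long_term: list[str] = field(default_factory=list)
    short_term: list[str] = field(default_factory=list)

@dataclass
class FatigueSystem:
    # Each action adds fatigue; once the threshold is exceeded the agent stops exploring.
    fatigue: float = 0.0
    threshold: float = 1.0

    def step(self, cost: float = 0.1) -> bool:
        self.fatigue += cost
        return self.fatigue < self.threshold  # True while the agent can keep going

class ABAgent:
    def __init__(self, profile: UserProfile):
        self.profile = profile
        self.memory = Memory()
        self.fatigue = FatigueSystem()

    def act(self, page: dict) -> str:
        # Choose the next action (e.g. "open_detail", "rate", "exit") from the
        # current page, the profile, and memory; an LLM call would sit here.
        if not self.fatigue.step():
            return "exit"
        self.memory.short_term.append(page.get("title", ""))
        return "open_detail"
```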
To support the multimodal sandbox environment, we introduce the Multimodal-MovieLens-1M (MM-ML-1M) dataset. This is an extension of the original MovieLens-1M, enhanced with high-quality movie posters and rich metadata to support multi-interface exploration.
- Dataset Repo: https://github.com/wlzhang2020/MM-ML-1M
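For a quick look at the data, the sketch below assumes MM-ML-1M keeps the original MovieLens-1M file layout (`movies.dat`, `ratings.dat` with `::` separators) and stores posters under a `posters/` directory keyed by movie id; adjust the paths and column names to match the actual release.

```python
import pandas as pd

DATA_DIR = "MM-ML-1M"  # hypothetical local path to the downloaded dataset

# MovieLens-1M-style files use "::" as the separator; movies.dat is latin-1 encoded.
movies = pd.read_csv(
    f"{DATA_DIR}/movies.dat", sep="::", engine="python",
    names=["movie_id", "title", "genres"], encoding="latin-1",
)
ratings = pd.read_csv(
    f"{DATA_DIR}/ratings.dat", sep="::", engine="python",
    names=["user_id", "movie_id", "rating", "timestamp"],
)

# Assumed poster naming scheme: posters/<movie_id>.jpg
movies["poster_path"] = movies["movie_id"].map(lambda i: f"{DATA_DIR}/posters/{i}.jpg")
print(movies.head())
```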
Follow these steps to set up the environment and run the simulation.
First, ensure you have Anaconda or Miniconda installed. Create a virtual environment and install the required dependencies.
Note: This project requires OpenAI CLIP, which must be installed directly from the GitHub repository to work correctly.
```bash
# Create a Conda environment (Python 3.9 is recommended)
conda create -n abagent python=3.9

# Activate the environment
conda activate abagent

# Install standard dependencies
pip install -r requirements.txt

# Install OpenAI CLIP (required for multimodal perception)
pip install git+https://github.com/openai/CLIP.git
```
Download the CLIP model (ViT-L/14) and place the model file in the `models/` folder.
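If you would rather let the `clip` package fetch and cache the weights for you, `clip.load` accepts a `download_root` argument. A minimal check that the model loads from the `models/` folder (the folder name follows the project layout described below; adapt it if your setup differs):

```python
import clip
import torch

# Loads ViT-L/14 from the local models/ folder (and downloads it there if missing).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device, download_root="models")
print("CLIP ViT-L/14 loaded on", device)
```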
Download the MM-ML-1M Dataset. Then, update the dataset path in app.py:
```python
# In app.py
dataset = MovieLensDataset(data_dir="YOUR_LOCAL_DATASET_PATH")
```

Create a `.env` file in the root directory (or set environment variables) with your OpenAI API key and endpoint. These credentials power the LLM-based agent.
```
OPENAI_KEY=your_openai_key
OPENAI_ENDPOINT=your_openai_endpoint
```
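To verify that both variables resolve before launching, a small standalone check can be run with python-dotenv (a common choice for reading `.env` files; install it with `pip install python-dotenv` if it is not already in `requirements.txt`). This is only a sanity check, not part of the project code.

```python
import os
from dotenv import load_dotenv

# Load variables from the .env file in the current directory into the environment.
load_dotenv()

openai_key = os.getenv("OPENAI_KEY")
openai_endpoint = os.getenv("OPENAI_ENDPOINT")
assert openai_key and openai_endpoint, "Set OPENAI_KEY and OPENAI_ENDPOINT in .env"
print("OpenAI endpoint:", openai_endpoint)
```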
Before running the simulation, you need to generate the user preference profiles and activity traits. Run the data creation script:
```bash
python create_profile.py
```
This step generates user tastes, activity levels, and embeddings required for the agent.
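Conceptually, the multimodal part of this step encodes each movie's poster and metadata with the CLIP model loaded earlier. The snippet below is an illustrative sketch of that idea, not the actual `create_profile.py` code; the poster path is hypothetical.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device, download_root="models")

# Encode one poster image and its title into the shared CLIP embedding space.
image = preprocess(Image.open("posters/1.jpg")).unsqueeze(0).to(device)  # hypothetical path
text = clip.tokenize(["Toy Story (1995)"]).to(device)

with torch.no_grad():
    image_emb = model.encode_image(image)  # shape: (1, 768) for ViT-L/14
    text_emb = model.encode_text(text)     # shape: (1, 768)

# Cosine similarity between the poster and the title.
sim = torch.cosine_similarity(image_emb, text_emb)
print("poster-title similarity:", sim.item())
```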
Once everything is set up, execute app.py to start the ABAgent simulation in the interactive environment.
```bash
python app.py
```
Project structure:
- `app.py`: The main entry point for the simulation.
- `create_profile.py`: Scripts for generating user profiles and multimodal embeddings.
- `autosim/`: Core logic for the ABAgent (Profile, Memory, Action modules).
- `models/`: Directory to store the pre-trained CLIP model.
- `data/`: Directory to store processed user history and traits.

