This repository provides computational resources for data-driven persona research and practice. It contains links to existing datasets, a collection of prompts for using generative AI in persona development, and example notebooks illustrating how conventional ML algorithms can be used in data-driven persona development.
The purpose of the repository is to advance the scientific study of data-driven personas, particularly by advocating resource sharing and joint benchmarking of results.
NOTE: The code has been designed to run on Google Colab.
This is work by the Persona Team (https://personateam.xyz). If you are interested in doing a PhD on data-driven personas or in other forms of research collaboration, reach out to us!
Taxonomy:
- PD = Persona Development Task Resource (under 'Notebooks')
- PE = Persona Evaluation Task Resource (under 'Notebooks')
- DS = Persona Dataset Resource
- PP = Persona Prompt Resource
- PS = Persona System Resource
- PR = Persona Repository Resource
A suffix letter indicates the type of data used:
- a = simulated data
- b = real data (anonymized)
For example, "PD01a" indicates a persona development task resource with ID 01 that uses simulated data.
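The naming scheme above can be decoded programmatically. The following is a minimal sketch, not part of the repository: the function name `parse_resource_code`, the `ResourceCode` type, the two-digit ID format, and the optional data suffix are all assumptions inferred from the single "PD01a" example.

```python
import re
from typing import NamedTuple, Optional

# Prefixes and suffixes as listed in the taxonomy above.
RESOURCE_TYPES = {
    "PD": "Persona Development Task Resource",
    "PE": "Persona Evaluation Task Resource",
    "DS": "Persona Dataset Resource",
    "PP": "Persona Prompt Resource",
    "PS": "Persona System Resource",
    "PR": "Persona Repository Resource",
}
DATA_KINDS = {"a": "simulated data", "b": "real data (anonymized)"}

# Assumption: IDs are exactly two digits and the data suffix is optional.
_CODE_RE = re.compile(r"(PD|PE|DS|PP|PS|PR)(\d{2})([ab])?")


class ResourceCode(NamedTuple):
    """Hypothetical container for a decoded resource code."""
    resource_type: str
    resource_id: str
    data_kind: Optional[str]


def parse_resource_code(code: str) -> ResourceCode:
    """Decode a code such as 'PD01a' into its taxonomy parts."""
    match = _CODE_RE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"Not a valid resource code: {code!r}")
    prefix, resource_id, suffix = match.groups()
    # suffix is None when the code carries no data-type letter.
    return ResourceCode(RESOURCE_TYPES[prefix], resource_id, DATA_KINDS.get(suffix))


if __name__ == "__main__":
    print(parse_resource_code("PD01a"))
```

Under these assumptions, `parse_resource_code("PD01a")` yields the resource type "Persona Development Task Resource", the ID "01", and the data kind "simulated data"; a malformed code raises `ValueError`.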
A major portion of the code has been generated using AI (Claude 4 Sonnet). The code has been verified and tested by humans.
- Joni Salminen (jonisalm@uwasa.fi), University of Vaasa, Finland
- Danial Amin, University of Vaasa, Finland
- Ilkka Kaate, University of Turku, Finland
- Bernard J. Jansen, Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar
If you find PersonaBase useful, please cite it as follows:
@misc{salminen2025PersonaBase,
title={PersonaBase: Developing Computational Resources for the Scientific Benchmarking of Data-Driven Personas},
author={Joni Salminen and Danial Amin and Ilkka Kaate and Bernard J. Jansen},
year={2025}
}