PsyScam

A Benchmark for Psychological Techniques in Real-World Scams

This repository contains the code and a partial dataset accompanying our paper submitted to EMNLP 2025:

PsyScam: A Benchmark for Psychological Techniques in Real-World Scams

🎯 Overview

Online scams exploit various psychological techniques (PTs) to manipulate victims. PsyScam provides a comprehensive benchmark to support the analysis and modeling of these techniques across three key NLP tasks:

🏷️ PT Classification: Multi-label classification of psychological techniques in scam content
✍️ Scam Completion: Generating realistic scam continuations given partial content
🔄 Scam Augmentation: Creating variations of existing scam content while preserving psychological techniques
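
To make the multi-label framing of PT Classification concrete, here is a minimal sketch of how one scam text carrying several techniques can be encoded as a single target vector. The label names below are hypothetical placeholders chosen for illustration; the actual label set lives in data/PTs.csv, and this is not necessarily how PTClassification.py encodes its targets.

```python
from typing import List

# Hypothetical label names, for illustration only.
# The real label set is defined in data/PTs.csv.
PT_LABELS: List[str] = ["authority", "scarcity", "reciprocity"]


def encode_multi_hot(assigned: List[str], labels: List[str] = PT_LABELS) -> List[int]:
    """Return a 0/1 vector with one position per label (1 = technique present)."""
    unknown = set(assigned) - set(labels)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1 if label in assigned else 0 for label in labels]


def decode_multi_hot(vector: List[int], labels: List[str] = PT_LABELS) -> List[str]:
    """Map a 0/1 vector back to label names, in label-list order."""
    return [label for label, flag in zip(labels, vector) if flag == 1]
```

Unlike single-label classification, any number of positions (including none) may be set to 1 for a given text.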

📁 Repository Structure

PsyScam/
├── crawlers/                # Web scrapers for collecting scam reports from public sources
├── data/
│   ├── D2.csv              # Evaluation subset used in our experiments (sample dataset)
│   └── PTs.csv             # Comprehensive list of psychological technique labels
├── LLMExtractor.py         # Human-LLM collaborative annotation using GPT-4
├── PTClassification.py     # Multi-label psychological technique classification
├── inferencePT.py          # Inference with a trained PT classification model
├── ScamCompletion.py       # Scam completion generation task implementation
├── ScamAugmentation.py     # Scam augmentation generation task implementation
├── requirements.txt        # Python dependencies
└── README.md               # Project documentation

🚀 Getting Started

Prerequisites

pip install -r requirements.txt

API Configuration

Create an api.key file in the root directory containing your OpenAI API key.
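
For reference, a key file like this can be read with a few lines of Python. This is a sketch under the assumption that api.key holds only the key as plain text; the function name load_api_key is hypothetical, and the repository's scripts may read the file differently.

```python
from pathlib import Path


def load_api_key(path: str = "api.key") -> str:
    """Read an API key from a plain-text file, stripping surrounding whitespace.

    Assumes the file contains nothing but the key itself.
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty")
    return key
```

Keeping the key in a separate file makes it easy to exclude from version control.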

Quick Start

PT Classification:

python PTClassification.py --csv data/D2.csv

Use the trained model for inference:

python inferencePT.py --model_path "./bert/results_multilabel/checkpoint-XXX" --text "Dear valued customer, you have been specially selected for this exclusive investment opportunity. Our expert team guarantees 500% returns within 30 days. This offer expires in 24 hours!"
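
The internals of inferencePT.py are not described here, but multi-label classifiers commonly turn per-label logits into predictions by applying an independent sigmoid to each logit and keeping the labels whose probability clears a threshold. The sketch below illustrates that general step only; the 0.5 threshold, the function names, and the label names in the usage line are assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(
    logits: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,  # assumed default, not taken from the repository
) -> List[str]:
    """Return every label whose sigmoid probability is at least the threshold."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    return [label for logit, label in zip(logits, labels) if sigmoid(logit) >= threshold]


# Hypothetical usage: predict_labels([2.0, -1.0], ["scarcity", "authority"])
```

Because each label is scored independently, a single text can receive several labels or none.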

Scam Completion:

python ScamCompletion.py --llm_model gpt41

Scam Augmentation:

python ScamAugmentation.py --llm_model gpt41

📊 Dataset

Our benchmark includes carefully curated scam reports annotated with psychological techniques. Due to safety and ethical considerations, the complete dataset is available upon request for research purposes only.

Only the evaluation subset (data/D2.csv) is included in this repository.
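
The column layout of D2.csv is not documented here, so a reasonable first step is to inspect it without assuming a schema. The sketch below uses only the standard library and assumes the first row of the file is a header; summarize_csv is a hypothetical helper name.

```python
import csv
from typing import List, Tuple


def summarize_csv(path: str) -> Tuple[List[str], int]:
    """Return (column names, number of data rows) for a CSV file.

    Assumes the first row is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


# Hypothetical usage from the repository root:
# columns, rows = summarize_csv("data/D2.csv")
```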
