PsyScam

A Benchmark for Psychological Techniques in Real-World Scams


This repository contains the code and partial dataset accompanying our paper submitted to EMNLP 2025:

PsyScam: A Benchmark for Psychological Techniques in Real-World Scams

🎯 Overview

Online scams exploit various psychological techniques (PTs) to manipulate victims. PsyScam provides a comprehensive benchmark to support the analysis and modeling of these techniques across three key NLP tasks:

  • 🏷️ PT Classification: Multi-label classification of psychological techniques in scam content
  • ✍️ Scam Completion: Generating realistic scam continuations given partial content
  • 🔄 Scam Augmentation: Creating variations of existing scam content while preserving psychological techniques
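To make the multi-label framing of the first task concrete, here is a minimal sketch of one common formulation: each scam text is mapped to a multi-hot vector over the label set, and each label is decided independently by thresholding a per-label sigmoid score. The label names, the `to_multi_hot` and `decode` helpers, and the 0.5 threshold are assumptions for illustration only. The actual labels are listed in data/PTs.csv, and PTClassification.py may handle this differently.

```python
import math

# Hypothetical label names for illustration only;
# the real label set is the one listed in data/PTs.csv.
LABELS = ["authority", "scarcity", "reciprocity"]


def to_multi_hot(active, labels=LABELS):
    """Encode the set of active labels as a 0/1 vector aligned with `labels`."""
    return [1 if name in active else 0 for name in labels]


def decode(logits, labels=LABELS, threshold=0.5):
    """Score each label independently with a sigmoid and keep those at or above `threshold`."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(labels, probs) if p >= threshold]
```

Unlike single-label classification, nothing forces the scores to sum to one, so a single text can carry several techniques or none at all.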

📁 Repository Structure

PsyScam/
├── crawlers/                # Web scrapers for collecting scam reports from public sources
├── data/
│   ├── D2.csv              # Evaluation subset used in our experiments (sample dataset)
│   └── PTs.csv             # Comprehensive list of psychological technique labels
├── LLMExtractor.py         # Human-LLM collaborative annotation using GPT-4
├── PTClassification.py     # Multi-label psychological technique classification
├── ScamCompletion.py       # Scam completion generation task implementation
├── ScamAugmentation.py     # Scam augmentation generation task implementation
└── README.md               # Project documentation

🚀 Getting Started

Prerequisites

pip install -r requirements.txt

API Configuration

Create an api.key file in the repository root directory containing your OpenAI API key.
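The exact layout of the key file is not shown here. As a minimal sketch, assuming api.key holds nothing but the raw key as plain text, a script could read it as follows. The `load_api_key` helper is hypothetical and is not taken from the repository's code.

```python
from pathlib import Path


def load_api_key(path="api.key"):
    """Read the key from a plain-text file, stripping surrounding whitespace.

    Assumes the file contains only the key itself (an assumption, not a
    documented format).
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty")
    return key
```

Keeping the key in a separate file makes it easy to exclude from version control, for example by listing api.key in .gitignore.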

Quick Start

  1. PT Classification:

    python PTClassification.py --csv data/D2.csv

     To run inference with the trained model:

    python inferencePT.py --model_path "./bert/results_multilabel/checkpoint-XXX" --text "Dear valued customer, you have been specially selected for this exclusive investment opportunity. Our expert team guarantees 500% returns within 30 days. This offer expires in 24 hours!"
  2. Scam Completion:

    python ScamCompletion.py --llm_model gpt41
  3. Scam Augmentation:

    python ScamAugmentation.py --llm_model gpt41

📊 Dataset

Our benchmark includes carefully curated scam reports annotated with psychological techniques. Due to safety and ethical considerations, the complete dataset is available upon request for research purposes only.

This repository includes only the evaluation subset (data/D2.csv).
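Since the column layout of D2.csv is not documented here, a quick way to get oriented is to print its header and row count before running any task script. The sketch below uses only the standard library and makes no assumption about column names. The `summarize_csv` helper is hypothetical.

```python
import csv


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, len(rows)
```

For example, `columns, n_rows = summarize_csv("data/D2.csv")` followed by `print(columns, n_rows)` shows which fields are available in the sample.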
