UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.
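For orientation, here is a minimal sketch of the black-box, consistency-based flavor of uncertainty quantification that such packages build on: sample several responses to the same prompt and treat low agreement as a hallucination-risk signal. This is a generic illustration under that assumption, not the UQLM API.

```python
from difflib import SequenceMatcher
from statistics import mean

def consistency_score(responses: list[str]) -> float:
    """Average pairwise similarity of responses sampled for the same prompt.

    Low agreement across samples is a common black-box proxy for
    hallucination risk: the model is unstable about its own answer.
    """
    pairs = [
        SequenceMatcher(None, a, b).ratio()
        for i, a in enumerate(responses)
        for b in responses[i + 1:]
    ]
    return mean(pairs) if pairs else 1.0

# Example: three samples that disagree on a factual detail score low.
samples = [
    "The Eiffel Tower was completed in 1889.",
    "The Eiffel Tower was completed in 1887.",
    "Construction of the Eiffel Tower finished in 1930.",
]
print(consistency_score(samples))  # noticeably below 1.0 -> treat as risky
```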
LettuceDetect is a hallucination detection framework for RAG applications.
An up-to-date, curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models.
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks (UHGEval, HaluEval, HalluQA, etc.).
HaluMem is the first operation-level hallucination evaluation benchmark tailored to agent memory systems.
🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
A benchmark for evaluating hallucinations in large visual language models
Unofficial implementation of Microsoft's Claimify paper: extracts specific, verifiable, decontextualized claims from LLM Q&A for use in hallucination, groundedness, relevancy, and truthfulness detection.
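A hedged sketch of the claim-extraction step such an approach relies on: prompt an LLM to decompose an answer into standalone, checkable claims, then parse the result. The prompt wording and the call_llm hook are illustrative assumptions, not the repository's actual code.

```python
import json

EXTRACTION_PROMPT = """Decompose the answer into a JSON list of claims.
Each claim must be specific, independently verifiable, and understandable
without the surrounding question ("decontextualized").

Question: {question}
Answer: {answer}

Return only a JSON array of strings."""

def extract_claims(question: str, answer: str, call_llm) -> list[str]:
    """call_llm: any callable mapping a prompt string to a model reply."""
    raw = call_llm(EXTRACTION_PROMPT.format(question=question, answer=answer))
    try:
        claims = json.loads(raw)
        return [c for c in claims if isinstance(c, str)]
    except json.JSONDecodeError:
        return []  # downstream checks can treat this as "no usable claims"
```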
TrustScoreEval: trust scores for AI/LLM responses. Detect hallucinations, flag misinformation, and validate outputs to build trustworthy AI.
Code release for THRONE, a CVPR 2024 paper on measuring object hallucinations in LVLM-generated text.
When AI makes $10M decisions, hallucinations aren't bugs—they're business risks. We built the verification infrastructure that makes AI agents accountable without slowing them down.
A comprehensive study on reducing hallucinations in large language models through strategic prompt engineering techniques (chain-of-verification, chain-of-thought, and a hybrid of the two).
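A rough sketch of the chain-of-verification loop such a study would compare: draft an answer, generate verification questions, answer them independently, then revise. The prompts and the ask callable are illustrative assumptions, not the repository's code.

```python
def chain_of_verification(question: str, ask) -> str:
    """ask: any callable that sends a prompt string to an LLM and returns text."""
    draft = ask(f"Answer concisely: {question}")
    checks = ask(
        "List 3 short questions that would verify the facts in this answer:\n"
        f"{draft}"
    )
    evidence = ask(f"Answer each verification question independently:\n{checks}")
    return ask(
        "Revise the draft answer so it is consistent with the verification "
        "answers, removing anything unsupported.\n"
        f"Draft: {draft}\nVerification: {evidence}"
    )
```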
HALLUCINATED BY CURSOR WITH CODEX PLUGIN (BEWARE): BaseX Coding Language - Revolutionary Base 5.10 Quantum Teleportation & Infinite Storage System by Joshua Hendricks Cole.
A robust hybrid pipeline for detecting hallucinated citations in academic papers and research documents. The system combines exact bibliographic lookup, fuzzy matching, and optional LLM verification to classify citations as valid, partially valid, or hallucinated.
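A minimal sketch of that hybrid classification logic, assuming a plain list of known reference strings and difflib-based fuzzy matching; the thresholds are illustrative, not the project's actual values.

```python
from difflib import SequenceMatcher

def classify_citation(citation: str, known_refs: list[str],
                      fuzzy_threshold: float = 0.85,
                      partial_threshold: float = 0.6) -> str:
    """Classify a citation as valid, partially valid, or hallucinated."""
    if citation in known_refs:                      # exact bibliographic hit
        return "valid"
    best = max(
        (SequenceMatcher(None, citation.lower(), ref.lower()).ratio()
         for ref in known_refs),
        default=0.0,
    )
    if best >= fuzzy_threshold:                     # near-match: minor formatting drift
        return "valid"
    if best >= partial_threshold:                   # plausible but mangled entry;
        return "partially valid"                    # candidate for optional LLM review
    return "hallucinated"
```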
Moving away from binary "hallucination" evals toward a more nuanced, context-dependent evaluation technique.
Dataset Generation and Pre-processing Scripts for the Research titled: Leveraging the Domain Adaptation of Retrieval Augmented Generation (RAG) Models in Conversational AI for Enhanced Customer Service
An interactive Python chatbot demonstrating real-time contextual hallucination detection in Large Language Models using the "Lookback Lens" method. This project implements the attention-based ratio feature extraction and a trained classifier to identify when an LLM deviates from the provided context during generation.
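A small sketch of the lookback-ratio feature that method is built on: for each generated token and attention head, the share of attention mass placed on the provided context versus the rest of the sequence. The array shapes and the downstream classifier choice are assumptions for illustration.

```python
import numpy as np

def lookback_ratios(attn: np.ndarray, context_len: int) -> np.ndarray:
    """attn: (num_heads, num_new_tokens, seq_len) attention weights over the
    generated span. Returns (num_new_tokens, num_heads) ratio features that a
    simple classifier (e.g. logistic regression) can be trained on."""
    on_context = attn[:, :, :context_len].sum(axis=-1)   # mass on the prompt/context
    total = attn.sum(axis=-1) + 1e-9                     # avoid divide-by-zero
    return (on_context / total).T

# Toy example: 2 heads, 3 generated tokens, 6-token sequence (4 context tokens).
rng = np.random.default_rng(0)
attn = rng.random((2, 3, 6))
attn /= attn.sum(axis=-1, keepdims=True)                 # normalize like softmax output
print(lookback_ratios(attn, context_len=4).shape)        # (3, 2)
```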
This project integrates a business rules management system (BRMS) with RAG to offer an automated text generation solution that is applicable in different contexts and significantly reduces LLM hallucinations. It is a complete architecture, available as a chatbot and fully scalable to your needs.
Legality-gated evaluation for LLMs, a structural fix for hallucinations that penalizes confident errors more than abstentions.
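A toy scoring rule in that spirit, shown only to make the incentive structure concrete: correct answers earn credit, abstentions are neutral, and wrong answers are penalized in proportion to the model's stated confidence. The weights are illustrative assumptions, not the project's actual metric.

```python
from typing import Optional

def grade(answer: Optional[str], gold: str, confidence: float) -> float:
    """Score one response; None means the model abstained."""
    if answer is None:
        return 0.0                    # abstention: no reward, no penalty
    if answer.strip().lower() == gold.strip().lower():
        return 1.0                    # correct answer, full credit
    return -confidence                # wrong answer: penalty scales with confidence

print(grade(None, "Paris", 0.9))      #  0.0  (abstention)
print(grade("Paris", "Paris", 0.9))   #  1.0  (correct)
print(grade("Lyon", "Paris", 0.9))    # -0.9  (confident error, heaviest penalty)
```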