GitHub topics: hallucination-detection

Repositories

aimonlabs/aimon-python-sdk

This repo hosts the Python SDK and related examples for AIMon, a proprietary, state-of-the-art system for detecting LLM quality issues such as hallucinations. It can be used for offline evals, continuous monitoring, or inline detection. We offer various model quality metrics that are fast, reliable, and cost-effective.

Language: Python - Size: 1.68 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 10 - Forks: 6

aimonlabs/hallucination-detection-model

HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification

Language: Python - Size: 281 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4 - Forks: 0

SigilNode/CHIMERA-Protocol

Framework for logic auditing, symbolic tension, and epistemic resilience in language models

Size: 20.5 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

open-compass/ANAH

[ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2 & [ICLR 2025] Mask-DPO

Language: Python - Size: 1.32 MB - Last synced at: 1 day ago - Pushed at: 22 days ago - Stars: 43 - Forks: 4

fannie1208/FactTest

Code for "FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees"

Size: 0 Bytes - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

uptrain-ai/uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

Language: Python - Size: 36.9 MB - Last synced at: 16 days ago - Pushed at: 8 months ago - Stars: 2,258 - Forks: 199

KRLabsOrg/LettuceDetect

LettuceDetect is a hallucination detection framework for RAG applications.

Language: Python - Size: 2.22 MB - Last synced at: 16 days ago - Pushed at: 19 days ago - Stars: 207 - Forks: 16

Kernel-Dirichlet/AcceleRAG

RAG-accelerator

Language: Python - Size: 43 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

Ruiyang-061X/VL-Uncertainty

🔎 Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".

Language: Python - Size: 7.12 MB - Last synced at: 15 days ago - Pushed at: about 1 month ago - Stars: 31 - Forks: 2

AlexanderVNikitin/kernel-language-entropy

Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)

Language: Python - Size: 773 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 18 - Forks: 1

Kanisha-Shah/Hallucination-Mitigation-Using-RAG

A Columbia University capstone project focused on mitigating hallucinations in Medical Question Answering systems using Retrieval-Augmented Generation (RAG), ElasticSearch, and LLM-based validation.

Size: 882 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

oumi-ai/halloumi-demo

Try out HallOumi, a state-of-the-art claim verification model in a simple UI!

Language: TypeScript - Size: 174 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

patrick-tssn/VideoHallucer

VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)

Language: Python - Size: 21.1 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 27 - Forks: 0

nikolamilosevic86/verifAI

The VerifAI initiative aims to build an open-source, easy-to-deploy generative question-answering engine that can reference and verify answers for correctness (using a posteriori model)

Language: Jupyter Notebook - Size: 86.2 MB - Last synced at: 1 day ago - Pushed at: about 2 months ago - Stars: 60 - Forks: 4

IAAR-Shanghai/UHGEval

[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.

Language: Python - Size: 65.1 MB - Last synced at: 13 days ago - Pushed at: 5 months ago - Stars: 161 - Forks: 17

Ruiyang-061X/Uncertainty-o

✨ Official code for our paper: "Uncertainty-o: One Model-agnostic Framework for Unveiling Epistemic Uncertainty in Large Multimodal Models".

Language: Python - Size: 5.96 MB - Last synced at: 23 days ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 1

F4biian/HalluRAG

Source code of "The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM's Internal States" (arXiv: https://arxiv.org/abs/2412.17056)

Language: Python - Size: 32.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 0

AikyamLab/hallucinogen

A benchmark for evaluating hallucinations in large visual language models

Language: Python - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

AdityaMayukhSom/hallucination-detection-pipeline

Generate highlights from abstracts, check whether they contain hallucinations and if so, classify the hallucination as factual or not.

Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

serhanylmz/pas2

PAS2: A Python-based hallucination detection system that evaluates AI response consistency through paraphrasing and semantic similarity analysis. Features include response evaluation, similarity metrics, visualization tools, and a web interface for interactive testing.

Language: Python - Size: 13.7 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

deshwalmahesh/PHUDGE

Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric or reference answer, in absolute or relative mode, and much more. It also contains a list of available tools, methods, repos, and code for hallucination detection, LLM evaluation, grading, and more.

Language: Jupyter Notebook - Size: 13.1 MB - Last synced at: 24 days ago - Pushed at: 10 months ago - Stars: 49 - Forks: 7

zjunlp/EasyDetect

[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.

Language: Python - Size: 11.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 1

ComputerVisionFans/LLM-Hallucination-Traffic-Incidents-Dataset

Accepted at NAACL 2025. A GenAI project exploring the LLM hallucination problem.

Language: Jupyter Notebook - Size: 23.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

maylad31/no_hallucination

Detecting Hallucinations in LLMs

Language: Python - Size: 9.77 KB - Last synced at: 24 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

nikilpatel94/CLUE-python

This is an implementation of the paper "CLUE: Concept-Level Uncertainty Estimation for Large Language Models".

Language: Python - Size: 102 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

voidism/Lookback-Lens

Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"

Language: Python - Size: 22.4 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 118 - Forks: 6

liuzihe02/halu

Benchmark of various hallucination detection research and tools

Language: Python - Size: 35 MB - Last synced at: 16 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

amarquaye/atlas-api

API for the atlas project

Language: JavaScript - Size: 9.67 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Alsace08/Chain-of-Embedding

Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"

Language: Python - Size: 3.51 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 0

DataMas/ai-hallutinations-detection

Language: Jupyter Notebook - Size: 318 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

BaluHarshavardan99/Hallucination-in-Chat-bots

Hallucination in Chat-bots: Faithful Benchmark for Information-Seeking Dialogue

Language: Python - Size: 555 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

amarquaye/atlas-chrome

Chrome extension for the ATLAS project.

Language: JavaScript - Size: 346 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

amarquaye/atlas

🔢 Hallucination detector for Large Language Models.

Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

jhaayush2004/RAG-Evaluation

Different approaches to evaluating RAG.

Language: Jupyter Notebook - Size: 224 KB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

OpenKG-ORG/EasyDetect

An Easy-to-use Hallucination Detection Framework for LLMs.

Language: Python - Size: 12 MB - Last synced at: 8 months ago - Pushed at: about 1 year ago - Stars: 45 - Forks: 3

ivan-kud/semeval-2024-shroom

Competition: SemEval-2024 Task-6 - SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

Language: Jupyter Notebook - Size: 934 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 1

rafaelsandroni/antibodies

Antibodies for LLM hallucinations (grouping LLM-as-a-judge, NLI, and reward models)

Language: Python - Size: 3.91 KB - Last synced at: 18 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Related Keywords

hallucination-detection (37), large-language-models (12), llm (11), hallucination (7), hallucinations (5), nlp (5), llms (4), natural-language-processing (4), hallucination-evaluation (4), rag (4), multimodal-large-language-models (3), evaluation (3), generative-ai (3), python (3), hallucination-mitigation (3), machine-learning (3), openai (3), nlp-machine-learning (3), ai (3), artificial-intelligence (2), llmops (2), knowledge-graph (2), gpt (2), uncertainty-quantification (2), generation (2), nli (2), easydetect (2), aigc (2), uncertainty (2), multimodal (2), pytorch (2), dataset (2), openai-api (2), semantic-similarity (1), continuous-monitoring (1), visualization (1), custom-dataset (1), feedback-collection (1), finetuning (1), judge (1), huggingface-transformers (1), qwen (1), chain-of-thought (1), large-multimodal-models (1), model-agnostic (1), o1 (1), reasoning (1), closed-domain (1), retrieval-augmented-generation (1), aisafety (1), medical-safety (1), medical-visual-language-model (1), visual-language-models (1), abstractive-summarization (1), benchmarking (1), gradio (1), paraphrase-detection (1), llm-evaluation (1), embeddings (1), semeval-2024 (1), t5-model (1), bert-models (1), chatbots (1), roberta-large (1), chrome (1), chrome-extension (1), bert-score (1), giskard (1), langchain (1), rag-evaluation (1), ragas (1), vectara (1), wandb (1), genrative-ai (1), llm-as-a-judge (1), llm-as-evaluator (1), ml (1), phi-3 (1), sota (1), knowledge-editing (1), knowlm (1), model-editing (1), g-en-a-i (1), genai (1), llm-inference (1), naacl2025 (1), factuality (1), text-generation (1), api (1), fastapi (1), interpretability (1), self-evaluation (1), trustworthy-ai (1), cosine-similarity (1), symbolic-analysis (1), acl (1), alignment (1), iclr (1), neurips (1), conformal-prediction (1)