About
The Hallucinations Leaderboard is a community-driven benchmarking platform hosted on Hugging Face Spaces that measures and ranks large language models (LLMs) by their hallucination rates across a variety of evaluation datasets and tasks. Hallucination — when AI models produce plausible-sounding but factually incorrect output — is one of the most critical obstacles to reliably deploying LLMs in real-world applications. This leaderboard gives AI researchers, developers, and practitioners a transparent, continuously updated view of how well leading models resist hallucination.

Users can search for specific models, filter results by precision, model type, and other parameters, and directly compare performance metrics across multiple benchmarks. The platform also supports community submissions, enabling anyone to contribute by adding new models to the evaluation queue with minimal inputs such as model name, precision type, and model category.

The leaderboard is especially valuable for teams selecting LLMs for production use cases where factual accuracy is paramount — including medical, legal, financial, and enterprise knowledge management contexts. By centralizing hallucination evaluation data in one place, it enables more informed model selection decisions, encourages model developers to compete on factual reliability, and accelerates broader research into reducing AI hallucinations. The project is openly accessible and free to use, making it a key resource for the global AI research community.
Key Features
- Hallucination Benchmarking: Evaluates and ranks LLMs across multiple hallucination-specific datasets, providing empirical scores on factual reliability.
- Model Search & Filtering: Allows users to search models by name and filter results by precision, model type, and other parameters for targeted comparisons.
- Community Model Submission: Anyone can submit new models to the evaluation queue using simple inputs like model name, precision, and type, making the leaderboard community-driven.
- Multi-Model Comparison: Displays side-by-side performance metrics for a wide range of open and proprietary LLMs, enabling direct comparison at a glance.
- Open & Transparent Evaluation: Hosted publicly on Hugging Face Spaces with open evaluation queues, promoting reproducibility and trust in benchmark results.
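The search, filter, and comparison workflow above can be sketched in plain Python. The records and field names below are hypothetical placeholders for illustration, not the leaderboard's actual data schema:

```python
# Illustrative sketch of leaderboard-style filtering and ranking.
# The records and field names here are hypothetical examples, not the
# real leaderboard's schema or scores.
models = [
    {"name": "model-a-7b", "precision": "float16", "type": "pretrained", "score": 0.71},
    {"name": "model-b-13b", "precision": "bfloat16", "type": "fine-tuned", "score": 0.78},
    {"name": "model-c-7b", "precision": "float16", "type": "fine-tuned", "score": 0.74},
]

def filter_models(records, precision=None, model_type=None):
    """Return records matching the given filters, ranked best-first."""
    results = records
    if precision is not None:
        results = [m for m in results if m["precision"] == precision]
    if model_type is not None:
        results = [m for m in results if m["type"] == model_type]
    # Rank by benchmark score, highest (most factually reliable) first
    return sorted(results, key=lambda m: m["score"], reverse=True)

ranked = filter_models(models, precision="float16")
print([m["name"] for m in ranked])  # float16 models, best score first
```

The same pattern extends to any combination of filter parameters: each filter narrows the candidate set, and the final sort produces the side-by-side ranking the leaderboard displays.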
Use Cases
- AI researchers comparing factual accuracy across multiple LLMs to identify the most reliable models for knowledge-intensive tasks.
- Engineering teams selecting a base LLM for production deployment in high-stakes domains like healthcare, legal, or finance.
- Model developers benchmarking their newly released LLMs against established models to quantify hallucination improvements.
- Students and academics studying hallucination as a research topic and needing empirical data across many models.
- Enterprise AI teams assessing vendor models for trustworthiness before integrating them into customer-facing applications.
Pros
- Completely Free: Accessible to anyone at no cost, making it a valuable resource for independent researchers, students, and startups alike.
- Community-Driven: Open model submission process ensures the leaderboard stays current with newly released models contributed by the broader AI community.
- Focused Signal: Specializes in hallucination evaluation, providing a clear and actionable metric for teams prioritizing factual accuracy.
Cons
- Occasional Availability Issues: As a Hugging Face Space, it can experience runtime errors or downtime, which may interrupt access to leaderboard data.
- Narrow Scope: Focuses exclusively on hallucination benchmarks and does not cover other important LLM performance dimensions like speed, cost, or reasoning.
- Evaluation Lag: Community-submitted models may take time to be processed and appear on the leaderboard, limiting real-time tracking of the latest releases.
Frequently Asked Questions
What is the Hallucinations Leaderboard?
It is an open benchmarking platform hosted on Hugging Face Spaces that evaluates large language models on their tendency to hallucinate — producing factually incorrect or fabricated outputs — and ranks them accordingly.

How do I submit a model for evaluation?
You can submit a model through the leaderboard's community submission interface by providing basic details such as the model name, precision type, and model category. The model will then enter the evaluation queue.

Is the Hallucinations Leaderboard free to use?
Yes, the Hallucinations Leaderboard is completely free to access and use. It is publicly hosted on Hugging Face Spaces.

Which models are included on the leaderboard?
The leaderboard includes a wide range of open-source and publicly available large language models. New models are continuously added via community submissions or by the maintainers.

Who maintains the leaderboard?
It is maintained by the hallucinations-leaderboard organization on Hugging Face, with contributions from the broader AI research community.
