Hallucinations Leaderboard

Explore, filter, and compare large language models ranked by hallucination rates. Search models by name, precision, and type, or submit new models to the leaderboard.

About

The Hallucinations Leaderboard is a community-driven benchmarking platform hosted on Hugging Face Spaces that measures and ranks large language models (LLMs) by their hallucination rates across a variety of evaluation datasets and tasks. Hallucination, the tendency of AI models to produce plausible-sounding but factually incorrect output, is one of the most critical obstacles to deploying LLMs reliably in real-world applications.

The leaderboard gives AI researchers, developers, and practitioners a transparent, continuously updated view of how well leading models resist hallucination. Users can search for specific models, filter results by precision, model type, and other parameters, and compare performance metrics across multiple benchmarks side by side. The platform also accepts community submissions: anyone can add a new model to the evaluation queue with a few inputs such as the model name, precision, and model category.

The leaderboard is especially valuable for teams selecting LLMs for production use cases where factual accuracy is paramount, including medical, legal, financial, and enterprise knowledge-management contexts. By centralizing hallucination evaluation data in one place, it supports more informed model selection, encourages model developers to compete on factual reliability, and accelerates broader research into reducing AI hallucinations. The project is openly accessible and free to use, making it a key resource for the global AI research community.
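
Because the leaderboard runs as an app on Hugging Face Spaces, it can usually be reached from Python as well as through the browser. The sketch below uses the gradio_client library to connect and list whatever endpoints the Space exposes; note that the Space id is an assumption inferred from the organization name mentioned in the FAQ, and whether the Space exposes a public API at all is not confirmed by this page.

```python
# Minimal sketch: connect to the leaderboard Space from Python.
# Assumption: the Space id below is inferred from the org name and may differ;
# whether the Space exposes a public Gradio API at all is also an assumption.
from gradio_client import Client

client = Client("hallucinations-leaderboard/leaderboard")  # assumed Space id
client.view_api()  # prints the endpoints the Space actually exposes, if any
```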

Key Features

  • Hallucination Benchmarking: Evaluates and ranks LLMs across multiple hallucination-specific datasets, providing empirical scores on factual reliability.
  • Model Search & Filtering: Allows users to search models by name and filter results by precision, model type, and other parameters for targeted comparisons (the sketch after this list shows the same filtering done locally).
  • Community Model Submission: Anyone can submit new models to the evaluation queue using simple inputs like model name, precision, and type, making the leaderboard community-driven.
  • Multi-Model Comparison: Displays side-by-side performance metrics for a wide range of open and proprietary LLMs, enabling direct comparison at a glance.
  • Open & Transparent Evaluation: Hosted publicly on Hugging Face Spaces with open evaluation queues, promoting reproducibility and trust in benchmark results.
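
For offline analysis, the precision and type filtering offered by the UI can be reproduced locally. Many Hugging Face leaderboards publish their raw scores in a companion dataset repository; assuming this project does the same, a sketch might look like the following. The repo id, file name, and column names are all hypothetical placeholders, not confirmed by this page.

```python
# Sketch: replicate the leaderboard's precision/type filters locally.
# Assumption: a companion results dataset exists; the repo id, file name,
# and column names below are hypothetical placeholders.
import pandas as pd
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="hallucinations-leaderboard/results",  # hypothetical repo id
    repo_type="dataset",
)
df = pd.read_json(f"{path}/results.json")  # hypothetical file and format

# Filter the way the UI does: by precision and model type.
subset = df[(df["precision"] == "float16") & (df["model_type"] == "pretrained")]
print(subset.sort_values("hallucination_rate").head(10))
```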

Use Cases

  • AI researchers comparing factual accuracy across multiple LLMs to identify the most reliable models for knowledge-intensive tasks.
  • Engineering teams selecting a base LLM for production deployment in high-stakes domains like healthcare, legal, or finance.
  • Model developers benchmarking their newly released LLMs against established models to quantify hallucination improvements.
  • Students and academics studying hallucination as a research topic and needing empirical data across many models.
  • Enterprise AI teams assessing vendor models for trustworthiness before integrating them into customer-facing applications.

Pros

  • Completely Free: Accessible to anyone at no cost, making it a valuable resource for independent researchers, students, and startups alike.
  • Community-Driven: Open model submission process ensures the leaderboard stays current with newly released models contributed by the broader AI community.
  • Focused Signal: Specializes specifically in hallucination evaluation, providing a clear and actionable metric for teams prioritizing factual accuracy.

Cons

  • Occasional Availability Issues: As a Hugging Face Space, it can experience runtime errors or downtime, which may interrupt access to leaderboard data.
  • Narrow Scope: Focuses exclusively on hallucination benchmarks and does not cover other important LLM performance dimensions like speed, cost, or reasoning.
  • Evaluation Lag: Community-submitted models may take time to be processed and appear on the leaderboard, limiting real-time tracking of the latest releases.

Frequently Asked Questions

What is the Hallucinations Leaderboard?

It is an open benchmarking platform hosted on Hugging Face Spaces that evaluates large language models on their tendency to hallucinate — producing factually incorrect or fabricated outputs — and ranks them accordingly.

How do I submit a model for evaluation?

You can submit a model through the leaderboard's community submission interface by providing basic details such as the model name, precision type, and model category. The model will then enter the evaluation queue.
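
Since the form only needs the Hub repo id plus precision and category, a common failure mode is a typo in the model name. A small pre-submission check, sketched below with the huggingface_hub library, can catch that before the model enters the queue; the repo id is a placeholder, and this check is a convenience, not part of the leaderboard itself.

```python
# Sketch: sanity-check that a model repo exists on the Hub before submitting
# it through the leaderboard's web form. The repo id is a placeholder.
from huggingface_hub import HfApi
from huggingface_hub.utils import RepositoryNotFoundError

api = HfApi()
repo_id = "my-org/my-llm"  # hypothetical model to submit

try:
    info = api.model_info(repo_id)
    print(f"Found {repo_id} (last modified {info.last_modified})")
except RepositoryNotFoundError:
    print(f"{repo_id} does not exist on the Hub; fix the name before submitting")
```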

Is the leaderboard free to use?

Yes, the Hallucinations Leaderboard is completely free to access and use. It is publicly hosted on Hugging Face Spaces.

Which models are included on the leaderboard?

The leaderboard includes a wide range of open-source and publicly available large language models. New models are continuously added via community submissions or by the maintainers.

Who maintains the Hallucinations Leaderboard?

It is maintained by the hallucinations-leaderboard organization on Hugging Face, with contributions from the broader AI research community.

