About
Pheme (PHEME: Computing Veracity – the Fourth Challenge of Big Data) is a multi-national research initiative addressing what its team calls the fourth 'V' of big data: veracity. While volume, velocity, and variety have long dominated the big data conversation, Pheme focuses on whether information circulating online is actually true. The project introduces the concept of 'phemes' — memes enhanced with truthfulness metadata — and classifies online content into four categories: speculation, controversy, misinformation, and disinformation.
The project brings together partners from seven countries with expertise in natural language processing, text mining, web science, social network analysis, and information visualisation. Its analytical approach is three-pronged: intrinsic document analysis (lexical, semantic, and syntactic), cross-referencing against trusted data sources such as PubMed and Linked Open Data via Ontotext's GraphDB, and diffusion analysis of how information spreads through social networks.
Key deliverables include open-source veracity intelligence algorithms built on Sheffield's GATE text mining platform, a human-annotated rumour dataset from the University of Warwick, and an interactive visual analytics dashboard (based on Ushahidi's SwiftRiver) for digital journalists. The dashboard visualises rumour diffusion patterns, geospatial author distributions, and message stance (confirming, denying, or questioning). Pheme is particularly suited to researchers, journalists, and developers working on misinformation detection, fact-checking pipelines, and computational social science.
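The three-pronged approach can be illustrated with a minimal fusion sketch. Everything below (field names, weights, thresholds) is hypothetical and not part of Pheme's published algorithms; it simply shows how intrinsic, cross-referencing, and diffusion signals could be combined into a single veracity estimate.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Evidence gathered for one claim. Field names are illustrative only."""
    intrinsic: float        # linguistic cues from the document itself (0..1)
    cross_reference: float  # agreement with trusted sources such as PubMed (0..1)
    diffusion: float        # credibility inferred from how the claim spreads (0..1)


def veracity_score(s: Signals, weights=(0.4, 0.4, 0.2)) -> float:
    """Combine the three analysis prongs into one score.

    A weighted average is the simplest possible fusion rule; Pheme's actual
    algorithms are far more sophisticated.
    """
    wi, wc, wd = weights
    return wi * s.intrinsic + wc * s.cross_reference + wd * s.diffusion


def label(score: float) -> str:
    """Map a fused score to a coarse verdict (thresholds are arbitrary)."""
    if score >= 0.7:
        return "likely true"
    if score >= 0.4:
        return "unverified"
    return "likely false"
```

In a real pipeline the three inputs would come from separate subsystems (NLP analysis, knowledge-base lookup, and network analysis) rather than being supplied directly.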
Key Features
- Veracity Classification: Automatically classifies online content into four pheme types (speculation, controversy, misinformation, and disinformation) using linguistic and semantic analysis.
- Real-Time Rumour Diffusion Analysis: Tracks how rumours and claims spread across social networks, mapping who shares what, when, and to whom, with geospatial visualisations.
- Trusted Source Cross-Referencing: Cross-references claims against authoritative data sources such as PubMed and Linked Open Data (via Ontotext GraphDB) to ground veracity assessments.
- Open-Source Visual Analytics Dashboard: Provides journalists and analysts with an interactive dashboard (based on Ushahidi's SwiftRiver) showing rumour diffusion patterns, stance labels, and author influence maps.
- GATE-Based NLP Algorithms: Releases veracity intelligence algorithms as open source, built on Sheffield's GATE text mining platform for extensibility and community adoption.
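As a rough illustration of the stance labels surfaced in the dashboard (confirming, denying, or questioning), the sketch below uses simple keyword cues. Pheme's actual GATE-based components rely on far richer linguistic and semantic features; the cue lists here are invented for demonstration.

```python
import re

# Illustrative cue lexicons only — not Pheme's feature set.
CONFIRM_CUES = {"confirmed", "true", "verified", "official"}
DENY_CUES = {"false", "fake", "hoax", "debunked", "denied"}
QUESTION_CUES = {"really", "source", "citation"}


def stance(message: str) -> str:
    """Label a reply's stance toward a rumour.

    Returns one of: "confirming", "denying", "questioning", "neutral".
    """
    text = message.lower()
    tokens = set(re.findall(r"[a-z']+", text))
    if tokens & DENY_CUES:
        return "denying"
    if tokens & CONFIRM_CUES:
        return "confirming"
    if text.rstrip().endswith("?") or tokens & QUESTION_CUES:
        return "questioning"
    return "neutral"
```

Aggregating these labels over all replies to a rumour is one simple way to estimate whether the crowd is leaning toward confirmation or denial.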
Use Cases
- Digital journalists using the visual analytics dashboard to verify breaking-news rumours before publication during live events such as protests or disasters.
- Medical information platforms filtering out health misinformation by cross-referencing claims against PubMed and other authoritative biomedical databases.
- Academic researchers studying how speculation, controversy, and disinformation spread across Twitter and other social networks over time.
- Fact-checking organisations integrating Pheme's open-source NLP algorithms into automated pipelines for large-scale claim verification.
- Government and public health agencies monitoring social media for emerging misinformation during crises to enable timely corrective communication.
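For the research use case of studying how rumours spread over time, a diffusion cascade can be modelled as a tree of reshares. The data model below is a deliberately simplified stand-in, not Pheme's representation:

```python
from collections import defaultdict


def build_cascade(shares):
    """Build a rumour diffusion tree from (user, reshared_from) pairs.

    `reshared_from` is None for the original poster. Returns the root
    user and a child-adjacency mapping.
    """
    children = defaultdict(list)
    root = None
    for user, src in shares:
        if src is None:
            root = user
        else:
            children[src].append(user)
    return root, dict(children)


def cascade_depth(node, children):
    """Length of the longest share chain (the root counts as depth 1)."""
    if node not in children:
        return 1
    return 1 + max(cascade_depth(c, children) for c in children[node])
```

Metrics like depth, breadth, and time-to-reshare computed over such cascades are the kind of diffusion features Pheme analyses at much larger scale.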
Pros
- Open-Source and Extensible: Core algorithms and datasets are released publicly, enabling researchers and developers to build on and extend the platform freely.
- Interdisciplinary Rigour: Combines NLP, social network analysis, web science, and information visualisation from partners in seven countries for a comprehensive veracity framework.
- Practical Application Targets: Designed for real-world use in digital journalism and medical information verification, making it directly applicable to high-stakes domains.
Cons
- Research-Stage Maturity: As an academic research project, it may lack the polish, documentation, and commercial support expected of a production-ready tool.
- Limited Real-Time Scalability Documentation: Processing large volumes of social media content in real time remains a stated challenge; production deployment details are sparse.
Frequently Asked Questions
What does Pheme do?
Pheme analyses online content — especially social media posts — to assess how truthful claims are. It detects and classifies rumours into four types: speculation, controversy, misinformation, and disinformation, and tracks how these spread across networks.
What is a pheme?
A pheme is the project's term for a meme (a viral piece of online content) that has been enhanced with veracity information — i.e., metadata indicating how truthful or rumourous it is. The name references Pheme, the Greek goddess of fame and rumours.
Is Pheme open source?
Yes. The project aims to release its veracity intelligence algorithms and datasets as open source, built on the GATE text mining platform and accompanied by a human-annotated rumour dataset.
Who is Pheme for?
Pheme targets digital journalists needing a rumour verification dashboard, medical information system operators wanting to filter unreliable health claims, and researchers studying misinformation and computational social science.
How does Pheme verify information?
Pheme cross-references content against trusted external sources including PubMed (for medical information) and Linked Open Data via Ontotext's GraphDB platform, in addition to analysing the document's own linguistic features and its diffusion patterns on social networks.
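Since GraphDB exposes a standard SPARQL endpoint, a cross-referencing lookup could be sketched as a query-URL builder. The repository path and query vocabulary below are assumptions for illustration, not details published by the project:

```python
from urllib.parse import urlencode


def build_sparql_lookup(endpoint: str, term: str) -> str:
    """Build a GET URL for a SPARQL label search against a repository.

    `endpoint` would be a GraphDB repository URL (hypothetical here);
    the rdfs:label filter is one generic way to look up a claim's key
    term in a Linked Open Data source.
    """
    query = (
        "SELECT ?subject ?label WHERE { "
        "?subject rdfs:label ?label . "
        f'FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{term}"))) '
        "} LIMIT 10"
    )
    return endpoint + "?" + urlencode({"query": query})
```

A real deployment would send this request with an `Accept: application/sparql-results+json` header and parse the bindings from the response.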