Automatic detection of fake content on social media, especially Twitter (now X), is a persistent problem. In theory, identifying false information on social networking sites is a binary classification problem. In practice, the sheer volume of daily tweets makes it impossible to manually fact-check even a tiny portion of them. To address this challenge, the team behind the Truthseeker dataset crawled and crowd-sourced, using Amazon Mechanical Turk, one of the most extensive ground-truth datasets, containing more than 180,000 labels for tweets from 2009 to 2022 under both a 5-label and a 3-label classification scheme. However, crowd-sourced labeling of this kind cannot be performed in near real time. We propose a Large Language Model (LLM) based approach that fact-checks a tweet by comparing it, in real time, with legitimate news sources on the corresponding topic via Retrieval Augmented Generation (RAG). We aim to build a system that stays faithful to the legitimate news sources when generating a truthfulness value for every tweet.
Index Terms - Fake News Detection, Automatic Detection, Retrieval Augmented Generation, Large Language Model, Faithful AI
Video - https://youtu.be/RJIpInBJEjI
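
The retrieve-then-judge flow described in the abstract can be illustrated with a minimal sketch. This is not the system's actual implementation. It assumes a toy bag-of-words cosine similarity in place of a real retriever, and a caller-supplied `llm` function in place of a real model. All names (`retrieve`, `build_prompt`, `fact_check`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(tweet: str, articles: List[str], k: int = 2) -> List[str]:
    """Return the k articles most similar to the tweet (toy retriever)."""
    query = Counter(tokenize(tweet))
    ranked = sorted(
        articles,
        key=lambda art: cosine(query, Counter(tokenize(art))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(tweet: str, evidence: List[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved evidence.

    The wording is an assumption made for illustration only.
    """
    numbered = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(evidence))
    return (
        "Using ONLY the news excerpts below, rate the truthfulness of the tweet.\n"
        f"News excerpts:\n{numbered}\n"
        f"Tweet: {tweet}\n"
        "Truthfulness:"
    )


def fact_check(
    tweet: str,
    articles: List[str],
    llm: Callable[[str], str],
    k: int = 2,
) -> str:
    """Retrieve evidence, then ask the (caller-supplied) LLM for a verdict."""
    evidence = retrieve(tweet, articles, k)
    return llm(build_prompt(tweet, evidence))
```

Here `llm` is any function that maps a prompt string to a response string, so a hosted or local model can be plugged in without changing the retrieval step. Constraining the prompt to the retrieved excerpts is one simple way to encourage faithfulness to the news sources. A production retriever would more likely use dense embeddings and a live news index.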