A simple implementation of an end-to-end system for fact checking and fake news detection (tested on Google Colab).
The pipeline is composed of multiple tasks:
- Check-worthiness detection (is the claim worth fact checking?)
- Evidence collection (using Google)
- Evidence relevance ranking
- Evidence stance detection (supporting/refuting)
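
To make the flow between the tasks above concrete, here is a minimal sketch of how the four stages could be chained. It is not the code used in this project: `run_pipeline`, `Evidence`, and the `toy_*` functions are hypothetical names, and the toy stages (digit check, fixed document list, word overlap, negation keyword) are placeholders standing in for the trained models and the Google search step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    text: str
    relevance: float = 0.0
    stance: str = "unknown"  # later set to "supporting" or "refuting"


def run_pipeline(
    claim: str,
    is_check_worthy: Callable[[str], bool],
    collect_evidence: Callable[[str], List[str]],
    score_relevance: Callable[[str, str], float],
    detect_stance: Callable[[str, str], str],
    top_k: int = 3,
) -> List[Evidence]:
    """Chain the four stages; return [] for claims not worth checking."""
    if not is_check_worthy(claim):
        return []
    candidates = [Evidence(text=t) for t in collect_evidence(claim)]
    for ev in candidates:
        ev.relevance = score_relevance(claim, ev.text)
    # Keep only the top_k most relevant pieces of evidence.
    ranked = sorted(candidates, key=lambda ev: ev.relevance, reverse=True)[:top_k]
    for ev in ranked:
        ev.stance = detect_stance(claim, ev.text)
    return ranked


# --- Toy stand-ins (assumptions, for illustration only) ---

def toy_is_check_worthy(claim: str) -> bool:
    # Placeholder heuristic: treat claims containing a number as check-worthy.
    return any(ch.isdigit() for ch in claim)


def toy_collect(claim: str) -> List[str]:
    # Placeholder for a web search; returns a fixed list of documents.
    return [
        "The city has about 3 million inhabitants",
        "The city does not have 3 million inhabitants",
        "Bananas are rich in potassium",
    ]


def toy_relevance(claim: str, text: str) -> float:
    # Jaccard overlap between the lowercased word sets.
    a, b = set(claim.lower().split()), set(text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def toy_stance(claim: str, text: str) -> str:
    # Placeholder: a negation word marks the evidence as refuting.
    return "refuting" if "not" in text.lower().split() else "supporting"


results = run_pipeline(
    "The city has 3 million inhabitants",
    toy_is_check_worthy, toy_collect, toy_relevance, toy_stance, top_k=2,
)
for ev in results:
    print(f"{ev.relevance:.2f}  {ev.stance:<10}  {ev.text}")
```

Passing each stage in as a function keeps the stages independent, so a placeholder can be swapped for a real model or search client without touching the rest of the chain.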
List of datasets used:
- Random sentences extracted from Wikipedia, as in: https://arxiv.org/abs/1808.09468
- A Million Article Headlines: https://www.kaggle.com/therohk/million-headlines
- Relevance detection: the http://www.fakenewschallenge.org dataset (though http://www.msmarco.org is suggested)
- Stance detection: claim-article pairs extracted from https://www.politifact.com