BayesianBookworm is a text analysis tool that uses Bayes' Theorem to estimate the probable authorship of literary texts. The project initially focuses on distinguishing the works of Jane Austen from those of Charles Dickens.
The program analyzes texts from the following novels, located in the Books/ directory:
- Jane Austen: Emma (em), Pride and Prejudice (pp), Persuasion (pe), Sense and Sensibility (ss)
- Charles Dickens: Great Expectations (ge), Hard Times (ht), A Tale of Two Cities (tc), Oliver Twist (ot)
A dictionary maps each word to its frequency in each author's novels and serves as the basis for authorship prediction:
word_frequencies = {
"officer": [220, 322] # Austen: 220, Dickens: 322
}
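A dictionary of this shape could be built roughly as follows. This is a minimal sketch, not the project's actual code: the function names `tokenize` and `build_word_frequencies` are hypothetical, and the tokenization rule (lowercased runs of letters and apostrophes) is an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def build_word_frequencies(austen_texts, dickens_texts):
    """Map each word to [count in Austen texts, count in Dickens texts].

    Hypothetical helper: each argument is a list of strings,
    one string per novel that has already been read from disk.
    """
    austen = Counter(w for text in austen_texts for w in tokenize(text))
    dickens = Counter(w for text in dickens_texts for w in tokenize(text))
    # A Counter returns 0 for words it has never seen, so words used by
    # only one author still get a two-element entry.
    return {w: [austen[w], dickens[w]] for w in sorted(set(austen) | set(dickens))}
```

Keeping both counts in one list per word matches the layout shown above, so a single lookup gives the evidence for both authors.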
The guess.py script employs this frequency data within a Bayesian framework to estimate the author of a given text passage.
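One common way to apply Bayes' Theorem to word counts like these is a naive Bayes classifier. The sketch below illustrates that technique and is not necessarily what guess.py does: the function name `guess_author`, the equal priors, the add-one (Laplace) smoothing, and the choice to skip unknown words are all assumptions. The counts for "handsome" are made up for the example; only the "officer" entry comes from above.

```python
import math
import re

AUTHORS = ("Austen", "Dickens")


def guess_author(passage, word_frequencies, priors=(0.5, 0.5)):
    """Return (most likely author, posterior probability per author).

    Assumes word_frequencies maps word -> [Austen count, Dickens count].
    """
    vocab_size = len(word_frequencies)
    # Total number of counted words per author.
    totals = [sum(counts[i] for counts in word_frequencies.values()) for i in range(2)]

    # Work in log space so long passages do not underflow to zero.
    log_scores = [math.log(p) for p in priors]
    for word in re.findall(r"[a-z']+", passage.lower()):
        counts = word_frequencies.get(word)
        if counts is None:
            continue  # assumption: ignore words absent from the dictionary
        for i in range(2):
            # Add-one smoothing keeps a zero count from vetoing an author.
            log_scores[i] += math.log((counts[i] + 1) / (totals[i] + vocab_size))

    # Convert log scores back to normalized posterior probabilities.
    highest = max(log_scores)
    weights = [math.exp(score - highest) for score in log_scores]
    total_weight = sum(weights)
    posteriors = [w / total_weight for w in weights]
    best = max(range(2), key=lambda i: posteriors[i])
    return AUTHORS[best], dict(zip(AUTHORS, posteriors))


if __name__ == "__main__":
    example_frequencies = {
        "officer": [220, 322],   # from the dictionary shown above
        "handsome": [300, 100],  # made-up counts, for illustration only
    }
    print(guess_author("The officer entered the room.", example_frequencies))
```

The "naive" part is the assumption that words occur independently given the author, which is rarely true of real prose but keeps the calculation to one multiplication per word.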
- Incorporating More Authors: Broadening the scope to include various authors for a more comprehensive literary analysis.
- Enhanced Algorithm Efficiency: Optimizing the processing capabilities for handling larger datasets.
- User Interface Development: Crafting an intuitive interface for effortless user interaction and result visualization.
BayesianBookworm applies statistical methods to classical literature to reveal patterns in authorial style.