BBC_NEWS_CLASSIFICATION

Dataset Description

The dataset consists of 2225 documents from the BBC news website, corresponding to stories in five topical areas from 2004-2005. Natural classes: 5 (business, entertainment, politics, sport, tech).

If you make use of the dataset, please consider citing the publication:

D. Greene and P. Cunningham. "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006.

All rights, including copyright, in the content of the original articles are owned by the BBC.

For further information, contact Derek Greene: http://mlg.ucd.ie/datasets/bbc.html

Functionalities of the code

Extracts the raw text from the bbc_news_dataset.zip file and converts it into a bbc.csv file
Pre-processes the data by lowercasing, removing stopwords, removing punctuation, and lemmatizing
Splits the data into train/development/test sets (80%/10%/10%)
Uses three feature types: count, one-hot, and tf-idf
For every feature type, tunes the number of features and the feature selection method on the development set
Trains one model for each feature type
Each model makes predictions on the test set
Combines the three sets of predictions by majority voting
Calculates accuracy, macro-averaged precision, macro-averaged recall, and macro-averaged F1-score
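
The split, voting, and scoring steps listed above can be illustrated with a minimal, standard-library-only sketch. This is not the notebook's code: the function names (split_80_10_10, majority_vote, macro_scores), the fixed shuffle seed, and the tie-breaking rule (when all three models disagree, the first model's label is kept) are assumptions made for illustration only.

```python
import random
from collections import Counter


def split_80_10_10(items, seed=0):
    """Shuffle a copy of items and split it into train/dev/test (80/10/10)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an illustrative assumption
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remainder goes to the test set
    return train, dev, test


def majority_vote(*prediction_lists):
    """Combine per-model predictions document by document.

    Assumed tie rule: if no label has a strict majority, the label from the
    earliest-listed model among the most frequent ones is kept.
    """
    combined = []
    for votes in zip(*prediction_lists):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,  # unweighted mean over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels standing in for the three models' test-set predictions.
    y_true = ["sport", "tech", "business", "sport"]
    count_pred = ["sport", "tech", "tech", "sport"]
    onehot_pred = ["sport", "business", "business", "politics"]
    tfidf_pred = ["tech", "tech", "business", "sport"]
    voted = majority_vote(count_pred, onehot_pred, tfidf_pred)
    print(voted)
    print(macro_scores(y_true, voted))
```

Macro-averaging gives each of the five classes equal weight regardless of how many documents it contains, so a weak class cannot be hidden behind a large one. With three voters, a tie arises only when all three models disagree, which is why some explicit tie rule is needed.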

