Lambda-Build-One

This project uses a dataset of articles from China's 新闻联播 (Xinwen Lianbo, CCTV's flagship evening news program) to train a model that predicts whether an article is about Xi Jinping.

Version 3 builds a Bag of Words representation of the articles and trains several classifiers on it. The data is shuffled and split into 75% training data and 25% test data. The accuracy scores on the test set are:

Multinomial Naive Bayes: 92.646%

Bernoulli Naive Bayes: 85.953%

Logistic Regression: 95.904%

Support Vector Classification (SVC): 93.883%
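The Version 3 pipeline (bag-of-words counts, a shuffled 75/25 split, then a classifier scored by accuracy) can be sketched as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code. The notebook presumably used a library such as scikit-learn, but that is an assumption. The helper names `shuffled_split`, `MultinomialNB` and `accuracy`, the toy corpus, and its 0/1 labels are all hypothetical. Chinese text has no spaces between words, so the articles would first need word segmentation (for example with a segmenter such as jieba). The sketch assumes each document already arrives as a list of tokens.

```python
import math
import random
from collections import Counter


def shuffled_split(docs, labels, train_frac=0.75, seed=0):
    """Hypothetical helper: shuffle indices, then split 75% train / 25% test."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_frac)
    tr, te = idx[:cut], idx[cut:]
    return ([docs[i] for i in tr], [labels[i] for i in tr],
            [docs[i] for i in te], [labels[i] for i in te])


class MultinomialNB:
    """Hypothetical multinomial naive Bayes over bag-of-words counts,
    with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            bow = Counter(doc)  # bag of words: token -> count
            self.word_counts[y].update(bow)
            self.vocab.update(bow)
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.totals[c] + self.alpha * v
            for w, k in Counter(doc).items():
                if w in self.vocab:  # words never seen in training are ignored
                    score += k * math.log((self.word_counts[c][w] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy corpus of pre-segmented articles; 1 = about Xi Jinping, 0 = not.
docs = [
    ["习近平", "会见", "总统"], ["习近平", "主持", "会议"],
    ["习近平", "考察", "工厂"], ["习近平", "发表", "讲话"],
    ["经济", "增长", "数据"], ["经济", "改革", "政策"],
    ["经济", "贸易", "出口"], ["经济", "市场", "消费"],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X_train, y_train, X_test, y_test = shuffled_split(docs, labels)
model = MultinomialNB().fit(X_train, y_train)
acc = accuracy(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.3f}")
```

Multinomial naive Bayes is shown because it works directly on the bag-of-words counts. Swapping in logistic regression or an SVM would change only the model step; the bag-of-words features and the 75/25 split stay the same.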

Version 2 contains assorted code with LDA classification. It was meant as a playground for NLP experiments from a time when I was very new to the NLP process, rather than as an example of good code. It has not been uploaded due to file size constraints.

A blog post about this project is available here: https://towardsdatascience.com/analysis-of-chinese-media-393c5c60c644

Repository: razzlestorm/CCTV-News-Analysis