Can the gender of poets be classified based on their poem content/styles?

Project for our Data Science Study #Team 통소여

Data

Poetry data downloaded from Kaggle.
We added each poet's gender to the dataset ourselves, using information collected by web crawling.

Methods

We implemented two versions of each model: one that keeps stopwords and one that removes them. Stopwords are normally removed in text mining, but because poems tend to convey emotion through many different kinds of words, we decided to examine both versions.
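The difference between the two versions can be sketched in plain Python. This is only an illustration, not the code from our notebooks: the stop-word list, the helper names `tokenize` and `bag_of_words`, and the sample line are all assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real run would use a fuller
# list such as the ones shipped with NLTK or scikit-learn.
STOPWORDS = {"a", "an", "and", "i", "in", "is", "my", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, remove_stopwords):
    """Count the tokens of a poem, optionally dropping stop words first."""
    tokens = tokenize(text)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens)


# Made-up sample line, not taken from the dataset.
poem = "The moon is a lantern in the river of my heart"

with_stop = bag_of_words(poem, remove_stopwords=False)
without_stop = bag_of_words(poem, remove_stopwords=True)

print(sorted(with_stop.items()))
print(sorted(without_stop.items()))
```

With stopwords removed, only the content words remain as features, which is why the two versions can in principle lead to different models.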

Each method is described in detail in our files under /classification/ (view the .ipynb notebooks or download the HTML file).

Random Forest

AdaBoostRegressor

KNN

Naive Bayes

SVC
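A comparison of the classifiers listed above could look roughly like the following scikit-learn sketch. It is not the code from /classification/: the toy poems, the labels, the hyperparameters, and the helper `compare_models` are assumptions for illustration. It also uses `AdaBoostClassifier` in place of the `AdaBoostRegressor` named in the list, because a regressor returns continuous values rather than class labels.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy stand-in data (hypothetical): the real poems and labels live in /data/.
poems = [
    "the river carries my quiet sorrow to the sea",
    "a lantern burns in the window of winter",
    "my heart is a field of wild barley",
    "the night folds its wings over the village",
    "iron bridges hum above the sleeping harbor",
    "we counted the stars and lost the morning",
    "rain writes its letters on the empty road",
    "the orchard remembers every summer it held",
]
labels = ["F", "F", "F", "F", "M", "M", "M", "M"]


def compare_models(texts, labels, stop_words=None, cv=2):
    """Return the mean cross-validated accuracy of each classifier."""
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "Naive Bayes": MultinomialNB(),
        "SVC": SVC(kernel="linear"),
    }
    scores = {}
    for name, model in models.items():
        # The vectorizer sits inside the pipeline so that the vocabulary is
        # learned from the training folds only.
        pipeline = make_pipeline(CountVectorizer(stop_words=stop_words), model)
        fold_scores = cross_val_score(
            pipeline, texts, labels, cv=cv, scoring="accuracy"
        )
        scores[name] = fold_scores.mean()
    return scores


print("with stopwords:   ", compare_models(poems, labels, stop_words=None))
print("without stopwords:", compare_models(poems, labels, stop_words="english"))
```

On data this small the printed numbers mean nothing; the sketch only shows how both stopword settings can be run through the same set of models.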

Result Interpretation

We built several models and tuned their parameters, but accuracy stayed at only about 0.5 to 0.6.
With only two classes, this is very low (random guessing on balanced classes would already score about 0.5), so we concluded that our results were not meaningful.
Including or excluding stopwords also made little difference.
(The accuracy scores are saved in /data/accuracy.csv)
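One way to judge an accuracy of 0.5 to 0.6 on two classes is to compare it with the accuracy of always predicting the most common class. The sketch below computes that baseline; the function name and the label lists are hypothetical and do not reflect the real class balance of our dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical label lists, used only to show how the baseline moves
# with class balance.
balanced = ["F"] * 50 + ["M"] * 50
imbalanced = ["F"] * 40 + ["M"] * 60

print(majority_baseline_accuracy(balanced))
print(majority_baseline_accuracy(imbalanced))
```

If the classes are imbalanced, this baseline rises above 0.5, so a model has to beat the baseline, not just 0.5, before its accuracy says anything.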

Scores with and without stopwords

Why?

There could be a couple of reasons for this result.

First, our dataset contains 15,000 poems by 3,000 poets. We could have tried to obtain more data.
Second, poems are very short, unlike the novels or scripts normally used in text mining.

These conditions might have caused overfitting.
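A common way to check for overfitting is to compare accuracy on the training data with accuracy on held-out data. The sketch below does this on synthetic random data, not on our poems: the features and labels are pure noise, so there is nothing real to learn, and any gap between the two scores comes from memorization. The model choice and data sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noise (an assumption for illustration): 200 samples with
# 50 random features and random binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unrestricted decision tree can memorize the training set.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

A training accuracy far above the held-out accuracy is the typical sign of overfitting; running the same comparison on the poem models would show whether that is what happened here.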

Lessons and to-think-abouts

While preparing for this hackathon, we learned that data preprocessing is one of the most important stages of data mining.
Removing values or filling in NA blanks can distort the results, so it should be done with caution.
We also concluded that classifying natural language into two groups is more difficult than it seems, because language has many unpredictable and intangible factors.

Future think-about: could the results be improved with supervised and unsupervised learning?

Wordcloud of our most frequently used words

yuridekim/Poets
