RPowell07/soccer_subreddits

Project 3: Comparison of r/soccer and r/FIFA

Background:

Association football, aka soccer, is the world's most popular sport. In 1993, EA Sports released FIFA, the first game in a yearly association football video game series that lets users play as their favorite players on their favorite teams. Since then, FIFA has grown into what Guinness World Records lists as the best-selling sports video game franchise in the world, having sold over 325 million copies as of 2021 (source). EA Sports releases a new edition every year; FIFA 22 came out last year. FIFA 23, due out this fall, will be the last game under the FIFA name, because EA Sports' 30-year partnership with FIFA, the sport's governing body, is ending. Future games in the franchise will be named EA Sports FC.

For this project, we pulled data from Reddit, looking specifically at two subreddits: r/soccer, the main subreddit for association football, and r/FIFA, the main subreddit for the EA Sports FIFA video game franchise. In this notebook, we build a model that predicts whether a post comes from r/soccer or r/FIFA.
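The Conclusions identify Multinomial Naive Bayes over word counts as the best model for this task. As a minimal, standard-library-only sketch of what that combination computes (the project presumably used scikit-learn's `CountVectorizer` and `MultinomialNB`, which this does not reproduce exactly), the hypothetical `BagOfWordsNB` class counts tokens per subreddit and scores a post by its log prior plus add-alpha smoothed log word likelihoods. The tokenizer mimics `CountVectorizer`'s default lowercasing and token pattern; the four training posts are made-up toy examples, not project data.

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn's CountVectorizer: 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


class BagOfWordsNB:
    """Hypothetical sketch of Multinomial Naive Bayes over word counts,
    with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[str], labels: list[int]) -> "BagOfWordsNB":
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.word_counts = {}
        self.total_counts = {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Prior: share of training posts that belong to this subreddit.
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in tokenize(d))
            self.word_counts[c] = counts
            self.total_counts[c] = sum(counts.values())
            self.vocab.update(counts)
        return self

    def predict_one(self, doc: str) -> int:
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # words unseen in training are ignored, as with a fitted CountVectorizer
                p = (self.word_counts[c][w] + self.alpha) / (self.total_counts[c] + self.alpha * v)
                score += math.log(p)
            scores[c] = score
        # Pick the subreddit with the highest log posterior (up to a shared constant).
        return max(scores, key=scores.get)

    def predict(self, docs: list[str]) -> list[int]:
        return [self.predict_one(d) for d in docs]


# Toy posts (made up); labels follow the data dictionary: 1 = r/soccer, 0 = r/FIFA.
train = [
    "Shots, passes and goals at full time: stats",
    "Goal in the first half, great passes",
    "TOTS player pack opened, my team is stacked",
    "Which player should I add to my team?",
]
labels = [1, 1, 0, 0]
model = BagOfWordsNB().fit(train, labels)
print(model.predict(["best player for my team", "shots and passes stats"]))  # [0, 1]
```

Smoothing with `alpha = 1.0` keeps a word that appears in only one subreddit from driving the other subreddit's probability to zero; it plays the same role as the `alpha` parameter of scikit-learn's `MultinomialNB`.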

Problem Statement:

As data scientists for EA Sports, we want to examine what people post about in r/soccer, compare it with what people post about in r/FIFA, and use the differences to make recommendations on how to improve the game.

Datasets

Data Dictionary

| Feature | Type | Description |
|---|---|---|
| subreddit | category | Which subreddit the post belonged to (1: r/soccer, 0: r/FIFA) |
| title | object | Title of the post |
| author | object | Author of the post |
| selftext | object | Description of the post (if applicable) |
| cleantitle | object | Title of the post without HTML characters, all lowercase |
| cleantext | object | Description of the post (if applicable) without HTML characters, all lowercase |
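The data dictionary does not say exactly how `cleantitle` and `cleantext` were produced. One plausible reconstruction is sketched here, assuming that "without HTML characters" means decoding entities such as `&amp;` and dropping any stray tags; the `clean` function and the tag regex are hypothetical, not the author's code.

```python
import html
import re

# Hypothetical pattern for stray HTML tags such as <b> or <br/>.
TAG_RE = re.compile(r"<[^>]+>")


def clean(text) -> str:
    """Sketch of the cleantitle/cleantext transformation (an assumption)."""
    if not isinstance(text, str):
        return ""  # selftext is missing for link/image posts (None, or NaN in pandas)
    text = html.unescape(text)     # decode entities, e.g. "&amp;" -> "&"
    text = TAG_RE.sub(" ", text)   # drop any HTML tags
    text = " ".join(text.split())  # collapse runs of whitespace
    return text.lower()


print(clean("Messi &amp; Ronaldo <b>GOAT</b> debate"))  # messi & ronaldo goat debate
```

With pandas, this would be applied column by column, e.g. `df["cleantitle"] = df["title"].map(clean)`, assuming a DataFrame `df` holding the scraped posts.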

Conclusions

Our best model was Multinomial Naive Bayes with CountVectorizer features, which reached an accuracy score of 93.5%. We chose it as our production model not only because its accuracy was high, but also because its training score (0.962) and testing score (0.935) were close, so the model predicted new data well and did not appear to be overfit. Its recall was 91.8% and its specificity was 95.3%. Because false positives and false negatives are equally costly here, we chose the model with the best accuracy score.
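The reported metrics all come from the confusion matrix, taking r/soccer (label 1) as the positive class. The sketch uses hypothetical counts (1,000 test posts per subreddit) chosen only to reproduce the reported recall and specificity; the project's actual confusion matrix is not given. It also illustrates that accuracy is the class-weighted average of recall and specificity, so with a roughly balanced test set (an assumption here), (91.8% + 95.3%) / 2 = 93.55%, in line with the reported 93.5%.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Binary metrics with r/soccer (label 1) as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),       # r/soccer posts correctly labelled r/soccer
        "specificity": tn / (tn + fp),  # r/FIFA posts correctly labelled r/FIFA
    }


# Hypothetical counts for 1,000 r/soccer and 1,000 r/FIFA test posts.
metrics = classification_metrics(tp=918, fp=47, tn=953, fn=82)
print({name: round(value, 4) for name, value in metrics.items()})
# {'accuracy': 0.9355, 'recall': 0.918, 'specificity': 0.953}

# Train/test gap used to judge overfitting, from the reported scores.
print(round(0.962 - 0.935, 3))  # 0.027
```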

Posts in r/soccer focus more on stats; the most common words include shots, passes, goals, and time. The r/FIFA community, by contrast, focuses more on the players who make up a team; its most common words include players, player, play, team, and tots (Team of the Season). This makes sense: r/soccer users all follow the same real-world leagues, whereas each FIFA player is focused on their own team and league rather than a league shared across all users.
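Most-common-word comparisons like this one amount to counting tokens per subreddit after removing stop words. A minimal sketch, assuming a small hand-picked stop-word list (the project's actual list is not stated) and made-up toy posts; `top_words` is a hypothetical helper:

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "and", "in", "of", "to", "is", "for", "my", "on", "at", "with"}
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def top_words(posts: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common non-stop-word tokens across the posts."""
    counts = Counter(
        word
        for post in posts
        for word in TOKEN_RE.findall(post.lower())
        if word not in STOP_WORDS
    )
    # Ties keep first-encountered order, per Counter.most_common.
    return counts.most_common(n)


soccer_posts = ["Shots and passes in the first half", "Goals, shots and passes at full time"]
fifa_posts = ["Best TOTS player for my team", "Which player should I play in my team?"]
print(top_words(soccer_posts, 2))  # [('shots', 2), ('passes', 2)]
print(top_words(fifa_posts, 2))    # [('player', 2), ('team', 2)]
```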

My recommendation to EA Sports is to promote its competitive FIFA leagues (ePremier League, eMLS). Many of the stats and keywords common in r/soccer would likely become more prevalent in r/FIFA if the entire community followed a much smaller set of shared leagues.
