Real-world data rarely come clean. Using Python and its libraries, I gathered data from a variety of sources and in a variety of formats, assessed their quality and tidiness, and then cleaned them.
The dataset I wrangled (and then analyzed and visualized) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, however? Almost always higher than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs, Brent." WeRateDogs has over 4 million followers and has received international media coverage.
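Because each rating is embedded in free-form tweet text, one wrangling task is pulling the numerator and denominator back out of it. The following is a minimal sketch, assuming the rating appears as `numerator/denominator` (the numerator possibly with a decimal part) and that the last such pair in the text is the rating; `RATING_RE` and `extract_rating` are hypothetical names, not part of the original project code:

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a numerator (optionally with a decimal part),
# a slash, then an integer denominator, e.g. "13/10" or "13.5/10".
RATING_RE = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")


def extract_rating(text: str) -> Optional[Tuple[float, int]]:
    """Return (numerator, denominator) for the last fraction in the text, or None."""
    matches = RATING_RE.findall(text)  # list of (numerator, denominator) string pairs
    if not matches:
        return None
    # Heuristic: assume the rating is the last fraction in the tweet.
    numerator, denominator = matches[-1]
    return float(numerator), int(denominator)


if __name__ == "__main__":
    # Made-up example texts, not real tweets from the archive.
    print(extract_rating("A hypothetical tweet about a good dog. 13/10"))
    print(extract_rating("Runs 24/7 and still 12/10"))
```

Taking the last match is only a heuristic: it skips an earlier fraction such as "24/7", but it will still misread a tweet whose final fraction is not a rating (a date, for example), which is exactly the kind of issue the assessing step should flag.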
- Data wrangling, which consists of:
  - Gathering data
  - Assessing data
  - Cleaning data
- Storing, analyzing, and visualizing the wrangled data
- Reporting on 1) data wrangling efforts (wrangle_report.pdf) and 2) data analyses and visualizations (act_report.pdf)
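The gather, assess, and clean steps listed above can be sketched end to end. This is a minimal illustration that uses only the Python standard library rather than the libraries used in the project; the column names, the four sample rows, and the function names `gather`, `assess`, and `clean` are hypothetical, and the checks shown (duplicate IDs, retweets, non-10 denominators) are examples rather than the project's full assessment:

```python
import csv
import io
from collections import Counter
from typing import Dict, List

# Hypothetical sample archive; columns and rows are made up for illustration.
SAMPLE_CSV = """tweet_id,text,retweeted_status_id,rating_numerator,rating_denominator
1,Good dog. 12/10,,12,10
2,RT another good dog. 13/10,99,13,10
1,Good dog. 12/10,,12,10
3,Fluffy dog. 11/10,,11,10
"""


def gather(raw: str) -> List[Dict[str, str]]:
    """Gather: parse CSV text into a list of row dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(raw)))


def assess(rows: List[Dict[str, str]]) -> Dict[str, object]:
    """Assess: count a few quality issues without changing the data."""
    id_counts = Counter(r["tweet_id"] for r in rows)
    return {
        "rows": len(rows),
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "retweets": sum(1 for r in rows if r["retweeted_status_id"]),
        "non_10_denominators": sum(1 for r in rows if r["rating_denominator"] != "10"),
    }


def clean(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Clean: drop retweets and duplicate IDs, and convert columns to proper types."""
    seen = set()
    cleaned = []
    for r in rows:
        if r["retweeted_status_id"]:  # keep original tweets only
            continue
        if r["tweet_id"] in seen:  # keep the first occurrence of each ID
            continue
        seen.add(r["tweet_id"])
        cleaned.append({
            "tweet_id": int(r["tweet_id"]),
            "text": r["text"],
            "rating_numerator": int(r["rating_numerator"]),
            "rating_denominator": int(r["rating_denominator"]),
        })
    return cleaned


if __name__ == "__main__":
    rows = gather(SAMPLE_CSV)
    print(assess(rows))
    print([r["tweet_id"] for r in clean(rows)])
```

Keeping assessment separate from cleaning means every issue is recorded before anything is dropped, so the findings can be written up in the wrangling report.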