Exploratory Data Analysis (EDA) using dplyr and ggplot2: Workshop Materials for Lumohacks 2018.
In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets, often with visual methods, to summarize their main characteristics, check data quality, and help form hypotheses.
In this workshop, we will work through an EDA problem using data wrangling and visualization:
- Data wrangling: we will use the dplyr package, which provides 5 main functions:
  - select(): get a subset of columns
  - filter(): get a subset of rows
  - mutate(): create a new column
  - group_by(): define groups according to the values in one or more columns
  - summarise(): reduce many rows down to a single value of interest
- Visualization: we will use ggplot2 to build plots step by step:
  - Set up a plot with ggplot()
  - Choose which variables to plot using the argument mapping = aes(x, y) in ggplot()
  - Choose which type of plot using a geom_ function, such as geom_point()
  - Add a title and subtitle using labs()
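
The five dplyr functions listed above are usually chained into one pipeline. The following is a minimal sketch rather than the workshop's own code: it assumes the built-in mtcars data set as a stand-in (this section does not name the workshop data), and the column name kpl and the weight cutoff are made up for illustration.

```r
library(dplyr)

# mtcars ships with R; it is only a stand-in for the workshop data.
cars_summary <- mtcars %>%
  select(mpg, cyl, wt) %>%           # keep a subset of columns
  filter(wt < 5) %>%                 # keep a subset of rows (wt is in 1000 lbs)
  mutate(kpl = mpg * 0.425) %>%      # new column: approximate km per litre
  group_by(cyl) %>%                  # one group per cylinder count
  summarise(
    mean_kpl = mean(kpl),            # reduce each group to a single value
    n_cars   = n()                   # number of cars in each group
  )

cars_summary
```

The result has one row per value of cyl, which is often a convenient shape to pass on to a plot.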
By the end of this workshop, you will have a better understanding of the basic techniques for doing EDA in your own analyses.
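
As a rough sketch of the ggplot2 steps listed earlier (set up the plot, map variables, pick a geom, add labels), the example below again assumes the built-in mtcars data set as a stand-in for the workshop data. The title and subtitle text are illustrative only.

```r
library(ggplot2)

# mtcars ships with R; it is only a stand-in for the workshop data.
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +  # set up plot, choose variables
  geom_point() +                                              # plot type: scatter plot
  labs(
    title    = "Fuel efficiency versus weight",
    subtitle = "Built-in mtcars data, one point per car",
    x        = "Weight (1000 lbs)",
    y        = "Miles per gallon"
  )

print(p)  # draw the plot on the active graphics device
```

Swapping geom_point() for another geom_ function changes the plot type without touching the data or the aesthetic mapping.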
- Run the R Markdown file in RStudio locally
- Run the R Markdown file in RStudio Cloud
- R
- RStudio
- dplyr
- ggplot2