- This material will make sense if you have a good understanding of the basics of data analysis: sorting, filtering and grouping, whether in spreadsheets or in a database manager program.
- You need the latest version of R and RStudio installed on your computer (if you're at a conference, this has already been done for you!).
In this class, we’ll be learning the basics of the R programming language and interacting with R through RStudio, which is an efficient and user-friendly tool.
A bit of background: R is a language that was created for statistical analysis and graphics; it does these two things very well. Because R is open source, anyone can contribute packages (also called libraries), and over time these have added functionality so that R now works very well for data analysis, web scraping, app building, and even natural language processing.
Learning a programming language requires some investment of time. We're going to cover the basics of what you need to know to get started, but I can't teach you everything in 3 hours (and frankly I don't know everything anyway). You will be able to do data analysis in R after this class, but you will also need to keep exploring, learning, and troubleshooting. I will provide resources to help with all that.
- Learn how to set things up to make working with R easy and efficient: we'll use R Project files (.RProj) and talk about a standard folder structure for every project.
- Learn the basics of R syntax and nomenclature.
- Import a CSV, an Excel file, and some data from the web.
- Use R to sort, filter, group, summarize, and join your data tables.
- Get a glimpse of R's additional power when it comes to data cleaning and graphing.
- Leave with resources to learn more and troubleshoot problems.
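
To give a feel for the sort, filter, group, summarize, and join goals above, here is a minimal sketch in base R. The `sales` and `managers` tables are made-up sample data, not files from this class, and the class notebooks may use different packages or functions to do the same jobs.

```r
# Hypothetical sample data, typed inline so the example runs without any files.
csv_text <- "region,rep,amount
East,Ana,120
West,Ben,80
East,Cal,200
West,Dee,150"

# Import: read.csv() normally takes a file path; text= reads from a string instead.
sales <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# A second (also hypothetical) table to join against.
managers <- data.frame(
  region  = c("East", "West"),
  manager = c("Kim", "Lou"),
  stringsAsFactors = FALSE
)

# Sort: largest amount first.
sorted <- sales[order(-sales$amount), ]

# Filter: keep only rows where amount is at least 120.
big <- subset(sales, amount >= 120)

# Group and summarize: total amount per region.
totals <- aggregate(amount ~ region, data = sales, FUN = sum)

# Join: attach each region's manager to its total.
joined <- merge(totals, managers, by = "region")

print(joined)
```

Each step produces a new data frame and leaves `sales` unchanged, which makes it easy to check intermediate results as you go.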
All the data we'll use in this class is in the data folder. All notebook files (.Rmd) are in the scripts folder. We'll go through them in this order: