In this one-day course, we provide a comprehensive introduction to R and how it can be used for data science and statistics. We begin with a thorough introduction to RStudio, which is the most popular and powerful interface for using R. We then introduce all the fundamentals of the R language and R environment: variables and assignment, data structures, operators, functions, scripts, packages, projects, etc. We end with an introduction to data processing and formatting (aka data wrangling), an introduction to data visualization, and an introduction to some of the most widely used statistical methods, such as linear regression. From this course, you will gain a solid grounding in R, which will serve as a foundation for progressing further with R to any kind of data analysis, data science, or statistics.
This course is aimed at anyone who is interested in using R for data science or statistics. R is widely used in all areas of academic scientific research, and also widely throughout the public and private sectors.
This course will be hands-on and workshop based. Throughout the day, there will be a minimal amount of lecture-style presentation, i.e., using slides to introduce and explain key concepts. However, even in these cases, the topics being covered will include practical worked examples that we will work through together.
Teaching will be done online via video link using Zoom. Although not strictly required, using a large monitor or preferably even a second monitor will make the learning experience better, as you will be able to see my RStudio and your own RStudio simultaneously. All the sessions will be recorded, and made available immediately on a private video hosting website. All materials will be shared via Git, which will allow for instantaneous sharing of code etc.
The course will take place online using Zoom. On the day, the live video broadcasts will take place at the following times (UK local time; British Summer Time; UTC+1):
- 10am-12pm
- 1pm-3pm
- 4pm-6pm
We will assume only a minimal amount of familiarity with some general statistical and mathematical concepts. These concepts will arise when we discuss statistics and data analysis. Anyone who has taken any undergraduate (Bachelor's) level course on (applied) statistics can be assumed to have sufficient familiarity with these concepts.
No prior experience with R or any other programming language is required. Of course, familiarity with any other programming language will be helpful, but it is not required.
Attendees of the course will need to use a computer on which RStudio can be installed. This includes Mac, Windows, and Linux, but not tablets or other mobile devices. Instructions on how to install and configure all the required software, which is all free and open source, are provided here.
An alternative to using a local installation of RStudio is to use RStudio Cloud (https://rstudio.cloud/). This is a free-to-use, full-featured, web-based version of RStudio.
- The What and Why of R. We'll start by briefly explaining what R is, what it is used for, and why it has become so popular.
- Guided tour of RStudio. RStudio is the most widely used interface to R. We will provide a tour of all its parts and features and how to use it effectively.
- Step by step introduction to R. Having explained what R is, and introduced RStudio, we turn to our coverage of all the fundamentals of R and the R environment. We will cover these topics in detail so that everyone obtains a solid grasp of the fundamentals, which makes all subsequent learning much easier. The topics include:
  - variables and assignment
  - vectors
  - data frames
  - functions
  - scripts
  - installing and loading packages
  - using RStudio projects
  - reading in data, etc.
- Introduction to wrangling, visualization, statistics. In the last section of the course, we will provide a very brief introduction to data wrangling, data visualization, and statistical analysis in R. These are huge topics, so we can give only a short overview of each here.
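As a rough illustration of the kind of code these topics involve, the following base R sketch touches on variables and assignment, vectors, functions, data frames, simple row subsetting, and linear regression. It is not taken from the course materials: the variable names, the `rescale` helper, and the example values are all hypothetical, and the regression uses `cars`, a small dataset that ships with R.

```r
# Variables and assignment (illustrative values, not course data)
course_name <- "Introduction to R"

# Vectors: ordered collections of values of the same type
scores <- c(12, 15, 9, 20)
mean_score <- mean(scores)

# Functions: a hypothetical helper that rescales a numeric vector to [0, 1]
rescale <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
scaled <- rescale(scores)

# Data frames: tabular data with one column per variable
df <- data.frame(id = 1:4, score = scores, scaled = scaled)

# A first taste of wrangling: keep only the rows with score above 10
high <- df[df$score > 10, ]

# Linear regression with the built-in `cars` dataset:
# model stopping distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)
coef(fit)  # intercept and slope estimates
```

Running these lines one at a time in the RStudio console, and inspecting each object as it is created, is a reasonable way to get a feel for how the pieces fit together.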
Slides for the workshop are available here. Some related notes, which elaborate on the content covered in the workshop, are here. The data files that we will use are available here.
The three 2-hour sessions will be video recorded. These videos will be hosted on Vimeo, but will be password protected. The password will be sent by email to all attendees. The page with the links to the videos is here. The videos will hopefully be available to view within 1 hour of the end of each session.