This script obtains wearable data from an Internet source and computes the average of each mean and standard deviation measurement from the accelerometer and gyroscope, for each subject who participated in the experiment and each activity they performed.
The script uses the dplyr package.
Run run_analysis.R in R or RStudio:
source("run_analysis.R")
tidy_dataset <- run_analysis()
The script automatically downloads the wearable data and performs the following steps:
- Download the data and save it in the data folder of the working directory. (The script automatically detects Windows, OS X, or Linux and uses the appropriate download method.)
- Extract the data to the current working directory. If the data already exist, the script uses the existing copy instead of downloading a new one.
- Merge the training and test features data into one data set and extract only the mean and standard deviation of each measurement.
- Merge the training and test label data and convert the numeric labels to descriptive string labels using the provided label mapping.
- Merge the training and test subject data.
- Combine the features, label, and subject data sets into one combined data set.
- Calculate the average of each mean and standard deviation measurement for each subject and activity.
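The final averaging step can be sketched with dplyr on a toy data frame. The column names and values below are made up for illustration, and `across()` assumes dplyr 1.0.0 or later; run_analysis.R itself may implement this step differently.

```r
library(dplyr)

# Toy stand-in for the combined data set (hypothetical names and values).
combined <- data.frame(
  subject  = c(1, 1, 1, 2, 2),
  activity = c("Walking", "Walking", "Sitting", "Walking", "Walking"),
  tBodyAcc_mean_X = c(0.2, 0.4, 0.1, 0.3, 0.5),
  tBodyAcc_std_X  = c(-0.9, -0.7, -0.8, -0.6, -0.4)
)

# Average every measurement column within each subject/activity pair.
# across(everything(), ...) skips the grouping columns automatically.
tidy <- combined %>%
  group_by(subject, activity) %>%
  summarise(across(everything(), mean), .groups = "drop")

print(tidy)
```

The result has one row per subject/activity pair, which is the shape the script returns as the tidy data set.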
The original data set is downloaded and extracted into the working directory, and the following files are used:
- ./UCI HAR Dataset/features.txt
- ./UCI HAR Dataset/activity_labels.txt
- ./UCI HAR Dataset/test/X_test.txt
- ./UCI HAR Dataset/train/X_train.txt
- ./UCI HAR Dataset/test/y_test.txt
- ./UCI HAR Dataset/train/y_train.txt
- ./UCI HAR Dataset/test/subject_test.txt
- ./UCI HAR Dataset/train/subject_train.txt
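A minimal sketch of the download-or-reuse behaviour follows. The function name `fetch_dataset`, the zip file name, and the platform switch are assumptions made for illustration; the actual script may choose its download method differently, and the data URL must be supplied by the caller.

```r
# Hypothetical helper; names and defaults are illustrative, not the script's.
fetch_dataset <- function(url,
                          zip_path = file.path("data", "dataset.zip"),
                          data_dir = "UCI HAR Dataset") {
  # Reuse a previously extracted copy if one is present.
  if (dir.exists(data_dir)) {
    return(invisible(data_dir))
  }

  # Make sure the download folder exists before writing into it.
  dir.create(dirname(zip_path), showWarnings = FALSE, recursive = TRUE)

  # Assumed platform switch: let R decide on Windows, use libcurl elsewhere.
  method <- if (.Platform$OS.type == "windows") "auto" else "libcurl"
  download.file(url, destfile = zip_path, method = method, mode = "wb")

  # Extract into the current working directory.
  unzip(zip_path, exdir = ".")
  invisible(data_dir)
}
```

Because the existence check comes first, calling the helper a second time returns immediately without touching the network.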
./UCI HAR Dataset/features.txt is used for the column indexing and naming
./UCI HAR Dataset/activity_labels.txt is used for activity label mapping from a numerical value to a string description of that activity. Activity labels are mapped as follows:
- 1: Walking
- 2: Walking Upstairs
- 3: Walking Downstairs
- 4: Sitting
- 5: Standing
- 6: Laying
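One way to apply such a mapping is with a base R factor. This is only a sketch: it assumes the numeric codes 1 through 6 follow the order of the list above, and the `codes` vector is invented for the example.

```r
# Label mapping in the same order as the list above (codes 1 to 6).
activity_names <- c("Walking", "Walking Upstairs", "Walking Downstairs",
                    "Sitting", "Standing", "Laying")

# Hypothetical numeric labels, as they might appear in y_test.txt or y_train.txt.
codes <- c(5, 1, 3, 6, 1)

# factor() maps each code to the label at the matching position in `levels`.
activity <- factor(codes, levels = 1:6, labels = activity_names)
print(as.character(activity))
```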
./UCI HAR Dataset/test/X_test.txt and ./UCI HAR Dataset/train/X_train.txt are combined to form a consolidated features data set, and then only the mean() and std() columns are selected. The "()" has been stripped from the column names for ease of use.
./UCI HAR Dataset/test/y_test.txt and ./UCI HAR Dataset/train/y_train.txt are combined to form a consolidated label data set, whose values are mapped to the activity labels and then joined to the features data set.
./UCI HAR Dataset/test/subject_test.txt and ./UCI HAR Dataset/train/subject_train.txt are also combined and then joined to the features data set.
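The column selection and renaming might look like the following base R sketch. The four feature names are an illustrative subset in the style of features.txt, and the fixed-string matching shown here is an assumption about how the selection could be done, not necessarily what the script does.

```r
# A few feature names in the style of features.txt (illustrative subset).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X")

# Keep only names containing the literal text "mean()" or "std()".
# fixed = TRUE treats the parentheses as plain characters, not regex syntax.
keep <- grepl("mean()", feature_names, fixed = TRUE) |
        grepl("std()", feature_names, fixed = TRUE)

# Drop the "()" from the names that are kept.
clean_names <- gsub("()", "", feature_names[keep], fixed = TRUE)
print(clean_names)
```

Matching the literal "mean()" excludes names such as meanFreq(), so only the plain mean and standard deviation columns remain. `which(keep)` then gives the column indices to subset the features data.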
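The combining steps can be sketched in base R with toy tables. All values and the single `feat` column are hypothetical; the point of the sketch is that test and train are stacked in the same order for every table, so that row i of the subjects, labels, and features still refers to the same observation.

```r
# Toy stand-ins for the six input tables (hypothetical values).
subject_test  <- data.frame(subject = c(2, 4))
subject_train <- data.frame(subject = c(1, 3, 5))
y_test        <- data.frame(activity = c("Sitting", "Walking"))
y_train       <- data.frame(activity = c("Laying", "Standing", "Walking"))
x_test        <- data.frame(feat = c(0.1, 0.2))
x_train       <- data.frame(feat = c(0.3, 0.4, 0.5))

# Stack test and train in the same order for every table so rows stay aligned.
subjects <- rbind(subject_test, subject_train)
labels   <- rbind(y_test, y_train)
features <- rbind(x_test, x_train)

# Bind the three stacked tables side by side into one data set.
combined <- cbind(subjects, labels, features)
print(combined)
```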