# GettingAndCleaningData
Course project for the Coursera.org course "Getting and Cleaning Data".
## Prerequisites
* The HAR data (from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip ) must be unzipped into the working directory, in the folder `UCI HAR Dataset`.
* The `dplyr` package must be installed.
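
One way to satisfy the first prerequisite from within R is sketched below. `ensureHarData()` is a hypothetical helper, not part of `run_analysis.R`, and it assumes the archive unpacks into a top-level `UCI HAR Dataset` folder. It skips the download when that folder already exists.

```r
# Hypothetical helper (not part of run_analysis.R): make sure the HAR data
# is available under `dir` and return the path to the data folder.
ensureHarData <- function(dir = ".") {
  dataDir <- file.path(dir, "UCI HAR Dataset")
  if (dir.exists(dataDir)) {
    return(dataDir)  # already present, nothing to download
  }

  url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
  zipFile <- tempfile(fileext = ".zip")
  on.exit(unlink(zipFile))  # remove the temporary archive when done

  # mode = "wb" keeps the binary zip intact on Windows
  download.file(url, destfile = zipFile, mode = "wb")

  # Assumption: the archive contains a top-level "UCI HAR Dataset" folder
  unzip(zipFile, exdir = dir)
  if (!dir.exists(dataDir)) {
    stop("Expected folder not found after unzipping: ", dataDir)
  }
  dataDir
}
```

Calling `ensureHarData()` from the working directory should leave the data where the script expects it. The second prerequisite can be met with `install.packages("dplyr")`.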
## run_analysis.R
The script defines the function `runAnalysis()`, which reads the data from the folder `UCI HAR Dataset`, merges the training and test data, keeps only the mean and standard deviation columns for each measurement (the `meanFreq` columns are excluded), and returns a tidy data set with the average of each variable for each activity and each subject.
```{r, eval=FALSE}
library(dplyr)

runAnalysis <- function() {
  # read column names; read.table() later sanitizes them, turning "-" and "()" into dots
  featureNames <- read.table("UCI HAR Dataset/features.txt", stringsAsFactors = FALSE)$V2

  # read test data
  testSubject <- read.table("UCI HAR Dataset/test/subject_test.txt", col.names = "subject")
  testY <- read.table("UCI HAR Dataset/test/y_test.txt", col.names = "activity")
  testX <- read.table("UCI HAR Dataset/test/X_test.txt", col.names = featureNames)
  test <- cbind(testSubject, testY, testX)

  # read training data
  trainSubject <- read.table("UCI HAR Dataset/train/subject_train.txt", col.names = "subject")
  trainY <- read.table("UCI HAR Dataset/train/y_train.txt", col.names = "activity")
  trainX <- read.table("UCI HAR Dataset/train/X_train.txt", col.names = featureNames)
  train <- cbind(trainSubject, trainY, trainX)

  # combine test and training
  data <- rbind(test, train)

  # replace runs of dots in column names with "_" and drop trailing "_"
  colnames(data) <- gsub(pattern = "\\.+", replacement = "_", x = colnames(data))
  colnames(data) <- gsub(pattern = "_+$", replacement = "", x = colnames(data))

  # set subject as factor
  data$subject <- as.factor(data$subject)

  # set activity as factor, matching each numeric code to its label explicitly
  activityLabels <- read.table("UCI HAR Dataset/activity_labels.txt", stringsAsFactors = FALSE)
  data$activity <- factor(data$activity, levels = activityLabels$V1, labels = activityLabels$V2)

  # extract mean and std columns, dropping the meanFreq columns
  data <- select(data, subject, activity, contains("_mean"), contains("_std"), -contains("_meanFreq"))

  # build second data set with mean values grouped by subject/activity
  # (across() replaces the deprecated summarise_each()/funs(); needs dplyr >= 1.0.0)
  data2 <- group_by(data, subject, activity)
  data2 <- summarise(data2, across(everything(), mean))
  return(data2)
}
```
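
The column-name handling is the least obvious step in the script, so here is a small illustration that runs without the data set. The three raw names are a sample in the style of `features.txt`. `make.names()` stands in for the sanitizing that `read.table()` applies to `col.names` by default, and the `grepl()` filter is a base R stand-in for the `select()` call, used here only for illustration.

```r
# A few raw feature names in the style of features.txt (illustrative sample)
rawNames <- c("tBodyAcc-mean()-X", "tBodyAccMag-std()", "fBodyAcc-meanFreq()-X")

# read.table(col.names = ...) sanitizes names the way make.names() does
dotted <- make.names(rawNames)
# "tBodyAcc.mean...X" "tBodyAccMag.std.." "fBodyAcc.meanFreq...X"

# Same two substitutions as in runAnalysis()
cleaned <- gsub(pattern = "\\.+", replacement = "_", x = dotted)
cleaned <- gsub(pattern = "_+$", replacement = "", x = cleaned)
# "tBodyAcc_mean_X" "tBodyAccMag_std" "fBodyAcc_meanFreq_X"

# Base R stand-in for select(..., contains("_mean"), contains("_std"),
# -contains("_meanFreq")); contains() ignores case by default
isMeanOrStd <- grepl("_mean|_std", cleaned, ignore.case = TRUE)
isMeanFreq <- grepl("_meanFreq", cleaned, ignore.case = TRUE)
kept <- cleaned[isMeanOrStd & !isMeanFreq]
# "tBodyAcc_mean_X" "tBodyAccMag_std"
```

To produce the tidy data set itself, one would typically source the script and call `tidy <- runAnalysis()`. The result could then be saved with something like `write.table(tidy, "tidy.txt", row.names = FALSE)`, where the file name `tidy.txt` is only an example.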