title | author | date | output |
---|---|---|---|
README | Chris Kelly | Thursday, October 20, 2015 | html_document |
Course project
The script run_analysis.R creates a tidy dataset from the Human Activity Recognition Using Smartphones dataset described at
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
The data itself is available at:
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
The run_analysis.R script is commented, so a detailed description of all the steps isn't needed here. A summary is warranted, however.
Note: the dataset must be unzipped in the "UCI HAR Dataset/" subdirectory of the current working directory in R or RStudio.
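The location requirement above can be checked before running the script. The following is only a sketch and is not part of run_analysis.R: the helper name ensure_dataset and the local file name UCI_HAR_Dataset.zip are hypothetical, and the code assumes the archive contains the "UCI HAR Dataset" folder at its top level.

```r
# Hypothetical helper: make sure the unzipped data is where the script expects it.
ensure_dataset <- function(
    data_dir = "UCI HAR Dataset",
    zip_url  = "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip",
    zip_file = "UCI_HAR_Dataset.zip"   # assumed local name for the download
) {
  if (!dir.exists(data_dir)) {
    # Binary mode matters on Windows; otherwise the zip file can be corrupted.
    download.file(zip_url, destfile = zip_file, mode = "wb")
    # Assumes the archive unpacks into a top-level "UCI HAR Dataset" folder.
    unzip(zip_file, exdir = dirname(data_dir))
  }
  # TRUE if the directory is now present, FALSE otherwise.
  invisible(dir.exists(data_dir))
}
```

Calling ensure_dataset() from the working directory before sourcing run_analysis.R downloads the data only if the directory is missing.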
The script reads the dataset and combines the test and training data files (6 in all) into
one dataset, containing only the 66 mean measurements and discarding the meanFreq variables
(since they are derived from other data). The activity codes are replaced with the activity
names for clarity. The script writes the result to the file tidyData.txt using write.table().
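The steps above can be illustrated on toy data. This is a minimal sketch, not the code in run_analysis.R: the small data frames stand in for the six files, the column names subject and code are made up for the example, and the selection pattern (names containing mean() or std(), which excludes meanFreq()) is an assumption about how the 66 columns were chosen.

```r
# Toy stand-ins for the feature names and the activity lookup table.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "fBodyAcc-meanFreq()-X")
activity_labels <- data.frame(code = 1:2, activity = c("WALKING", "LAYING"))

# Each toy set already has subject IDs, activity codes, and measurements bound together.
train <- data.frame(subject = c(1, 1), code = c(1, 2),
                    matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 2))
test  <- data.frame(subject = c(2, 2), code = c(2, 1),
                    matrix(c(0.7, 0.8, 0.9, 1.0, 1.1, 1.2), nrow = 2))
names(train)[3:5] <- features
names(test)[3:5]  <- features

# 1. Combine training and test rows into one data frame.
combined <- rbind(train, test)

# 2. Keep only the selected measurements (assumed pattern; meanFreq() does not match).
keep <- grepl("mean\\(\\)|std\\(\\)", names(combined))
tidy <- combined[, c("subject", "code", names(combined)[keep])]

# 3. Replace activity codes with activity names.
tidy$activity <- activity_labels$activity[match(tidy$code, activity_labels$code)]
tidy$code <- NULL
tidy <- tidy[, c("subject", "activity",
                 setdiff(names(tidy), c("subject", "activity")))]

# 4. Write the result; a temporary path is used here to avoid overwriting tidyData.txt.
out_file <- file.path(tempdir(), "tidyData.txt")
write.table(tidy, out_file, row.names = FALSE)
```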
A tidy summary of the dataset is also created in tidySummary.txt, again written with write.table().
The summary is in the "wide" format, with one variable in each column. The first column is the
subject ID, followed by the activity, followed by 66 columns for the mean variables chosen for the
dataset, averaged across all observations of each activity for each subject.
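The averaging described above can be sketched with base R's aggregate(). This is an illustration on toy data rather than the script's actual code: the data frame and its values are invented, and the use of aggregate() and row.names = FALSE are assumptions.

```r
# Toy tidy dataset: two subjects, two activities, one measurement column.
tidy <- data.frame(
  subject  = c(1, 1, 1, 2),
  activity = c("WALKING", "WALKING", "LAYING", "WALKING"),
  `tBodyAcc-mean()-X` = c(0.25, 0.75, 0.5, 1.0),
  check.names = FALSE   # keep the original feature name unchanged
)

# Average every measurement column for each (subject, activity) pair.
measure_cols <- setdiff(names(tidy), c("subject", "activity"))
summary_wide <- aggregate(
  tidy[measure_cols],
  by  = list(subject = tidy$subject, activity = tidy$activity),
  FUN = mean
)

# Sort by subject, then activity, so the subject ID leads each row group.
summary_wide <- summary_wide[order(summary_wide$subject, summary_wide$activity), ]

# Write to a temporary path here to avoid overwriting tidySummary.txt.
summary_file <- file.path(tempdir(), "tidySummary.txt")
write.table(summary_wide, summary_file, row.names = FALSE)
```

The result has one row per subject and activity combination, with the subject ID first, the activity second, and one averaged column per measurement.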
This summary can be read into R or RStudio with:
MyURL <- "http://s3.amazonaws.com/coursera-uploads/user-7f4773206024329704d09eda/975117/asst-3/b091792076da11e5a5e959f85260abc1.txt"
read.table(MyURL, header = TRUE)
The codebook for the dataset is available as Codebook.html or Codebook.md.