Data: Analysis of Wearable Tech Readouts

This is my take on Peer Assessment 1 for the sixth course in the Coursera Data Science Specialization, “Reproducible Research”. It involves a simple data analysis, but it is meant more to demonstrate familiarity with a reproducible research workflow using R Markdown and the knitr package. The assignment specifies that the code must be shown for each step, so I’ll begin by setting the global option to echo code.

knitr::opts_chunk$set(echo = TRUE)

This analysis requires the following packages:

  • dplyr
  • ggplot2
  • reshape2

Next we’ll load the data: readouts from a wearable activity monitor that recorded the number of steps taken in each five-minute interval. It has three variables:

  • steps: Number of steps taken in a 5-minute interval with missing values coded as NA
  • date: The date on which the measurement was taken in YYYY-MM-DD format
  • interval: Identifier for the 5-minute interval in which the measurement was taken.

It’s stored in a CSV file with 17,568 total observations. Let’s load that now:

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip", destfile = "./repdata-data-activity.zip", method = "curl")

unzip("./repdata-data-activity.zip")

data <- read.csv("./activity.csv", colClasses=c("integer","Date","numeric"))
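As a quick sanity check on the 17,568 observations: a day has 24 × 60 / 5 = 288 five-minute intervals, so that row count works out to exactly 61 days of readings. A minimal R sketch of the arithmetic (the variable names are just for illustration):

```r
# Five-minute intervals in one day: 24 hours * 60 minutes / 5 minutes
intervalsPerDay <- 24 * 60 / 5
# Number of days implied by 17,568 observations
nDays <- 17568 / intervalsPerDay
intervalsPerDay  # 288
nDays            # 61
```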

## Section 1:

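The first question on the assignment asks us to calculate the total number of steps taken per day and then plot it as a histogram. I like using dplyr for this kind of thing, so if you don’t have it installed, go ahead and do that. Thank me later. I like ggplot2 for plotting, but that comes down to preference. (Strictly speaking, the plot below is a bar chart of each day’s total by date; a histogram would bin the daily totals and count how many days fall in each bin.)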

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
groupSteps <- group_by(data, date)
steps <- summarise(groupSteps,
                   total = sum(steps, na.rm = TRUE))

library(ggplot2)
ggplot(steps, aes(date, total)) + geom_bar(stat = "identity", colour = "black", fill = "black", width = 0.7)  + labs(title = " Total Number of Steps Taken Each Day", x = "Date", y = "Steps")

[Figure: Total Number of Steps Taken Each Day]
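If you want a histogram in the stricter sense (how many days fall into each range of daily totals), ggplot2’s geom_histogram() does the binning for you. This is a minimal sketch using made-up daily totals in a hypothetical dailyTotals data frame in place of steps; the 2,500-step bin width is an arbitrary choice.

```r
library(ggplot2)

# Made-up daily totals for illustration only; with the real data you would
# pass the `steps` data frame from above instead
dailyTotals <- data.frame(total = c(0, 5200, 8900, 10400, 10800, 12100, 15500))

# geom_histogram() counts how many days fall into each 2,500-step bin
p <- ggplot(dailyTotals, aes(total)) +
  geom_histogram(binwidth = 2500, colour = "black", fill = "grey") +
  labs(title = "Histogram of Total Steps per Day",
       x = "Steps per day", y = "Number of days")
p
```

The y-axis now counts days rather than steps, which is what makes it a histogram of the daily totals.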

The assignment then asks us to calculate and report the mean and median of the total number of steps taken per day:

summary(steps$total)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    6778   10400    9354   12810   21190

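Swag. That puts the mean at about 9,354 steps per day and the median at about 10,400 (summary() rounds to four significant digits). Keep in mind that sum(steps, na.rm = TRUE) gives a total of 0 for any day with no recorded readings, which likely explains the minimum of 0 and pulls the mean down.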
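To see how much those zero-total days can move the summary, here is a minimal sketch on a three-day toy data frame (toy, daily, and nObs are illustrative names, and the step counts are made up). Day 1 has no recorded readings at all.

```r
library(dplyr)

# Toy data: day 1 has no recorded steps at all
toy <- data.frame(
  date = rep(as.Date(c("2012-10-01", "2012-10-02", "2012-10-03")), each = 3),
  steps = c(NA, NA, NA, 10, 20, 30, 0, 40, 50)
)

# Daily totals (NAs dropped) plus a count of non-missing readings per day
daily <- summarise(group_by(toy, date),
                   total = sum(steps, na.rm = TRUE),
                   nObs = sum(!is.na(steps)))

mean(daily$total)    # 50: the empty day counts as 0
median(daily$total)  # 60

# Keep only days with at least one recorded reading
observed <- filter(daily, nObs > 0)
mean(observed$total)    # 75
median(observed$total)  # 75
```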

## Section 2:

Section 2 asks us to make a time series plot of the 5-minute interval (x-axis) against the average number of steps taken, averaged across all days (y-axis).

data2 <- data[complete.cases(data),]
groupSteps2 <- group_by(data2, interval)
steps2 <- summarise(groupSteps2,
                    avg = mean(steps))


ggplot(steps2, aes(interval, avg)) + geom_line(colour = "black") + labs(title = "Average Steps By Interval", x = "Interval", y = "Steps")

[Figure: Average Steps By Interval]

Then it asks which 5-minute interval, on average across all days, contains the maximum number of steps:

steps2[steps2$avg == max(steps2$avg),]
## Source: local data frame [1 x 2]
## 
##   interval      avg
## 1      835 206.1698
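The interval identifiers encode clock time as hours and minutes run together (HHMM), so 835 is the interval starting at 8:35. A small helper makes that readable. Both intervalToTime() and the toy peakData frame are hypothetical names for this sketch, and the averages in it are made up. which.max() is a slightly shorter way to pick the row with the largest average; it returns the first one if there are ties.

```r
# Hypothetical helper: convert HHMM-style interval identifiers to "HH:MM"
intervalToTime <- function(interval) {
  sprintf("%02d:%02d", interval %/% 100, interval %% 100)
}

# Made-up averages for three intervals, standing in for steps2
peakData <- data.frame(interval = c(830, 835, 840), avg = c(10, 25, 15))

# which.max() returns the index of the (first) largest average
peak <- peakData[which.max(peakData$avg), ]
intervalToTime(peak$interval)         # "08:35"
intervalToTime(c(0, 55, 100, 2355))   # "00:00" "00:55" "01:00" "23:55"
```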

## Section 3:

Section 3 first asks for the total number of missing values in the dataset:

sum(is.na(data))
## [1] 2304
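sum(is.na(data)) counts missing values across every column. To confirm they all sit in steps, colSums(is.na(...)) breaks the count down by column. This sketch runs it on a small made-up frame (toy and naCounts are illustrative names). On the real data, note that 2304 = 8 × 288, which is consistent with (though it does not by itself prove) eight entire days having no readings.

```r
# Toy frame with the same three columns as the activity data
toy <- data.frame(
  steps = c(NA, 5L, NA, 12L),
  date = as.Date("2012-10-01") + c(0, 0, 1, 1),
  interval = c(0, 5, 0, 5)
)

# Missing values per column
naCounts <- colSums(is.na(toy))
naCounts  # steps = 2, date = 0, interval = 0
```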

Then it asks us to fill in these missing values with some kind of substitute; the mean or median will work. Here I use the overall mean number of steps per interval.

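data3 <- data
# Mean number of steps per 5-minute interval, ignoring missing values
# (mean(!is.na(x)) would instead give the proportion of non-missing values)
stepMean <- mean(data$steps, na.rm = TRUE)
# Replace each missing step count with that mean
data3$steps[is.na(data3$steps)] <- stepMean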
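Filling every gap with the overall mean is the simplest option. A common alternative, not the approach used here, is to fill each missing value with the mean for its own 5-minute interval, which keeps the shape of the daily activity pattern. This is a minimal dplyr sketch on toy data; toy and imputed are illustrative names and the numbers are made up.

```r
library(dplyr)

# Toy data: two days, two intervals, one missing reading
toy <- data.frame(
  date = rep(as.Date(c("2012-10-01", "2012-10-02")), each = 2),
  interval = rep(c(0, 5), times = 2),
  steps = c(NA, 4, 10, 6)
)

# Within each interval, replace NAs with that interval's mean
imputed <- mutate(group_by(toy, interval),
                  steps = ifelse(is.na(steps), mean(steps, na.rm = TRUE), steps))
imputed <- ungroup(imputed)
imputed$steps  # 10 4 10 6
```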

Then it asks us to create a histogram of the total number of steps per day and then calculate the mean and median total steps per day. We’ll just crib the code from the first section.

groupSteps3 <- group_by(data3, date)
# Summarise the imputed data (groupSteps3), not the original groupSteps
steps <- summarise(groupSteps3,
                   total = sum(steps))

ggplot(steps, aes(date, total)) + geom_bar(stat = "identity", colour = "black", fill = "black", width = 0.7)  + labs(title = " Total Number of Steps Taken Each Day", x = "Date", y = "Steps")

[Figure: Total Number of Steps Taken Each Day, with missing values imputed]

Then we have to calculate and report the mean and median of the total steps per day. Easy enough.

# Mean and median of the per-day totals (not of the per-interval readings)
mean(steps$total)
median(steps$total)
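It’s easy to mix up two different averages here: the mean of the individual 5-minute readings and the mean of the daily totals. The assignment asks for the latter. This toy example with two days of three readings each (made-up numbers) shows how far apart the two can be.

```r
# Two made-up days, three readings each
toy <- data.frame(
  date = rep(as.Date(c("2012-10-01", "2012-10-02")), each = 3),
  steps = c(0, 0, 30, 0, 0, 60)
)

# Per-interval statistics
mean(toy$steps)    # 15
median(toy$steps)  # 0

# Per-day statistics: sum within each date first
daily <- tapply(toy$steps, toy$date, sum)
mean(daily)    # 45
median(daily)  # 45
```

With no missing values, the mean daily total is just the per-interval mean times the number of intervals per day (15 × 3 = 45 here).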

## Section 4:

Finally, the assignment asks whether activity levels differ between weekdays and weekends. First we have to make a new factor variable denoting whether each day is a weekday or a weekend day. I used gsub for each day, which is a little tedious, so if you have a more elegant solution I’m open to suggestions!

data4 <- data
data4 <- mutate(data4, weekdays = weekdays(date))
data4[,4] <- gsub("Monday", "Weekday", data4[,4])
data4[,4] <- gsub("Tuesday", "Weekday", data4[,4])
data4[,4] <- gsub("Wednesday", "Weekday", data4[,4])
data4[,4] <- gsub("Thursday", "Weekday", data4[,4])
data4[,4] <- gsub("Friday", "Weekday", data4[,4])
data4[,4] <- gsub("Saturday", "Weekend", data4[,4])
data4[,4] <- gsub("Sunday", "Weekend", data4[,4])
# The assignment asks for a factor, so convert the character labels
data4[,4] <- factor(data4[,4])
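One more compact option, offered as a suggestion rather than the method above: test the day of the week numerically with as.POSIXlt(date)$wday (0 is Sunday, 6 is Saturday) and build the factor in a single ifelse() call. This also sidesteps the fact that weekdays() returns day names in the current locale, which would break matching on English names like "Monday". dayType is a hypothetical name for this sketch.

```r
# A few dates: Friday, Saturday, Sunday, Monday (March 13-16, 2015)
dates <- as.Date(c("2015-03-13", "2015-03-14", "2015-03-15", "2015-03-16"))

# wday is 0 for Sunday and 6 for Saturday, independent of locale
dayType <- factor(ifelse(as.POSIXlt(dates)$wday %in% c(0, 6), "Weekend", "Weekday"),
                  levels = c("Weekday", "Weekend"))
dayType  # Weekday Weekend Weekend Weekday
```

With the real data, the same expression could go inside a mutate() call on data, using the date column in place of dates.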

Then we make the graph in much the same way we made the one in section 2. I put both lines on the same graph because I felt it’s easier to compare that way than doing it in panels like Dr. Peng did.

data4 <- data4[complete.cases(data4),]
weekday <- filter(data4, data4[,4] == "Weekday")
groupWeekday <- group_by(weekday, interval)
newWeekday <- summarise(groupWeekday,
                    avg = mean(steps))

weekend <- filter(data4, data4[,4] == "Weekend")
groupWeekend <- group_by(weekend, interval)
newWeekend <- summarise(groupWeekend,
                    avg = mean(steps))

library(reshape2)
total <- cbind(newWeekday, newWeekend[,2])
colnames(total) <- c("Interval", "Weekday Average", "Weekend Average")
total <- melt(total, id.vars = "Interval")

ggplot(total, aes(Interval, value, colour = variable)) + geom_line() + labs(title = "Average Steps By Interval", x = "Interval", y = "Steps")

[Figure: Average Steps By Interval, weekday vs. weekend averages]
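If you’d rather skip the separate weekday/weekend summaries and the cbind()/melt() step, grouping by both the day-type factor and the interval produces the long-format data ggplot2 wants directly. This is a sketch on made-up data, assuming a factor column (here called dayType) that labels each row Weekday or Weekend; toy, avgByType, and p are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Made-up readings: two intervals, two days of each type
toy <- data.frame(
  interval = rep(c(0, 5), times = 4),
  dayType = factor(rep(c("Weekday", "Weekend"), each = 4)),
  steps = c(0, 10, 2, 20, 4, 30, 6, 50)
)

# One average per (day type, interval) pair, already in long format
avgByType <- summarise(group_by(toy, dayType, interval), avg = mean(steps))

# colour = dayType draws one line per day type on the same axes
p <- ggplot(avgByType, aes(interval, avg, colour = dayType)) +
  geom_line() +
  labs(title = "Average Steps By Interval", x = "Interval", y = "Steps")
p
```

Grouping keeps each average tied to its own interval, whereas cbind() relies on the two summaries listing their intervals in the same order.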

That should be everything! Thanks for reading and good luck in the rest of the class!
