Data: An Analysis of the Impact of Ridesharing on DUIs in San Francisco 2005-2015

This is a homework assignment I did as part of an application to be a public policy research specialist for Lyft. The prompt was simply “utilize publicly available data to make a compelling point about Lyft. Present your point in 1 slide.” I provided both this codebook and the slide at the end as my application materials. I unfortunately got into the final round and was not selected, but it was a great chance to exercise some of my interdisciplinary chops.

The goal of this analysis is to examine the effect of ridesharing’s entrance into the market (2009 for Uber, 2012 for Lyft) on the number of DUI arrests per 100,000 residents between the hours of 10 pm and 3 am. The analysis will measure a period from 2005-2014.

Datasets:

DUI:

City of San Francisco Open Data SFPD Incidents from 1 January 2003 https://data.sfgov.org/Public-Safety/SFPD-Incidents-from-1-January-2003/tmnf-yvry

Population Data:

2000-2010: City and Town Intercensal Estimates: https://www.census.gov/popest/data/intercensal/cities/cities2010.html

2010-2013: U.S. Census American Community Survey: https://www.census.gov/programs-surveys/acs/

2014-2016: Open Data Network Forecast: http://www.opendatanetwork.com/region/1600000US0667000/San_Francisco_CA/population/population/2013?

Active Liquor Licenses by Year: https://www.abc.ca.gov/Annualreports/Annual_Reports.html

City of San Francisco Police Expenditures By Year: http://openbook.sfgov.org/

Packages required:

dplyr reshape2 ggplot2

Analysis will utilize an ANOVA of a Linear Regression model showing the effect of rideshare’s existance on weekend DUI rates per 100k when controlled for number of liquor licenses and police enforcement expenditures

## DUI Data Loading and Cleaning
totalDUI <- read.csv("./Drunk_Driving_2003-2016.csv")
## trim out all drug related DUIs
unique(totalDUI$Descript)
## [1] DRIVING WHILE UNDER THE INFLUENCE OF DRUGS            
## [2] DRIVING WHILE UNDER THE INFLUENCE OF ALCOHOL          
## [3] DRIVING WHILE UNDER THE INFLUENCE OF ALCOHOL, W/INJURY
## [4] DRIVING WHILE UNDER THE INFLUENCE OF DRUGS, W/INJURY  
## 4 Levels: DRIVING WHILE UNDER THE INFLUENCE OF ALCOHOL ...
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.3
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
booze <- unique(totalDUI$Descript)[2:3]
boozeDUI <- filter(totalDUI, Descript %in% booze)
## trim dates into years
boozeDUI$Date <- format(as.Date(boozeDUI$Date, "%m/%d/%Y"), format="%Y")
keep <- c(2005:2014)
boozeDUI <- filter(boozeDUI, Date %in% keep)
## check if there are any innocents in here by resolution
count(boozeDUI, Resolution)
## Source: local data frame [11 x 2]
## 
##                       Resolution     n
##                           (fctr) (int)
## 1                 ARREST, BOOKED  2960
## 2                  ARREST, CITED   319
## 3          EXCEPTIONAL CLEARANCE     1
## 4                JUVENILE BOOKED    11
## 5                 JUVENILE CITED     2
## 6                        LOCATED     2
## 7                           NONE   221
## 8   PROSECUTED BY OUTSIDE AGENCY     1
## 9  PROSECUTED FOR LESSER OFFENSE     1
## 10             PSYCHOPATHIC CASE     1
## 11                     UNFOUNDED     4
## Looks like we've got a few unfounded, an exceptional clearance and 
## a psychopathic case. Let's get rid of those.
wrongGuy <- c("UNFOUNDED", "PSYCHOPATHIC CASE", "EXCEPTIONAL CLEARANCE")
boozeDUI <- boozeDUI[!(boozeDUI$Resolution %in% wrongGuy),]
## find total DUI per year
count(boozeDUI, Date)
## Source: local data frame [10 x 2]
## 
##     Date     n
##    (chr) (int)
## 1   2005   178
## 2   2006   231
## 3   2007   272
## 4   2008   388
## 5   2009   529
## 6   2010   429
## 7   2011   436
## 8   2012   373
## 9   2013   362
## 10  2014   319
## make time easier to handle. There are other ways of doing this but I
## don't do them often and I didn't want to waste time so I did what works
boozeDUI$Time <- gsub(":", ".", boozeDUI$Time)
boozeDUI$Time <- as.numeric(boozeDUI$Time)
## Filter to the hours of 10 pm to 3 am
nightDUI <- data.frame(rbind(filter(boozeDUI, Time < 2.59), filter(boozeDUI, Time > 22)))

## Now lets just find nightlife DUIs (Fridays, Saturdays, Sundays)
weekendDUI <- filter(nightDUI, DayOfWeek %in% c("Friday", "Saturday", "Sunday"))
## Get rid of Friday morning
weekendDUI <- weekendDUI[-which(weekendDUI$DayOfWeek=="Friday" & weekendDUI$Time < 2.59),]

write.csv(weekendDUI, file = "SF_weekend_data1.csv")

Here we move to excel. I manually imputed the population of SF each year and number of active retail liquor licenses in the City of San Francisco using the State of California’s records (unfortunately all data is locked in PDF). Grabbed police expenditures from SF Open Book, manually imputed as well since the sheet was open and there aren’t many observations. Establish dummy variable for the presence of rideshare (1 in years with, 0 in years without). Creat column for Weekend DUI per 100,000 using the formula (Bar Hour DUI/Population) x 100,000. Calculate Police expenditures per 100k using formula (Police/Population) x 100,000.

weekendDUI <- read.csv("SFDataNew.csv")
weekendDUI$Year <- as.factor(weekendDUI$Year)
lyftModel <- lm(DUI.100K~Rideshare+Police.100k+Liquor.Licenses.100k, weekendDUI)
summary(lyftModel)
## 
## Call:
## lm(formula = DUI.100K ~ Rideshare + Police.100k + Liquor.Licenses.100k, 
##     data = weekendDUI)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.418 -3.704 -1.534  2.351 11.557 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           3.053e+02  9.476e+01   3.221  0.01462 * 
## Rideshare             5.125e+01  1.357e+01   3.777  0.00692 **
## Police.100k           1.263e-07  4.151e-07   0.304  0.76983   
## Liquor.Licenses.100k -6.157e-01  1.993e-01  -3.089  0.01759 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.135 on 7 degrees of freedom
## Multiple R-squared:  0.7533, Adjusted R-squared:  0.6476 
## F-statistic: 7.126 on 3 and 7 DF,  p-value: 0.0156

It would appear that ridesharing has a significant explanatory effect on the number of DUIs around bar hours. This is not a conclusive result, but it certainly suggests a connection between the two. The correlation between active liquor licenses and DUI arrests is also not entirely unexpected. Our residuals are substantial, meaning there’s a lot we still don’t know about what causes DUIs. That’s totally fine in this case, and generally to be expected of complex systems. Our low number of degrees of freedom should also advise us to proceed with caution when making this point authoritatively. However, this is a good sign for future research. A larger study taking into account a larger sample of American cities using the same methodology would yield a more significant result. Now to make a graphic.

graphicData <- select(weekendDUI, Year, Bar.DUI.100k)
library(reshape2)
graphicDataLong <- melt(graphicData)
## Using Year as id variables
graphicDataLong$variable <- gsub("Bar.DUI.100k", "DUIs Per 100k", graphicDataLong$variable)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
plot <- ggplot(data=graphicDataLong, aes(Year, value)) +
        geom_point(aes(Year, value), color="lightpink") + 
        geom_line(color = "lightpink", group=1, lwd=3) + 
        theme(panel.background = element_rect(fill='white')) +
        geom_vline(xintercept=5) 
ggsave("lyftGraphicRough.png")
## Saving 7 x 5 in image
## Export to Photoshop for Cleanup!

Oliveira_Lyft_Slide.png

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s