Work-in-Progress: Where the Heck Should I Park in San Diego???

Anyone who has lived and worked in a major metropolitan area knows the struggle of efficiently finding parking. To be sure, with some experience one can make educated guesses, but spending the morning circling the block for a spot a reasonable distance from your final destination is a hassle, to say the least.

I was recently doing one of my periodic trawls through the City of San Diego Open Data portal, and I noticed two datasets of transaction data covering every parking meter in the City. This got the gears turning in my head for a project that could combine a lot of my favorite data science applications. What if, using machine learning, we could build a set of features that predicts with reasonable accuracy whether a given parking meter will be in an “open” or “closed” state at a given hour on a given day? What if we could then build that model into an interactive web-based tool that pulls the inputs it needs from the appropriate APIs and tells you where to go right at that moment? Sounds interesting, no? So I dove in.

Here’s how I’m approaching this: I’m breaking each day of the year into ten-minute intervals. For each interval, I compile a list of characteristics (date, day of the week, and weather are the first three that come to mind as likely to significantly affect parking availability). I then use these features to train a model on a random sample of intervals, each labelled open or closed for a given meter based on the transaction data. This is going to be an ongoing project since it’s pretty meaty, so I’ll post progress in installments.
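The plan above can be sketched in R. This is a minimal sketch under assumptions, not the finished pipeline: make_interval_grid() and label_meter() are hypothetical helpers, the transaction columns meter_id, start and end are guesses about the City's meter data, and the labelling rule (a meter counts as "closed" in an interval whenever a paid session overlaps it) is itself an assumption, since a paid session is only a proxy for an occupied spot.

```r
# Hypothetical sketch: helper names and the transaction columns
# meter_id, start and end are assumptions, not the City's actual schema.

tz_sd <- "America/Los_Angeles"  # San Diego local time

# One row per ten-minute interval between `from` and `to` (inclusive),
# carrying the calendar features: date, day of week, time of day
make_interval_grid <- function(from, to, tz = tz_sd) {
  times <- seq(as.POSIXct(from, tz = tz), as.POSIXct(to, tz = tz),
               by = "10 mins")
  lt <- as.POSIXlt(times, tz = tz)
  data.frame(
    interval_start = times,
    date           = as.Date(times, tz = tz),
    day_of_week    = lt$wday,                # 0 = Sunday ... 6 = Saturday
    minute_of_day  = lt$hour * 60L + lt$min,
    stringsAsFactors = FALSE
  )
}

# Assumed labelling rule: "closed" if any paid session for this meter
# overlaps the interval, "open" otherwise
label_meter <- function(grid, transactions, meter) {
  tx <- transactions[transactions$meter_id == meter, , drop = FALSE]
  interval_end <- grid$interval_start + 600  # ten minutes, in seconds
  closed <- vapply(seq_len(nrow(grid)), function(i) {
    any(tx$start < interval_end[i] & tx$end > grid$interval_start[i])
  }, logical(1))
  data.frame(interval_start = grid$interval_start,
             meter_id = meter,
             status = ifelse(closed, "closed", "open"),
             stringsAsFactors = FALSE)
}
```

For all of 2016 (a leap year), make_interval_grid("2016-01-01 00:00:00", "2016-12-31 23:50:00") gives 366 × 144 = 52,704 rows, and weather can then be joined on interval_start. The per-interval vapply() keeps the logic easy to read but will be slow for busy meters; an interval join such as data.table::foverlaps() is the likely upgrade at full scale.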

The first thing I did was find a quality API for weather data. After some googling around, I found the Dark Sky API (https://darksky.net/). Dark Sky can return the weather conditions for any past timestamp and provides API access with a generous free call allowance and cheap per-request fees beyond it, so it seemed like the right way to go. Using the following code, I was able to extract whether it was clear, cloudy, or rainy for each of my ten-minute intervals:

setwd("~/Desktop/Data_Science/Parking")


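# Return the Dark Sky API key from the DARKSKY_API_KEY (or legacy
# FORECASTIO_API_KEY) environment variable, prompting for it interactively
darksky_api_key <- function(force = FALSE) {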

  env <- Sys.getenv('DARKSKY_API_KEY')
  if (!identical(env, "") && !force) return(env)

  env <- Sys.getenv("FORECASTIO_API_KEY")
  if (!identical(env, "") && !force) {
    message("FORECASTIO_API_KEY is deprecated, please update environment variable to DARKSKY_API_KEY")
    return(env)
  }

  if (!interactive()) {
    stop("Please set env var DARKSKY_API_KEY to your Dark Sky API key",
      call. = FALSE)
  }

  message("Couldn't find env var DARKSKY_API_KEY.")
  message("Please enter your API key and press enter:")
  pat <- readline(": ")

  if (identical(pat, "")) {
    stop("Dark Sky API key entry failed", call. = FALSE)
  }

  message("Setting DARKSKY_API_KEY env var for this session")
  Sys.setenv(DARKSKY_API_KEY = pat)

  pat

}

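# Ten-minute timestamps from 2016-01-01 to 2017-01-01. ISOdate() defaults
# to 12:00 GMT, so the sequence starts at noon. format() keeps the time part
# on every element and produces Dark Sky's "YYYY-MM-DDTHH:MM:SS" form.
Times2016 <- seq(ISOdate(2016,1,1), ISOdate(2017,1,1), by="10 mins")
Times2016 <- data.frame(times = format(Times2016, "%Y-%m-%dT%H:%M:%S"),
                        stringsAsFactors = FALSE)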


# Downtown San Diego; longitudes west of Greenwich are negative
latitude <- 32.7157
longitude <- -117.1611

convert_time <- function(x) as.POSIXct(x, origin="1970-01-01")

get_forecast_for <- function(latitude, longitude, timestamp,
                             units="us", language="en", exclude=NULL,
                             add_json=FALSE, add_headers=FALSE,
                             ...) {

  url <- sprintf("https://api.darksky.net/forecast/%s/%s,%s,%s",
                 darksky_api_key(), latitude, longitude, timestamp)

  params <- list(units=units, lang=language)

  if (!is.null(exclude)) params$exclude <- exclude

  resp <- httr::GET(url=url, query=params, ...)
  httr::stop_for_status(resp)

  tmp <- httr::content(resp, as="parsed")

  lys <- c("hourly", "minutely", "daily")

  # hourly, minutely and daily blocks might not be in the response
  # so only process the ones that are actually in the response

  lapply(lys[which(lys %in% names(tmp))], function(x) {

    dat <- dplyr::bind_rows(lapply(tmp[[x]]$data, dplyr::as_tibble))

    # various time fields might not be in the block data, so only
    # process which ones are in the block data

    ftimes <- c("time", "sunriseTime", "sunsetTime", "temperatureMinTime",
                "temperatureMaxTime", "apparentTemperatureMinTime",
                "apparentTemperatureMaxTime", "precipIntensityMaxTime")

    # convert times to POSIXct since they make sense in tbl_dfs/data.frames;
    # any_of() skips time fields that are absent from this block

    dplyr::mutate(dat, dplyr::across(dplyr::any_of(ftimes), convert_time))

  }) -> fio_data

  fio_data <- setNames(fio_data, lys[which(lys %in% names(tmp))])

  # add currently as a data frame to the return list since that's helpful for
  # rbinding later for folks

  if ("currently" %in% names(tmp)) {
    currently <- dplyr::as_tibble(tmp$currently)
    if ("time" %in% colnames(currently)) {
      currently <- dplyr::mutate(currently, time=convert_time(time))
    }
    fio_data$currently <- currently
  }

  if (add_json) fio_data$json <- tmp

  ret_val <- fio_data

  if (add_headers) {
    dev_heads <- c("cache-control", "expires", "x-forecast-api-calls", "x-response-time")
    ret_heads <- httr::headers(resp)

    ret_val <- c(fio_data, ret_heads[dev_heads[which(dev_heads %in% names(ret_heads))]])
  }

  class(ret_val) <- c("darksky", class(ret_val))

  # Only the summary of conditions at `timestamp` (e.g. "Clear") is returned
  return(ret_val$currently$summary)

}


## Only the first six timestamps are queried here, to demonstrate the concept
## in the knitted post without burning API calls
weather <- apply(head(Times2016), 1, get_forecast_for,
                 latitude=latitude, longitude=longitude)

# apply() already returns one summary per timestamp, so no transpose is
# needed; unname() drops apply()'s row labels
minuteWeather <- data.frame(Times=head(Times2016$times), Weather=unname(weather))
knitr::kable(minuteWeather, caption="Weather History in San Diego, Ten Minute Intervals")
Weather History in San Diego, Ten Minute Intervals

Times                  Weather
2016-01-01T12:00:00    Clear
2016-01-01T12:10:00    Clear
2016-01-01T12:20:00    Clear
2016-01-01T12:30:00    Clear
2016-01-01T12:40:00    Clear
2016-01-01T12:50:00    Clear
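The apply() call above spends one API request per ten-minute timestamp, and a full year of 2016 is 366 × 144 = 52,704 of them. get_forecast_for() already parses an hourly block when a response contains one, so, assuming a Time Machine request returns hourly data for the whole requested day (and that get_forecast_for() is changed to return fio_data$hourly instead of the summary), one call per day, 366 calls for 2016, could cover every interval at the cost of hourly rather than ten-minute weather resolution. The sketch below shows only the mapping step; attach_hourly_weather() and the sample hourly table are hypothetical, and the weather values are made up for illustration.

```r
# Hypothetical sketch: attach hourly weather to ten-minute timestamps.
# `hourly` stands in for an hourly block like the one get_forecast_for()
# parses, with a POSIXct `time` column and a `summary` column.
attach_hourly_weather <- function(times, hourly) {
  # Truncate each ten-minute timestamp to the start of its hour
  hour_start <- as.POSIXct(trunc(times, units = "hours"))
  # Match on seconds since the epoch, which does not depend on time zone
  idx <- match(as.numeric(hour_start), as.numeric(hourly$time))
  data.frame(times = times, weather = hourly$summary[idx],
             stringsAsFactors = FALSE)
}

# Illustrative values only, not real San Diego observations
tz_sd <- "America/Los_Angeles"
day_start <- as.POSIXct("2016-01-01 00:00:00", tz = tz_sd)
hourly <- data.frame(time = seq(day_start, by = "1 hour", length.out = 24),
                     summary = rep(c("Clear", "Partly Cloudy"), each = 12),
                     stringsAsFactors = FALSE)
times <- seq(day_start, by = "10 mins", length.out = 144)
head(attach_hourly_weather(times, hourly))
```

Matching on numeric epoch seconds rather than formatted strings sidesteps any mismatch between the session time zone and the zone the timestamps were built in.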


Step one of an iterative process achieved. I’m really excited about this project! Updates forthcoming.
