Thanks to Patrick Lavallee Delgado, MSCAPP Class of ’20, for writing this post!

I work with the Harris School of Public Policy Admissions as a student ambassador, speaking with prospective students about the MS in Computational Analysis and Public Policy (MSCAPP) program. I enjoy having those conversations because I get to share my experience of becoming a better policy practitioner through the use of data and technology.

This fall, the substance of my reflections matured from aspirational to actual: where I used to talk about “what I will be able to do,” I now talk about “what I can do.” Honestly, I was surprised by how quickly I became sure of my abilities vis-à-vis the learning outcomes of the program. Because this is a professional school, I can start getting good at my next job before even starting it.

For me, this change came in large part from my final projects for Regional Innovation Strategies (Harris) and Big Data Application Architecture (Computer Science). For the Harris project, I needed to understand the availability of venture capital in southern Maine (full disclosure, I’m from southern Maine) to draw some conclusions about innovation activity in that startup community. The trouble was, I couldn’t find reliable and free data that could describe trends in venture capital by industry over time, let alone for southern Maine. So, the next best thing was to go straight to the US Securities and Exchange Commission…and sift through upwards of 21 million public disclosures that had been filed since the year 2000 to collect the data I needed.

This offered a perfect use case for big data: it had volume, velocity, and variety. So, for the Computer Science project, I built a web app over a distributed data warehouse to show the distribution of venture capital by metropolitan statistical area. I wrote a few Bash scripts to download the data, a program in Java to efficiently store the data in the Hadoop filesystem, a few Hive and Scala scripts to rearrange that data so that it could answer my questions about venture capital, and another Java program to keep the data updated with daily releases from the SEC. This was wicked hard and took a ton of time, but in the end, I had evidence in hand to describe venture capital in the context of regional innovation.
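
To give a flavor of the "rearranging" step, here is a minimal Python sketch of the kind of aggregation involved: summing capital raised by metropolitan statistical area and year. It is not the Hive or Scala code from the project. The record layout, the function name `total_by_metro_and_year`, and the dollar amounts are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (metro area, filing year, amount raised in USD).
# The amounts are invented for illustration and are not real SEC data.
filings = [
    ("Portland-South Portland, ME", 2018, 1_500_000),
    ("Portland-South Portland, ME", 2018, 500_000),
    ("Portland-South Portland, ME", 2019, 2_000_000),
    ("Boston-Cambridge-Newton, MA-NH", 2019, 10_000_000),
]


def total_by_metro_and_year(records):
    """Sum the amount raised for each (metro area, year) pair."""
    totals = defaultdict(int)
    for metro, year, amount in records:
        totals[(metro, year)] += amount
    return dict(totals)


totals = total_by_metro_and_year(filings)

# Print one line per metro area and year, in sorted order.
for (metro, year), amount in sorted(totals.items()):
    print(f"{metro} {year}: ${amount:,}")
```

On a distributed warehouse the same idea would more likely be a grouped query over millions of rows, but the shape of the question (group, then sum) is the same.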

I think this is what Harris and the Department of Computer Science had in mind when designing MSCAPP. It is exactly what I signed up for, but I don’t think I had fully grasped the awesomeness of synthesizing all the world’s knowledge into a public policy recommendation. “The data demonstrate we do not have enough data.”

A lot of results are like this: until we run the scripts, we may not even know what we’re trying to address, or how to address it. But we now have data sets at our fingertips to take the next steps.