10:30 AM - 11:50 AM
Spring 2018

The past decade has witnessed an explosion in the collection of ‘big data’ and in the sophistication and accessibility of the tools required to analyze those data. This has spurred government agencies and policy analysts to embrace novel, data-driven approaches to policy creation and evaluation.

This is an introductory course in programming and data analysis for public policy students with no prior coding experience; it is the first in Harris’s new data science sequence. It is for anyone who wants to gather, explore, and share raw quantitative data – or work with others who do. The course has three goals:

(1) We will first introduce students to the tools required to write and share code: text editors, the command line, the Python shell, and version control (git).

(2) Students will be asked to "think algorithmically," translating self-contained questions into Python programs. We will cover the fundamentals of the language, including types, control flow, functions, input/output, and scripts. We will touch on debugging and (time permitting) computability.

(3) We will then cover tools and recipes for retrieving, cleaning, visualizing, and analyzing data.

  • Data science libraries: manipulating data with pandas, plotting with matplotlib and plotly, and running basic statistical and geographic (GIS) analyses. The pandas data structures resemble R's data frames and provide useful groundwork for the second course in this series.
  • Relational databases (SQL): selecting and aggregating data from databases.
  • Web scraping and APIs: how to retrieve and use public data from the web.
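
As a rough illustration of what "selecting and aggregating data from databases" can look like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and figures are hypothetical, invented only for this example; they are not course materials.

```python
import sqlite3

# Hypothetical records, invented purely for illustration:
# (region, program, amount spent)
rows = [
    ("North", "parks", 120.0),
    ("North", "transit", 300.0),
    ("South", "parks", 80.0),
    ("South", "transit", 150.0),
]


def total_by_region(records):
    """Return {region: total amount} computed with a SQL GROUP BY query."""
    # An in-memory database keeps the example self-contained.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE spending (region TEXT, program TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", records)
        # Select one row per region and aggregate its amounts.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM spending "
            "GROUP BY region ORDER BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


totals = total_by_region(rows)
print(totals)  # {'North': 420.0, 'South': 230.0}
```

The same question could also be answered with a pandas groupby; SQL is used here only because it matches the relational-database topic.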

Ultimately, students should be comfortable using what they’ve learned in further Harris/Chicago courses in programming and statistics (incl. Policy Lab) – and in research after leaving Harris. They should be confident independently finding and exploring new packages for those projects. They should know enough to productively collaborate on projects with engineers, and understand the potential of such work.