This repository contains data and code supporting a BuzzFeed News article about city-level ZIP code demographics and COVID-19 cases, published May 7, 2020. See below for details.
The analysis uses ZIP code-level case counts (as of May 4, 2020, for every city except Detroit, whose counts are as of May 7, 2020) for the following five cities, stored in the data/raw directory:
- New York City, sourced from the city's Department of Health
- Chicago, sourced from the Illinois Department of Health
- Detroit, sourced from the city's COVID-19 ZIP code dashboard
- Philadelphia, sourced from the Pennsylvania Department of Health
- Baltimore, sourced from the Maryland Department of Health
The data/raw directory also includes ZIP code-level shapefiles for each of the five cities. These geospatial files come from each city's open data portal. They are used to filter for the appropriate ZIP codes and to create the maps included in the article.
The demographic data used in the analysis comes from the 2018 5-year American Community Survey (ACS) estimates at the ZIP Code Tabulation Area (ZCTA) level. To reduce the size of the raw data in this repository, the data file included here has been pre-processed from seven different data files that are not included.
The data/county-data directory contains several datasets relevant to the metro-area calculations described below. The datasets are:
- cbsa.csv, which lists all Census-defined core-based statistical areas (CBSAs): their titles, numeric codes, and populations.
- cbsa-counties.csv, which lists all counties in each CBSA, via the Census' "delineation" files (Sept. 2018 vintage).
- nyt-county-counts.csv, which contains the New York Times' tabulations of county-level COVID-19 case and death counts, through May 4, 2020.
The city-demographic-factors-analysis.ipynb notebook loads the data for each city, calculates correlations between various demographic factors and per capita case counts (at the ZIP code level), and graphically explores some of those correlations. It also outputs the GeoJSON and CSV files used to create the maps and scatterplots in the story.
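As a rough illustration of the per capita correlation step, here is a minimal sketch in plain Python. The ZIP codes, field names (`zip`, `cases`, `population`, `median_income`), and all numbers are made up for the example and are not taken from this repository. The notebook itself may use different libraries, columns, and correlation measures.

```python
import math

# Hypothetical rows, one per ZIP code. All values are invented for illustration.
rows = [
    {"zip": "00001", "cases": 120, "population": 40000, "median_income": 45000},
    {"zip": "00002", "cases": 60, "population": 30000, "median_income": 70000},
    {"zip": "00003", "cases": 200, "population": 50000, "median_income": 38000},
    {"zip": "00004", "cases": 30, "population": 25000, "median_income": 95000},
]


def cases_per_1000(row):
    """Convert a raw case count into cases per 1,000 residents."""
    return 1000 * row["cases"] / row["population"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rates = [cases_per_1000(r) for r in rows]
incomes = [r["median_income"] for r in rows]

# Correlation between the demographic factor and the per capita case rate.
r_value = pearson(incomes, rates)
print(f"Correlation between median income and cases per 1,000: {r_value:.2f}")
```

Normalizing by population before correlating matters here: raw case counts mostly track how many people live in a ZIP code, which would swamp any demographic signal.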
The calculate-metro-area-proportions.ipynb notebook uses the county-level data described above to calculate the proportion of population, COVID-19 cases, and COVID-19 deaths in the United States' 15 largest metro areas, relative to US totals.
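The metro-area calculation can be sketched as follows, under stated assumptions. The metro names, county codes, field names, and counts below are all hypothetical, and the sketch takes the two largest of three invented metros rather than the 15 largest real ones. It is not the notebook's actual code.

```python
# Hypothetical metro -> county mapping, in the spirit of cbsa-counties.csv.
cbsa_counties = {
    "Metro A": ["01001", "01003"],
    "Metro B": ["02001"],
    "Metro C": ["03001"],
}

# Hypothetical county-level figures; "04001" belongs to no metro area.
county_counts = {
    "01001": {"population": 1000, "cases": 50, "deaths": 5},
    "01003": {"population": 3000, "cases": 150, "deaths": 10},
    "02001": {"population": 2000, "cases": 20, "deaths": 1},
    "03001": {"population": 500, "cases": 10, "deaths": 0},
    "04001": {"population": 3500, "cases": 20, "deaths": 4},
}

FIELDS = ("population", "cases", "deaths")


def totals(fips_codes):
    """Sum each field across the given counties."""
    fips_codes = list(fips_codes)
    return {f: sum(county_counts[c][f] for c in fips_codes) for f in FIELDS}


# Rank metros by total population and keep the N largest (N = 2 here).
metro_totals = {name: totals(codes) for name, codes in cbsa_counties.items()}
largest = sorted(metro_totals, key=lambda m: metro_totals[m]["population"], reverse=True)[:2]

# Combine the selected metros, then divide by the national totals.
selected_counties = [c for name in largest for c in cbsa_counties[name]]
selected_totals = totals(selected_counties)
us_totals = totals(county_counts)
proportions = {f: selected_totals[f] / us_totals[f] for f in FIELDS}

for field, share in proportions.items():
    print(f"{field}: {share:.1%} of the national total")
```

Comparing the case and death shares against the population share shows whether the largest metros carry a disproportionate part of the outbreak.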
All code in this repository is available under the MIT License. Files in the output/ directory are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
If you have any questions about this repository you can reach out to John Templon at [email protected].
Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.