This repository is not intended for reproducibility. It highlights a small portion of my work through a few code snippets. The datasets are covered by a strict confidentiality agreement and are not included.
The Washington Metro loses a substantial amount of revenue each year to fare evaders (people who board a bus without paying).
Our goal was to combine existing administrative data with American Community Survey data at the census block group level to explain where fare evasion is most common and why.
The lead Principal Investigator of the team was Vicki Lancaster. Lata Kodali, a Ph.D. student in Statistics at Virginia Tech, and I were the two Data Science for Public Good Fellows on the team. We co-led a team of four undergraduate interns from Virginia Tech for this project.
We presented our findings to executives at the Washington Metropolitan Area Transit Authority (WMATA). We also presented a symposium poster on some of our work, which is in this repo as WMATAFindings.pdf.
Some of the datasets used in the project are listed below:
- Bus Stops (10,988 observations): bus stop ID, latitude, and longitude.
- Approximate Person Counter (3,793,655 observations): front and back door entries and exits by bus, route, trip, and bus stop.
- Farebox (2,729,688 observations): fare evasion key presses, cash and smart transactions by bus, trip, and bus stop.

My contributions to the project included:
- Mentoring undergraduate students in R programming
- Creating data dictionaries
- Data processing: standardizing time, pulling in GTFS data for joining tables, writing scaffold code for undergraduates to do data exploration on GIS maps, etc.
- Data generation: creating a synthetic tripID (code snippet in this repo), calculating back door exits by census block group, etc.
- Data exploration: examining the correlation between overcrowding (code snippet in this repo) and fare evasion, etc.
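
As a rough illustration of the synthetic tripID step, here is a minimal dplyr sketch on made-up data. The column names (`bus_id`, `route`, `time`), the 30-minute gap threshold, and the rule itself (start a new trip when the route changes or the bus has been idle) are all assumptions for illustration. They are not the project's actual schema or method, which is in SyntheticTripID.R.

```r
library(dplyr)

# Hypothetical toy records; the real datasets are confidential.
apc <- tibble(
  bus_id = c(101, 101, 101, 101, 202, 202),
  route  = c("A", "A", "B", "B", "A", "A"),
  time   = as.POSIXct(c(
    "2018-06-01 08:00:00", "2018-06-01 08:05:00",
    "2018-06-01 09:00:00", "2018-06-01 09:04:00",
    "2018-06-01 08:10:00", "2018-06-01 08:15:00"
  ), tz = "UTC")
)

synthetic <- apc %>%
  arrange(bus_id, time) %>%
  group_by(bus_id) %>%
  mutate(
    # Minutes since the previous record on the same bus (NA for the first record).
    gap_min = (as.numeric(time) - lag(as.numeric(time))) / 60,
    # Assumed rule: a trip starts at the first record, on a route change,
    # or after more than 30 idle minutes.
    new_trip = is.na(gap_min) | route != lag(route) | gap_min > 30,
    # Running count of trip starts gives a per-bus trip sequence number.
    trip_seq = cumsum(new_trip),
    synthetic_trip_id = paste(bus_id, trip_seq, sep = "_")
  ) %>%
  ungroup()

print(synthetic$synthetic_trip_id)
```

Grouping by bus before calling `lag()` keeps the comparison from leaking across vehicles, and `cumsum()` over the logical flag turns trip starts into sequence numbers without a loop.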

Code files in this repo:
- SyntheticTripID.R: demonstrates my familiarity with dplyr by using it to process millions of rows of data.
- Overcrowding.R: demonstrates the use of a chain of ifelse statements inside mutate to explore overcrowding on buses.
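
For readers unfamiliar with the pattern, a chain of `ifelse` calls inside `mutate` can be sketched as follows. The data, the column names (`load`, `seats`), and the load-factor cutoffs are hypothetical and chosen only for illustration. They do not reflect the thresholds used in Overcrowding.R.

```r
library(dplyr)

# Hypothetical trips: passengers on board and seated capacity.
trips <- tibble(
  trip_id = c("t1", "t2", "t3", "t4"),
  load    = c(10, 38, 45, 60),
  seats   = c(40, 40, 40, 40)
)

labeled <- trips %>%
  mutate(
    load_factor = load / seats,
    # Nested ifelse calls are evaluated outermost first, so each later
    # branch only applies to rows that failed every earlier test.
    crowding = ifelse(load_factor < 0.5, "light",
               ifelse(load_factor <= 1.0, "seated",
               ifelse(load_factor <= 1.2, "standing", "overcrowded")))
  )

print(labeled)
```

The same logic could also be written with dplyr's `case_when`, which tends to be easier to read once there are more than three or four categories.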