Investigate missing values post join #67

kevinykuo · 2018-12-17T19:47:09Z

TylerGrantSmith · 2018-12-19T05:44:21Z

I will try to commit some analysis of the missing values tomorrow. They are primarily caused by two values in the region_code column (00 and ##), as well as a slew of vehicle_code values that are not in auto2_vei.
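For anyone who wants to reproduce this kind of check, here is a minimal sketch of counting join keys that have no match in a lookup table. The rows, the lookup sets, and the `unmatched_keys` helper are all hypothetical and only illustrate the idea; they are not the actual data or the analysis mentioned above.

```python
from collections import Counter


def unmatched_keys(rows, key, lookup_keys):
    """Count the values of `key` in `rows` that do not appear in `lookup_keys`."""
    return Counter(row[key] for row in rows if row[key] not in lookup_keys)


# Hypothetical rows standing in for the main table (made-up values).
policies = [
    {"vehicle_code": "002001", "region_code": "11"},
    {"vehicle_code": "002999", "region_code": "00"},
    {"vehicle_code": "004010", "region_code": "##"},
    {"vehicle_code": "002999", "region_code": "35"},
]

# Hypothetical lookup keys standing in for auto2_vei and the region table.
vehicle_lookup = {"002001", "004010"}
region_lookup = {"11", "35"}

missing_vehicles = unmatched_keys(policies, "vehicle_code", vehicle_lookup)
missing_regions = unmatched_keys(policies, "region_code", region_lookup)

print(missing_vehicles)
print(missing_regions)
```

Counting occurrences rather than just listing the distinct codes makes it easier to see which unmatched values account for most of the missing rows after the join.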

ryanbthomas · 2019-02-03T18:43:36Z

I'm inferring from the state of the repo when I downloaded it that the 00 and ## region code values were coded as NA.

I was able to find all of the missing vehicle codes by looking at the auto2_vei.csv files in other datasets. I downloaded 2012A, 2013A, and 2013B for comparison and, very lazily, compared them all at once. I will now go through and see whether there is a single dataset I can compare against, which would make it simpler to explain what was done. I will also check whether the missing codes have the same descriptions as codes that are already present in the 2013B dataset.
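A rough sketch of how the "single dataset" check could be done: for each release, intersect its codes with the set of missing codes and keep the releases that cover all of them. The codes below and the `coverage_by_release` helper are made up for illustration; in practice each set would be read from that release's auto2_vei.csv.

```python
def coverage_by_release(missing_codes, lookups):
    """Return, per release, the subset of `missing_codes` found in its lookup."""
    missing = set(missing_codes)
    return {name: missing & set(codes) for name, codes in lookups.items()}


# Hypothetical missing codes and per-release lookup keys (made-up values).
missing = {"002999", "007123", "010555"}
lookups = {
    "release_1": {"002999", "001000"},
    "release_2": {"002999", "007123", "010555", "001000"},
}

covered = coverage_by_release(missing, lookups)

# Releases whose lookup alone accounts for every missing code.
complete = [name for name, found in covered.items() if found == missing]
print(complete)
```

If exactly one release ends up in `complete`, that release is enough to explain every gap, which keeps the write-up simple.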

ryanbthomas · 2019-02-11T03:12:05Z

I was able to determine that every missing code can be found in the 2013A dataset. However, I wonder whether we should treat the dataset on its own terms, since that is what an analyst would encounter in practice.

I haven't been able to fully analyze the vehicle_code, but there is enough structure there to at least partially fill in the gaps. For example, the first three digits correspond to the make of the vehicle (e.g., '002' is always Toyota).
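As a sketch of how that structure could be used, the snippet below builds a prefix-to-make map from codes that are present and applies it to codes that are missing. Only the '002' to Toyota pairing comes from the observation above; the other codes, the "OtherMake" label, and the `make_by_prefix` helper are hypothetical.

```python
def make_by_prefix(lookup, prefix_len=3):
    """Map each code prefix to a make, keeping only prefixes with a single make."""
    makes = {}
    for code, make in lookup.items():
        makes.setdefault(code[:prefix_len], set()).add(make)
    # Drop ambiguous prefixes so we never guess between two makes.
    return {prefix: next(iter(m)) for prefix, m in makes.items() if len(m) == 1}


# Hypothetical known codes; "OtherMake" is a placeholder, not a real entry.
known = {
    "002001": "Toyota",
    "002014": "Toyota",
    "004010": "OtherMake",
}

prefix_to_make = make_by_prefix(known)

# Hypothetical codes that failed to join.
missing_codes = ["002999", "009000"]
inferred = {code: prefix_to_make.get(code[:3]) for code in missing_codes}
print(inferred)
```

Codes whose prefix is unknown or ambiguous stay as `None`, so this only partially fills the gaps, consistent with treating the dataset on its own terms.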

kevinykuo added the data label Dec 17, 2018
