Investigate missing values post join #67

Open
kevinykuo opened this issue Dec 17, 2018 · 3 comments

@kevinykuo
Contributor

[Screenshot: 2018-12-17, 11:46:44 AM]

@kevinykuo kevinykuo added the data label Dec 17, 2018
@TylerGrantSmith
Contributor

I will try to commit some analysis of the missing values tomorrow. They are primarily caused by two values in the region_code column (00 and ##), as well as a slew of vehicle_code values that are not in auto2_vei.
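As a rough illustration of how unmatched join keys like these can be surfaced, here is a minimal sketch of an anti-join count in plain Python. The function name `unmatched_keys`, the toy rows, and the lookup lists are all hypothetical; only the column names `region_code` and `vehicle_code` and the values 00 and ## come from this thread.

```python
from collections import Counter


def unmatched_keys(rows, key, lookup_keys):
    """Count values of `key` in `rows` that have no match in `lookup_keys`."""
    known = set(lookup_keys)
    return Counter(row[key] for row in rows if row[key] not in known)


# Made-up toy data, not taken from the real dataset.
policies = [
    {"region_code": "01", "vehicle_code": "002001"},
    {"region_code": "00", "vehicle_code": "002001"},
    {"region_code": "##", "vehicle_code": "999999"},
    {"region_code": "01", "vehicle_code": "999999"},
]
region_lookup = ["01", "02"]   # hypothetical region lookup keys
vehicle_lookup = ["002001"]    # hypothetical stand-in for auto2_vei keys

missing_regions = unmatched_keys(policies, "region_code", region_lookup)
missing_vehicles = unmatched_keys(policies, "vehicle_code", vehicle_lookup)
```

Every key counted here would produce missing values after a left join against the corresponding lookup table.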

@ryanbthomas
Contributor

I'm inferring from the state of the repo when I downloaded it that the 00 and ## region code values were coded as NA.
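If that inference is right, the recoding would amount to something like the sketch below. This is an assumption about the intent, not the code in the repo; `SENTINEL_REGION_CODES` and `recode_region` are hypothetical names, and the sample values other than 00 and ## are made up.

```python
# Region codes assumed to mean "missing" (taken from the discussion above).
SENTINEL_REGION_CODES = {"00", "##"}


def recode_region(code):
    """Return None for sentinel region codes, otherwise the code unchanged."""
    return None if code in SENTINEL_REGION_CODES else code


raw_codes = ["01", "00", "##", "17"]  # made-up sample values
cleaned_codes = [recode_region(code) for code in raw_codes]
```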

I was able to find all of the missing vehicle codes by looking at the auto2_vei.csv files in other datasets. I downloaded 2012A, 2013A, and 2013B for comparison and, rather lazily, compared them all at once. I will now go through and see whether there is a single dataset I can compare against, which would make explaining what was done simpler. I will also check whether the missing codes have the same descriptions as codes that are already present in the 2013B dataset.
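The single-dataset check could be sketched roughly as follows. The helper names, the placeholder dataset labels `dataset_a` and `dataset_b`, and the codes are all made up for illustration; in practice the code sets would be read from each dataset's auto2_vei.csv.

```python
def still_missing(missing_codes, codes_by_dataset):
    """For each reference dataset, return the missing codes it does NOT contain."""
    missing = set(missing_codes)
    return {
        name: missing - set(codes)
        for name, codes in codes_by_dataset.items()
    }


def fully_covering(missing_codes, codes_by_dataset):
    """Names of datasets that contain every missing code, in sorted order."""
    gaps = still_missing(missing_codes, codes_by_dataset)
    return sorted(name for name, rest in gaps.items() if not rest)


# Made-up codes and placeholder dataset names.
missing_codes = ["A1", "B2", "C3"]
codes_by_dataset = {
    "dataset_a": ["A1", "B2", "Z9"],
    "dataset_b": ["A1", "B2", "C3", "D4"],
}

gaps = still_missing(missing_codes, codes_by_dataset)
candidates = fully_covering(missing_codes, codes_by_dataset)
```

A dataset listed in `candidates` would be enough on its own to explain every missing code.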

@ryanbthomas
Contributor

I was able to determine that every missing code can be found in the 2013A dataset. However, I was wondering whether we should treat the dataset on its own terms, as this would be what an analyst might encounter in practice.

I haven't been able to fully analyze the vehicle_code, but there is enough structure there to at least partially fill in the gaps. For example, the first three digits correspond to the make of the vehicle (e.g., '002' is always Toyota).
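A minimal sketch of that partial fill-in, assuming the make is determined by the first three digits: build a prefix-to-make map from the codes we do have, keep only prefixes that map to exactly one make, and use it for codes missing from the lookup. The function names, the six-digit code format, and every make other than Toyota for '002' are made up for illustration.

```python
from collections import defaultdict


def build_prefix_to_make(known_makes, prefix_len=3):
    """Map each code prefix to a make, keeping only unambiguous prefixes."""
    makes_by_prefix = defaultdict(set)
    for code, make in known_makes.items():
        makes_by_prefix[code[:prefix_len]].add(make)
    return {
        prefix: next(iter(makes))
        for prefix, makes in makes_by_prefix.items()
        if len(makes) == 1
    }


def infer_make(code, prefix_to_make, prefix_len=3):
    """Return the inferred make for `code`, or None if the prefix is unknown."""
    return prefix_to_make.get(code[:prefix_len])


# Made-up lookup rows; only the '002' -> Toyota pairing comes from the thread.
known_makes = {
    "002001": "Toyota",
    "002017": "Toyota",
    "005003": "MakeX",  # placeholder make
    "005009": "MakeY",  # conflicting placeholder, so '005' is dropped
}
prefix_to_make = build_prefix_to_make(known_makes)
inferred = infer_make("002999", prefix_to_make)
```

Dropping ambiguous prefixes keeps the fill-in conservative: a code is only assigned a make when the existing data agrees on it.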
