I will try to commit some analysis of the missing values tomorrow. They are primarily caused by two values in the region_code column (00 and ##), as well as a slew of vehicle_code values that are not in auto2_vei.
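A rough sketch of how the two sources of missing values could be counted, using only the Python standard library. The column names `region_code` and `vehicle_code` come from the description above; the sample rows, the `known_vehicle_codes` set (standing in for the codes listed in auto2_vei), and the helper name `summarize_missing` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows; only the column names come from the discussion.
records = [
    {"region_code": "12", "vehicle_code": "002001"},
    {"region_code": "00", "vehicle_code": "002002"},
    {"region_code": "##", "vehicle_code": "999001"},
    {"region_code": "07", "vehicle_code": "999001"},
]

# Hypothetical stand-in for the vehicle codes listed in auto2_vei.
known_vehicle_codes = {"002001", "002002"}

# Region codes that appear to have been treated as NA.
NA_REGION_CODES = {"00", "##"}


def summarize_missing(records, known_codes):
    """Count rows with an NA-like region code and tally unmatched vehicle codes."""
    na_region_rows = sum(1 for r in records if r["region_code"] in NA_REGION_CODES)
    unmatched = Counter(
        r["vehicle_code"] for r in records if r["vehicle_code"] not in known_codes
    )
    return na_region_rows, unmatched


na_region_rows, unmatched = summarize_missing(records, known_vehicle_codes)
print(na_region_rows)   # rows whose region_code is 00 or ##
print(dict(unmatched))  # vehicle codes absent from the lookup, with row counts
```

Keeping the two counts separate makes it easier to see how much of the missingness comes from each source.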
Judging from the state of the repo when I downloaded it, I'm inferring that the 00 and ## region code values were coded as NA.
I was able to find all of the missing vehicle codes by looking at the auto2_vei.csv files in other datasets. I downloaded 2012A, 2013A, and 2013B for comparison and I very lazily compared them all at once. I will now go through and see whether there is a single dataset I can compare against, which would make explaining what was done simpler. I will also see if the missing codes have the same description as codes that are already present in the 2013B dataset.
I was able to determine that every missing code can be found in the 2013A dataset. However, I was wondering whether we should treat the dataset on its own terms, as this would be what an analyst might encounter in practice.
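For anyone wanting to repeat the comparison one dataset at a time, here is a minimal sketch using set differences. The release names are the ones mentioned above, but the code values in `missing_codes` and `lookups` are placeholders, not the real contents of any auto2_vei.csv file, and `uncovered_by_release` is a hypothetical helper.

```python
# Placeholder values; the real sets would be read from each release's auto2_vei.csv.
missing_codes = {"999001", "999002", "888010"}
lookups = {
    "2012A": {"999001"},
    "2013A": {"999001", "999002", "888010", "002001"},
    "2013B": {"002001"},
}


def uncovered_by_release(missing, lookups):
    """For each release, return the missing codes its lookup table does not contain."""
    return {name: missing - codes for name, codes in lookups.items()}


still_missing = uncovered_by_release(missing_codes, lookups)

# Releases whose lookup alone accounts for every missing code.
full_cover = sorted(name for name, rest in still_missing.items() if not rest)
print(full_cover)
```

A release that ends up in `full_cover` would be enough on its own as the single comparison dataset.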
I haven't been able to fully analyze the vehicle_code, but there is enough structure there to at least partially fill in the gaps. For example, the first three digits correspond to the make of the vehicle (e.g., '002' is always Toyota).
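As a sketch of the partial fill-in, assuming the first three digits really do identify the make: build a prefix-to-make map from the codes that are already known, keep only prefixes that map to exactly one make, and use it to guess the make for unmatched codes. The only fact taken from the data is that '002' is Toyota; every other code and make below is invented, and `build_prefix_map` and `infer_make` are hypothetical helper names.

```python
from collections import defaultdict

# Invented examples, apart from the '002' -> Toyota prefix noted above.
known_makes = {
    "002001": "Toyota",
    "002017": "Toyota",
    "005003": "Honda",
    "009001": "Ford",
    "009002": "Mazda",  # conflicting prefix, so '009' is not trusted
}


def build_prefix_map(known, width=3):
    """Map each code prefix to a make, keeping only unambiguous prefixes."""
    makes_by_prefix = defaultdict(set)
    for code, make in known.items():
        makes_by_prefix[code[:width]].add(make)
    return {
        prefix: next(iter(makes))
        for prefix, makes in makes_by_prefix.items()
        if len(makes) == 1
    }


def infer_make(code, prefix_map, width=3):
    """Return the make implied by the code's prefix, or None if unknown."""
    return prefix_map.get(code[:width])


prefix_map = build_prefix_map(known_makes)
print(infer_make("002999", prefix_map))  # prefix '002' seen only with Toyota
print(infer_make("009555", prefix_map))  # ambiguous prefix, no guess made
```

Dropping ambiguous prefixes means the fill-in only makes a guess where the known codes agree with each other.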