Implement functionality for easily scrubbing poorly geocoded values from storage tables #3

Open
MattTriano opened this issue May 3, 2022 · 0 comments

At present, addresses are ingested into a storage table (currently user_data.address_table for normalized addresses and user_data.std_address_table for standardized addresses) and then geocoded in place. During geocoding, if the rating of the best candidate result is above a threshold value (currently 22; lower ratings indicate better matches), no candidate is accepted and the value '-1' is inserted into the rating column. Subsequent runs of the geocoder (using the included geocoding functions) only geocode rows where the rating is NULL, which in practice means that only newly added distinct rows are geocoded.
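To make the current behavior concrete, here is a minimal in-memory sketch of one geocoding pass under the rules above: skip rows whose rating is already set, reject the best candidate when its rating exceeds the threshold (storing '-1'), and otherwise store its rating and coordinates. The `AddressRow` class, the `geocoder` callable, and treating "no candidates at all" as a rejection are hypothetical stand-ins for illustration, not the project's actual functions or table layout.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

REJECT_THRESHOLD = 22  # best-candidate ratings above this are rejected (lower = better)
REJECTED = -1          # sentinel written to the rating column for rejected addresses


@dataclass
class AddressRow:
    """Hypothetical stand-in for one row of the storage table."""
    address: str
    rating: Optional[int] = None  # None (NULL) means "not yet geocoded"
    lon: Optional[float] = None
    lat: Optional[float] = None


# Hypothetical geocoder: returns (rating, lon, lat) for the best candidate,
# or None if it finds no candidates (assumed here to count as a rejection).
Geocoder = Callable[[str], Optional[Tuple[int, float, float]]]


def geocode_pending(rows: List[AddressRow], geocoder: Geocoder,
                    threshold: int = REJECT_THRESHOLD) -> int:
    """Geocode only rows whose rating is still None; return how many were processed."""
    processed = 0
    for row in rows:
        if row.rating is not None:
            continue  # geocoded (or rejected) on an earlier run, so skip it
        best = geocoder(row.address)
        if best is None or best[0] > threshold:
            row.rating = REJECTED  # no acceptable candidate
        else:
            row.rating, row.lon, row.lat = best
        processed += 1
    return processed
```

Because rejected rows carry '-1' rather than NULL, a second pass over the same rows processes nothing, which is exactly why a bad-but-accepted geocode persists until someone scrubs it.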

Even with rating thresholding, the normalization or standardization rules can still misinterpret an address and place it far from its correct location. Unless users have an easy way to flag these results, or to purge them and impute a rating of '-1', the address table will become polluted with bad geocodes and tedious to work with.

Possible Solutions:

  • (greater prevention) facilitate use of the restrict_region parameter in geocoding
  • (post-hoc cleanup) develop functionality to select the records geocoded in a specific geocoding run (perhaps identified by the filename and a timestamp of the run's start time) and, given a boundary polygon, retain only those records from that run that were geocoded to locations within the polygon.
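The post-hoc cleanup option could look roughly like the sketch below: pick out the rows from one run (matched on a source filename and run start time), test each geocoded point against a boundary polygon with an even-odd ray-casting check, and reset any point outside it to a rating of '-1' with its coordinates cleared. The provenance fields `source_file` and `run_started_at`, and the function names, are hypothetical; the table does not currently record them. Inside the database itself, the same filter would more naturally be an `UPDATE` using PostGIS's `ST_Within` against a boundary geometry.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)
REJECTED = -1


@dataclass
class GeocodedRow:
    """Hypothetical row with provenance columns identifying its geocoding run."""
    address: str
    rating: Optional[int]
    lon: Optional[float]
    lat: Optional[float]
    source_file: Optional[str] = None
    run_started_at: Optional[datetime] = None


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray-casting test; polygon is a list of (lon, lat) vertices (not closed)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray from pt toward +x with this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def scrub_run(rows: List[GeocodedRow], source_file: str, run_started_at: datetime,
              boundary: Sequence[Point]) -> int:
    """Reset rows from one run that fall outside `boundary`; return how many were scrubbed."""
    scrubbed = 0
    for row in rows:
        if row.source_file != source_file or row.run_started_at != run_started_at:
            continue  # row belongs to a different run; leave it alone
        if row.rating is None or row.rating == REJECTED or row.lon is None or row.lat is None:
            continue  # nothing geocoded to check
        if not point_in_polygon((row.lon, row.lat), boundary):
            row.rating = REJECTED
            row.lon = row.lat = None
            scrubbed += 1
    return scrubbed
```

Scoping the scrub to a single run keeps it from touching results that earlier runs already accepted for a different region.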