Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable postalcode lookup #310

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

octmoraru
Copy link

👋 I did some work for the Pelias project and would love for everyone to have a look at it and provide feedback.


Here's the reason for this change 🚀

Currently wof-admin-lookup does not support lookups in the postalcode layer.


Here's what actually got changed 👏

This diff enables lookup in the postalcode layer, guarded by a config flag. Because the postal code data can amount to a significant size (~12.6 GB for the global postalcode dataset as of today), this setting is disabled by default.

The new lookupPostalCode setting affects only the "local resolver" mode. If it's set to true, it will start the postalcode layer worker. The layer is considered "untrusted", just like the neighbourhood layer.


Here's how others can test the changes 👀

I've tested this diff locally by using pip-service:

On my machine, the entire global postal code dataset was loaded in ~6 minutes:

2022-02-08T11:23:05.820Z - info: [wof-pip-service:master] postalcode worker loaded 421118 features in 370.922 seconds
2022-02-08T11:23:06.440Z - info: [wof-pip-service:master] PIP Service Loading Completed!!!

This diff enables lookup in the `postalcode` layer, guarded by a config flag.
Because the postal code data can amount to a significant size (12.6 GB as of
today), this setting is disabled by default.

The new `lookupPostalCode` setting affects only the "local resolver" mode. If
it's set to true, it will start the `postalcode` layer worker. The layer is
considered "untrusted", just like the `neighbourhood` layer.
Copy link
Member

@missinglink missinglink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, fairly simple change but effective 👍

As you mentioned it's important to default this behaviour to off but allow opt-in.

This will be beneficial for smaller geographies and for testing the quality of algorithmically assigned postcodes.

@missinglink
Copy link
Member

Worth mentioning that there are two 'postcode' properties in Pelias, the most common being an 'address property', the other is in the admin hierarchy which allows records to be parented by a postcode but has never really been used.

@missinglink
Copy link
Member

For that reason it might be required to 'special-case' postcode such that the hierarchy label is duplicated in the address properties. IIRC all the search queries target address_parts.zip.

@orangejulius
Copy link
Member

orangejulius commented Feb 8, 2022

Nice, this feature has been quite in-demand over the years. Postal codes can be very tricky for admin lookup, which is part of why we've held off, but I don't think it hurts anything to merge it with the default set to disabled.

Some ideas of future work that we might want (they should all come with further discussion):

  • Enabling postalcode lookup on a per-country basis (it makes more sense in some places than others)
  • Supporting some sort of field in our records to determine postalcode source. For example, we have considered a postalcode_accuracy field that can be set to exact if the address record comes with a trusted postalcode, or estimated if it came through admin lookup in a country where postalcode lookups by geometry are not trustworthy (like the United States).

I'll just link some relevant issues too, since we've discussed a lot of this before:

@orangejulius
Copy link
Member

@octmoraru one question, when you tested this, did the PIP service downloader give you any issues? As I recall it's set to default to admin only (it won't download the postalcode databases):

https://github.com/pelias/pip-service/blob/f89c0f1e0796a388f1000629ff8113ddbc2db295/bin/download#L3

@octmoraru
Copy link
Author

Thanks @missinglink and @orangejulius for your quick reactions!

For that reason it might be required to 'special-case' postcode such that the hierarchy label is duplicated in the address properties. IIRC all the search queries target address_parts.zip.

You mean doing that in the importers, right? Indeed, that would be quite neat as it has the potential to fill some gaps in source dataset. However, even with the current implementation, not all is lost. If I read this code correctly, at runtime, for most (all?) endpoints parent.postalcode will be renamed to postalcode, as will address_parts.zip, with the latter being preferred, if present. So, even without any changes to the importers, WOF postal code information will be visible in the results, for those entries that lacked this info in the source dataset. Please correct me if I'm wrong :)

Enabling postalcode lookup on a per-country basis (it makes more sense in some places than others)

Yeah, that sounds reasonable. I guess that would be quite easy to implement by adding a new setting and changing the logic in src/pip/readStream.js

I'll just link some relevant issues too, since we've discussed a lot of this before:

Thanks for sharing, it didn't cross my mind to search in pelias/pelias 🤦

@octmoraru one question, when you tested this, did the PIP service downloader give you any issues? As I recall it's set to default to admin only (it won't download the postalcode databases):

I guess I've been using the whosonfirst downloader directly which by default downloads everything 😀 https://github.com/pelias/whosonfirst/blob/22f7ad76dbf65bf25f0f4a2af352ad45b72a2a94/bin/download

@octmoraru
Copy link
Author

Hi @orangejulius - what else would you like to see in this PR to have it merged? 🙂

@missinglink
Copy link
Member

Have you tested that this works as expected? I believe that this alone will not fix any issues in Pelias results since the PIP service is responsible for setting these fields but not this one, which is used for search and display.

@orangejulius
Copy link
Member

Ah, yeah so as @missinglink points out there are a couple objectives that one might wish to achieve:

Adding postal codes to address or other records

Having postal code geometries available via the PIP service is a prerequisite for this, but not enough to solve this issue on its own. Additional work would be required to make sure that, at import time, postal codes are added to the appropriate fields on each individual address record.

Allowing coarse reverse geocoding to return postalcodes

This would let one look up the postalcode for a given lat/lon, assuming the postalcode geometry is trustworthy. It sounds like @octmoraru says this works after this PR.

In my estimation, that's the less common of the two needs, but it's better than nothing. 12.6GB of RAM is quite a bit, so it might not be a good idea to enable this feature in many cases. I'm on the fence if supporting something that will be a non-recommended configuration is worth it.

@octmoraru
Copy link
Author

Indeed, my primary intent for this PR was to make postalcodes available in /v1/reverse?layers=coarse(or ?layers=postalcode) responses. I realise now that I need to change this array to include postalcode in order for it to actually work.

Additional work would be required to make sure that, at import time, postal codes are added to the appropriate fields on each individual address record.

True, @orangejulius @missinglink - how would you feel if I were to duplicate parent.postalcode into address_parts.zip, somewhere around here? I think that should do the trick for all importers, right?

@missinglink
Copy link
Member

missinglink commented Mar 18, 2022

I feel like the topic of whether algorithmically assigned postcodes are considered equal to authoritative postcodes is subjective, regional and domain-specific.

Some installations may prefer the existing behavior where we only display authoritative postcodes, some may prefer to have increased coverage at the cost of the potential inaccuracies.

For that reason I would like to put it behind a pelias/config 'flag' which your org can opt-in to immediately but others can elect to opt in or not, and punt the decision of which is the default for later, once we've evaluated the international quality implications.

Another option which we've discussed in the past is to introduce a new field called postcode_quality (or whatever we name it) which indicates that these fields are populated algorithmically from polygons rather than supplied in the source data.

This second option would allow us to make this the default immediately but indicate to the consumer a criteria which they could filter on after parsing the geojson result.

@octmoraru
Copy link
Author

Thanks for making it clearer for me! So out of these two options:

For that reason I would like to put it behind a pelias/config 'flag' which your org can opt-in to immediately but others can elect to opt in or not, and punt the decision of which is the default for later, once we've evaluated the international quality implications.

and

Another option which we've discussed in the past is to introduce a new field called postcode_quality (or whatever we name it) which indicates that these fields are populated algorithmically from polygons rather than supplied in the source data.

which one would you prefer? I can probably work at implementing either one of them in the near future.

@missinglink
Copy link
Member

missinglink commented Mar 21, 2022

I think the two solutions aren't really different, more that the latter is an extension of the former.

So at minimum:

  1. User can choose whether postcodes are downloaded and ingested into the PIP server.
  2. A config option is added to enable/disable setting address_parts.zip from PIP results (obviously this would be a no-op if 1. didn't import anything), the default value is false.

In this situation you & others can elect to opt-in to this feature in their installation of Pelias, for testing, production, or whatever by setting the value to true.

Then we live with it for a few months, hopefully you provide some feedback about how well it worked out for you and we evaluate how good/bad it is.

Then later this year we approach the idea of making it the default setting.

  1. At this stage the default would change from false to true and at the same time we
  2. introduce something in the response which indicates whether the postcode was authoritative or algorithmically assigned.

I think you should focus on the first two steps since it solves your business need and also is easy to merge since it will be a no-op for everyone else (for now).

Copy link
Member

@Joxit Joxit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this PR for its simplicity and backward compatibility 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants