Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

port download code to use the batch endpoint #513

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

missinglink
Copy link
Member

@missinglink missinglink commented Jan 6, 2023

this draft PR updates the OA downloader to use the batch.openaddresses.io endpoint.

a few things worth mentioning:

  • the API and downloads require an authentication token
  • each 'file' is now looked up against the OA API to get the latest 'run' (ie. latest version)
  • downloads are now gzip compressed rather than zip
  • the 'global downloads' remain unchanged, still pointing to the old domain, last time I looked there weren't equivalent files on the batch domain
  • I haven't done any testing of the import process, only the download process so far

@missinglink missinglink requested a review from Joxit January 6, 2023 16:38
utils/download_filtered.js Outdated Show resolved Hide resolved
utils/download_filtered.js Outdated Show resolved Hide resolved
@missinglink
Copy link
Member Author

missinglink commented Jan 10, 2023

This is ready for testing, there are some features I would have liked to add but will have to wait for a subsequent PR:

  • do not decompress after downloading, instead write the .gz and do the decompression on import (saves disk space)
  • source pattern matching, it would be ideal to add us/{ca,ny}/* to the config file and have the API figure it out

I still need to clarify a couple things with the OA team:

  • rate-limiting, the code enforces "current policy is 10 request per minute", is this still relevant and will there be an issue with the shared community key?
  • what support is there for the 'global downloads' (openaddr-collected-global.zip & openaddr-collected-global-sa.zip)?

@missinglink missinglink marked this pull request as ready for review January 10, 2023 12:28
@Joxit
Copy link
Member

Joxit commented Jan 11, 2023

LGTM

👍 for decompression on import and source pattern matching for a next PR.

Maybe you can add a logger.error for all missing files even if errorsFatal is false. This allows you to know all the errors at once.

const files = config.get('imports.openaddresses.files', []);
const sources = await getSources(files);
const validSources = sources.filter(source => source.url);
// respect 'imports.openaddresses.missingFilesAreFatal' setting
if (errorsFatal && (sources.length !== validSources.length)) {
callback(sources.find(source => source.error)); // return first error
return;
}

@missinglink
Copy link
Member Author

Yeah good idea about the logging, I'll go through and make sure everything is logged, even if it's at the debug log level, I also want to log the output of these child process shell commands if anything goes wrong.

I have a chat going with the OA team at the moment to clarify some of the final questions:

  • rate limiting: one concurrency per-user is perfect for now, they are looking to migrate to Cloudflare+R2 so hopefully the bandwidth restrictions will be reduced/removed at some point soon.
  • global downloads: there is a 'collection' named collection-global.zip which contains everything, addresses, parcels etc. I'd prefer to have a new collection with only the addresses, and in .tar.gz rather than .zip but we will see what's possible.

@vberten
Copy link

vberten commented Jun 30, 2023

Hi @Joxit, @missinglink,

Any update on this? Is that planned to make this available in the pelias/docker repo?

Thanks!

michaelkirk added a commit to michaelkirk-pelias/openaddresses that referenced this pull request Jul 1, 2023
…70a414af043ef95f

See pelias#513 for a proper fix.

Really we can just omit this commit, there's no need for the latest
version, but it seems nice to try to keep all the pelias repos on the
same config release.
@reverload
Copy link

Hi @Joxit, @missinglink,

This looks like a great improvement—thank you for all the work on this so far! 😊 I wanted to check in and ask how things are going with this PR. If there’s anything I can do to help, I’d be more than happy to assist!

I’m currently trying to set up Pelias Docker for a Spain build, and I noticed that the countrywide data for Spain from OpenAddresses is only available on the new repositories. This update would be really helpful for my use case, so I’m happy to contribute or test anything you might need to move things forward.

Thanks again for all your efforts! Let me know if I can pitch in.

Cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants