Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importer runs out of memory when importing venues #180

Closed
orangejulius opened this issue Dec 9, 2016 · 1 comment
Closed

Importer runs out of memory when importing venues #180

orangejulius opened this issue Dec 9, 2016 · 1 comment
Labels
Milestone

Comments

@orangejulius
Copy link
Member

orangejulius commented Dec 9, 2016

This looks to somehow have been caused by #163, which unfortunately gave a nice speed boost. Hopefully we can identify the problem and keep the speed improvements.

It's been reported that raising the Node.js memory limit with something like node --max_old_space_size=16384 import.js can help, at least for single bundles.

This blocks full production readiness for WOF Venues (#94)

@orangejulius
Copy link
Member Author

This appears to have been fixed in 526f664.

Unfortunately, the csv-stream CSV parsing module that we switched to, runs a little slower than newer parsers, but it also doesn't run out of memory.

After much investigation it seems like newer modules, which use Streams versions 2 or 3, will basically load an entire CSV file in to memory when a downstream stream has an async request built into it (even if that async request is limited by something like pelias-parallel-stream. Our stream to load WOF JSON uses async filesystem read calls, so it causes this behavior.

As far as I can tell, because csv-stream is very old and based on the Stream1 interface, which instead only fetches more data when "asked" to by the next stream in the pipeline, it doesn't cause the memory leak. Its a little slower to use Stream1 than Stream2, because CSV data can be more efficiently parsed in batches, but it's even slower to use the syncronous filesystem calls when loading JSON. Since we were able to put in a fix that allows csv-stream to work with Node.js 8 (and even 9!), it seems like we're in a decent place for now.

@ghost ghost removed the processed label Nov 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant