Bugfix/geoparquet storage #204
base: master
tgrandje
commented
Aug 6, 2024
- Should handle the geoparquet format for geodataframe
- Replace multiprocessing.Pool by pebble.ThreadPool
* Switch the storing/loading of geoparquets from pandas to geopandas
* black formatting
* switch from multiprocessing.Pool to pebble.ThreadPool
* black formatting
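As a rough illustration of the second change, here is a minimal sketch of running I/O-bound downloads in a thread pool rather than a process pool. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for `pebble.ThreadPool`, so it is not the code from this PR. `fetch`, `fetch_all` and the dataset names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(name: str) -> str:
    # Hypothetical placeholder for an I/O-bound download.
    return f"data for {name}"


def fetch_all(names, max_workers=4):
    # Threads share memory with the caller, so arguments and results do not
    # have to be pickled to cross a process boundary, unlike with
    # multiprocessing.Pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(fetch, names))


results = fetch_all(["communes", "departements", "regions"])
```

Threads suit this kind of workload because the time is mostly spent waiting on the network, not on CPU-bound Python code.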
2 tests failed on my machine:
The second one might be linked to a proxy configuration (a requests Session is built on the fly in one of the tests): let's see what happens on GitHub. The first failure shouldn't be linked to this PR (I'll wait for the results on GitHub before inspecting further, if I can).
Ok, I think I've fixed the failing tests on my machine. I'll also try a patch to
By the way, I seem to recall a discussion that approved testing only the lowest and highest Python versions. Was that never implemented? Should I do it at the same time?
Am I right to assume the tests are failing because they are checked against the workflow YAML files from the master branch? Any idea how to test the impact of requests-cache across multiple tests? Maybe by temporarily adding it to the requirements-extra.txt file, if that's ok with you? (On my machine, without clearing the cache, the first run takes ~30 min and the next one ~5 min. The SQLite cache is 1.3 GB though, so it is not cached as an artifact right now.)
Yes, please! (3.8 and 3.12)
Yes, I don't think there is any way to avoid this since otherwise any random malicious PR could leak secrets. Regarding the geoparquet issue, do I understand correctly that the only reason this fails is because of the
Yes, I got carried away... (I started this to resolve a pure geodataframe/arrow issue, then bumped into the multiprocessing issue, then into 6-hour test runs...) I'll move the caching and Python test version changes into another PR.
No, there was still an unresolved bug in my first commits. Once I remove the caching and workflow alterations, this should work fine.
I meant the original issue of not being able to write the parquet file |
Oh, sorry :-) So yes, I suppose that's one way of looking at it: if the GeoFrDataFrame inherits from pd.DataFrame and not gpd.GeoDataFrame, then the .write_parquet method is not the right one.
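The inheritance point can be sketched with toy classes. This is only an illustration of how method resolution picks the writer, assuming the geo-aware class overrides the plain one. `PlainFrame`, `GeoFrame`, `FromPlain` and `FromGeo` are hypothetical stand-ins, not the pandas, geopandas or pynsee classes, and the method name simply mirrors the comment above.

```python
class PlainFrame:
    """Hypothetical stand-in for pd.DataFrame."""

    def write_parquet(self) -> str:
        # The plain writer knows nothing about geometry metadata.
        return "plain parquet"


class GeoFrame(PlainFrame):
    """Hypothetical stand-in for gpd.GeoDataFrame."""

    def write_parquet(self) -> str:
        # The geo-aware writer overrides the plain one.
        return "geoparquet"


class FromPlain(PlainFrame):
    """Subclass of the plain frame: inherits the plain writer."""


class FromGeo(GeoFrame):
    """Subclass of the geo frame: inherits the geo-aware writer."""


plain_result = FromPlain().write_parquet()
geo_result = FromGeo().write_parquet()
```

A class derived only from the plain frame never reaches the geo-aware writer, which matches the symptom discussed here.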
I'm not a fan of that compromise, as it adds complexity to code that, as we've seen, is already quite elaborate.
I agree with you, but I'm also in favor of a quick release of the bugfix (it is badly slowing down some of my algorithms by downloading a 100 MB dataset over and over...).