Validate image explanations in search/classify workflow #2343

Closed
cdbethune opened this issue Mar 5, 2021 · 3 comments · Fixed by #2384
@cdbethune
Collaborator

Image explanations haven't been exercised recently. We should verify that they still work in a) the base unfeaturized case and b) the unpooled/pre-featurized case. The latter would allow us to include explanations in the search/classify workflow that we have been demonstrating.

cdbethune self-assigned this Mar 5, 2021
@cdbethune
Collaborator Author

cdbethune commented Mar 17, 2021

  1. Testing against big earth single produces an unpooled pre-featurized dataset of 10K rows x 32K images. The parquet loading library we use chews through RAM aggressively and probably leaks some of it during loading: a dataset of about 5GB ends up occupying around 15GB of RAM once loaded. It is also much slower than the parquet reader built into pandas. On top of that, we appear to duplicate the data when splitting, and possibly again before search. These factors combine to make the big earth single model creation process fail, with the server running out of RAM before the search can be started. Parquet read memory optimization #2382 fixed one of the duplications, but that alone isn't enough to get everything running. (A rough memory-profiling sketch follows this list.)
  2. Image explanations are not being generated for the unpooled/pre-featurized case. This needs investigation.
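
To make the RAM blow-up in item 1 easier to pin down, here is a minimal profiling sketch. It is not the project's actual loading code: the file path is a placeholder, the pyarrow-vs-pandas comparison is an assumption about where the overhead sits, and psutil is only used to report process RSS around each read.

```python
# Minimal sketch, not the project's loader: compare resident memory after reading the
# same parquet file via pyarrow directly vs. pandas' read_parquet.
# For a clean number, run each reader in a separate process -- Python rarely returns
# freed memory to the OS, so back-to-back measurements in one process are skewed.
import os

import pandas as pd
import psutil
import pyarrow.parquet as pq

PATH = "unpooled_prefeaturized.parquet"  # placeholder path for the 10K x 32K dataset


def rss_gb() -> float:
    """Resident set size of the current process, in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3


def read_with_pyarrow(path: str) -> pd.DataFrame:
    table = pq.read_table(path)
    # split_blocks/self_destruct reduce the peak held during the Arrow -> pandas
    # conversion, which is one place a ~5GB file can briefly cost far more RAM.
    return table.to_pandas(split_blocks=True, self_destruct=True)


def read_with_pandas(path: str) -> pd.DataFrame:
    return pd.read_parquet(path)


if __name__ == "__main__":
    before = rss_gb()
    df = read_with_pyarrow(PATH)  # swap in read_with_pandas for the other run
    print(f"+{rss_gb() - before:.1f} GB RSS for {df.shape[0]} rows x {df.shape[1]} cols")
```

The same before/after RSS check, dropped around the train/test split and the hand-off to search, would confirm or rule out the suspected extra copies at those points.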

@cdbethune
Collaborator Author

For item 2 above, we were specifically blocking explanations on pre-featurized data. #2384 addresses this.
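
For reference, a hypothetical sketch of the kind of guard being described. The function and flag names are invented for illustration and are not the codebase's API, and the relaxed check is only one plausible shape of the change, not a description of what #2384 actually implements.

```python
# Hypothetical illustration -- names invented, not taken from the codebase.
# Before: explanations were skipped for any pre-featurized input.
# After:  only the pooled pre-featurized case is skipped, on the assumption that
#         unpooled features still carry the detail an image explanation needs.

def should_generate_explanations(prefeaturized: bool, pooled: bool) -> bool:
    # Old guard (as described in this issue):
    #   return not prefeaturized
    return not (prefeaturized and pooled)
```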

@cdbethune
Collaborator Author

#2385 has been opened to track the issues observed with parquet loading, and #2386 covers the out-of-memory errors encountered with larger unpooled datasets.
