Running outlier_detection with in_memory=False reads all models into memory #8478
Comments
I will add here that when
So it is writing to disk when it should be doing everything in memory. This is with release 1.14.0.
Comment by Ned Molter on JIRA: This should be fixed as part of implementing ModelLibrary (see JP-3690). Here is a small summary of results from profiling memory for this step on its own in my PR branch, using as input an association containing a 46-cal-file subset of the data in ███████████████████████████████████████████
The total size of the input _cal files on disk is roughly 5 GiB. With in_memory=True, peak memory usage is 31.3 GiB; I'm attaching a graph of usage over time to the ticket (outlier_in_memory.png, sorry for the similar name to what Brett attached originally). With in_memory=False, peak memory usage is 7.1 GiB, and a graph is again attached (outlier_on_disk.png). So this appears to be fixed: although additional memory improvements to this step are certainly possible and probably still necessary, it is clear that the input models are never being loaded into memory all at once.
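To illustrate the kind of comparison behind these numbers: the measurements above were presumably taken with an external profiler sampling process memory, but a minimal stand-in using only the standard library's tracemalloc can show the same eager-vs-streaming difference. Everything here (the byte sizes, the `peak_mib` helper) is hypothetical and not jwst API.

```python
import tracemalloc

def peak_mib(func):
    """Run func and return the peak Python heap usage in MiB.

    A minimal sketch of peak-memory measurement; the real profiling
    of the outlier_detection step likely tracked process RSS over
    time with an external tool, not tracemalloc.
    """
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 2**20

# Holding all "models" at once (the in-memory failure mode) vs.
# touching one at a time and letting it be freed (on-disk mode).
# Each bytearray stands in for one loaded model.
eager = peak_mib(lambda: [bytearray(2**20) for _ in range(50)])
lazy = peak_mib(lambda: [len(bytearray(2**20)) for _ in range(50)])
```

Here `eager` peaks near 50 MiB because all fifty buffers are alive together, while `lazy` stays near 1 MiB because each buffer is dropped before the next is created, mirroring why in_memory=False should cap peak usage well below the total input size.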
Comment by Melanie Clarke on JIRA: Fixed by #8683
Issue JP-3619 was created on JIRA by Brett Graham:
Running outlier detection at the recent main commit (13e0927), on an association containing 33 members (with no custom grouping or tweakreg_catalog entries for any member), and with "in_memory=False", results in the following memory usage:
The attached graph shows that even with "in_memory=False" all models are read into memory (the largest plateau in the graph). The main cause is that the implementation of ModelContainer.models_grouped (which is used in resample) opens all models and keeps them in memory.