Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running skymatch with minimize_memory opens 2x files for each input association member #8479

Closed
stscijgbot-jp opened this issue May 10, 2024 · 4 comments · Fixed by #8683
Closed

Comments

@stscijgbot-jp
Copy link
Collaborator

Issue JP-3620 was created on JIRA by Brett Graham:

When skymatch is run with minimize_memory it will write out and keep open 2x temporary files for every member of the input association. For large associations (hundreds of members) this can lead to failed runs where the step exceeds the allowed number of open files (using the default value on some systems).

See relevant code here:

self._tmp = tempfile.NamedTemporaryFile(

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Ned Molter on JIRA:

This should be fixed as part of implementing ModelLibrary (see JP-3690).  I will post memory profiling for resample here as well as on that ticket as I continue to develop and test the changes.

@stscijgbot-jp
Copy link
Collaborator Author

stscijgbot-jp commented Aug 12, 2024

Comment by Ned Molter on JIRA:

Here is a small summary of results from profiling memory for this step on its own in my PR branch, using as input an association containing a 46-cal-file subset of the data in 

███████████████████████████████████████████ 

The total size of the input _cal files on disk is roughly 5 GiB.

Setting in_memory=True, the peak memory usage is 5.5 GiB, and I'm attaching a graph of usage over time to the ticket (skymatch_in_memory.png).

Setting in=memory=False, the peak memory usage is 1.5 GiB, and a graph is again attached (skymatch_on_disk.png).

So it appears that the switchover to ModelLibrary comes with substantial memory savings in this step, at least for my test dataset.

This memory profiling doesn't directly address the issue in the ticket, but the changes are related:  A new input parameter "in_memory" (so named for consistency with resample and outlier_detection) is added, and the hidden (i.e., not in the spec) keyword argument "minimize_memory" is removed. The reduce_memory_usage parameter to SkyImage is then hardcoded to False when instantiating SkyImage objects within the SkyMatchStep; this is consistent with what was done for Roman.

This change resolves the ticket, since no attempt is made to write these temporary files at all, although it might be worth considering whether the reduce_memory_usage option and associated "if" statements in the SkyImage class should be removed entirely - Brett Graham was this discussed in the context of the Roman changes? 

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Brett Graham on JIRA:

Thanks for the profiling and looking into the "minimize_memory" usage.

As you noted romancal disables this option (and doesn't provide a way to enable it). I think removing the if statements (etc) in SkyImage makes sense and likely would fit well into a PR that moves this class to stcal (so a separate ticket and PR).

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

Fixed by #8683

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant