Running outlier_detection with in_memory=False reads all models into memory #8478

Closed
stscijgbot-jp opened this issue May 10, 2024 · 3 comments · Fixed by #8683

Comments

@stscijgbot-jp
Collaborator

Issue JP-3619 was created on JIRA by Brett Graham:

Running outlier detection with the recent main commit (13e0927), on an association containing 33 members (with no custom grouping or tweakreg_catalog entries for any member), and with "in_memory=False" results in the following memory usage:

The attached graph shows that, even with "in_memory=False", all models are read into memory (the largest plateau in the graph). The main cause is the implementation of ModelContainer.models_grouped (which is used in resample): it opens all models and keeps them in memory.
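For illustration only (this is not the actual jwst code; the classes and callables below are hypothetical), a minimal sketch of why an eagerly evaluated grouping property defeats in_memory=False, and how a generator-based grouping avoids holding every model open at once:

```python
from collections import defaultdict


class EagerContainer:
    """Hypothetical sketch: grouping that opens and retains every model."""

    def __init__(self, filenames, open_model, group_id_of):
        self._filenames = filenames
        self._open = open_model          # e.g. a datamodels.open-like callable
        self._group_id_of = group_id_of  # cheap metadata lookup, e.g. a header read

    @property
    def models_grouped(self):
        # Every file is opened up front and every model is retained in the
        # returned structure, so peak memory scales with the association size.
        groups = defaultdict(list)
        for fname in self._filenames:
            model = self._open(fname)                       # model now lives in memory...
            groups[self._group_id_of(fname)].append(model)  # ...and is never released
        return list(groups.values())


class LazyContainer(EagerContainer):
    """Hypothetical sketch: yield one group at a time instead."""

    def iter_groups(self):
        # Group by cheap metadata first, without opening any model.
        groups = defaultdict(list)
        for fname in self._filenames:
            groups[self._group_id_of(fname)].append(fname)
        # Open the members of a single group only while that group is processed.
        for fnames in groups.values():
            models = [self._open(f) for f in fnames]
            yield models
            for m in models:
                m.close()  # release before moving on to the next group
```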

@jdavies-st
Collaborator

I will add here that when in_memory=True for outlier_detection, it still writes out the blot files and then reads them back in.

2024-05-15 15:48:21,169 - stpipe.Image3Pipeline.outlier_detection - INFO - Blotting (2048, 2048) <-- (6321, 9320)
2024-05-15 15:48:21,455 - stpipe.Image3Pipeline.outlier_detection - INFO - Saved model in jw02321001009_02101_00006_nrcblong_c1001_blot.fits

So it is writing to disk when it should be doing everything in memory. This is with release 1.14.0.
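A sketch of the kind of gating one would expect here (not the actual outlier_detection code; the function signature, the caller-supplied make_blot_model, and the filename pattern are illustrative assumptions): intermediate blot products should only be round-tripped through disk when in_memory is False.

```python
import os


def blot_median(median_model, input_models, make_blot_model, in_memory, output_dir):
    """Illustrative sketch: round-trip blot products through disk only when
    in_memory is False."""
    blot_models = []
    for model in input_models:
        blotted = make_blot_model(median_model, model)  # caller-supplied blot step

        if in_memory:
            # in_memory=True: keep the blotted model in memory, no FITS written.
            blot_models.append(blotted)
        else:
            # in_memory=False: write it out, record the path, reopen on demand.
            path = os.path.join(
                output_dir, model.meta.filename.replace(".fits", "_blot.fits")
            )
            blotted.save(path)
            blotted.close()
            blot_models.append(path)
    return blot_models
```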

@stscijgbot-jp
Collaborator Author

stscijgbot-jp commented Aug 2, 2024

Comment by Ned Molter on JIRA:

This should be fixed as part of implementing ModelLibrary (see JP-3690).

Here is a small summary of results from profiling memory for this step on its own in my PR branch, using as input an association containing a 46-cal-file subset of the data in ███████████████████████████████████████████.

The total size of the input _cal files on disk is roughly 5 GiB.

Setting in_memory=True, the peak memory usage is 31.3 GiB, and I'm attaching a graph of usage over time to the ticket (outlier_in_memory.png, sorry for the similar name to what Brett attached originally).

Setting in_memory=False, the peak memory usage is 7.1 GiB, and a graph is again attached (outlier_on_disk.png).

So this appears to be fixed: although additional memory improvements to this step are certainly possible and probably still necessary, it is clear that the input models are no longer all loaded into memory at once.
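For readers unfamiliar with the on-disk access pattern that ModelLibrary enables, here is a schematic sketch (a hypothetical class, not the stpipe API): members are checked out one at a time, so peak memory stays near the size of a single model plus accumulated output, rather than the whole association.

```python
class OnDiskLibrary:
    """Hypothetical sketch of one-model-at-a-time access to an association."""

    def __init__(self, filenames, open_model):
        self._filenames = filenames
        self._open = open_model  # e.g. a datamodels.open-like callable

    def __len__(self):
        return len(self._filenames)

    def checkout(self, index):
        # Open only the requested member; nothing else is resident.
        return self._open(self._filenames[index])

    def checkin(self, model):
        # Release the member once the caller is done with it.
        model.close()


def peak_pixel_per_model(library):
    """Example consumer: touches every member but holds only one at a time."""
    peaks = []
    for i in range(len(library)):
        model = library.checkout(i)
        peaks.append(float(model.data.max()))
        library.checkin(model)
    return peaks
```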

@stscijgbot-jp
Collaborator Author

Comment by Melanie Clarke on JIRA:

Fixed by #8683
