ddcal worker fails due to a MemoryError #1582
Comments
Hi @a-benati, thanks for reporting this. Can you please share the full log? Best regards
Hi @Athanaseus, thanks for your answer. Here is the full log:
Thanks @a-benati,
I believe that the issue is the absence of time and frequency chunks in the input parameters. You will be working with extremely large chunks. I would suggest setting the input time and frequency chunks to match the solution interval on your DDE in this case.
Thanks @Athanaseus. Here is the log result of
@JSKenyon thanks for your answer. I agree that I need smaller time and frequency chunks, but I am not sure which parameters to change: are they
This is where the options are set in the ddcal worker: caracal/caracal/workers/ddcal_worker.py Lines 330 to 331 in 2d338e2
I am not much of a CARACal user so I am not sure of the easiest way to adjust those parameters.
In principle, for the parameters in the log you shared,
Thanks @JSKenyon. When
You could definitely give it a try and see if it resolves the issue. I see that you have 24 directions in your model - that is pretty extreme (your model will be 24 times larger than the associated visibilities). I would also suggest making your frequency solution interval something which divides 512 (the number of channels if I am not mistaken) e.g. 128. That should prevent some complications. Unfortunately, CubiCal (the underlying software package for the ddcal step) was never particularly light on memory.
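To make the scaling concrete, here is a rough back-of-envelope sketch (not CubiCal's actual memory estimator) of how the per-chunk model grows with the number of directions. All dimensions below are illustrative placeholders, and complex64 visibilities are assumed:

```python
# Back-of-envelope estimate of per-chunk memory, assuming complex64 data.
# All dimensions are placeholders -- substitute your own chunk sizes.
n_times, n_chan, n_bl, n_corr = 60, 512, 2016, 4   # one chunk (illustrative)
n_dir = 24                                          # directions tagged in the region file
bytes_per_vis = 8                                   # complex64

vis_gb = n_times * n_chan * n_bl * n_corr * bytes_per_vis / 1e9
model_gb = n_dir * vis_gb                           # one model slot per direction

print(f"visibilities per chunk: ~{vis_gb:.1f} GB")
print(f"DD model per chunk:     ~{model_gb:.1f} GB")

# Sanity-check the frequency solution interval against the channel count,
# per the suggestion above (e.g. 128 divides 512).
freq_int = 128
print("freq interval divides channel count:", n_chan % freq_int == 0)
```

With the placeholder chunk above this already puts the direction-dependent model in the tens of gigabytes, which is why shrinking the chunks and the number of directions both help.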
@JSKenyon thanks, I will try setting
@Athanaseus, @JSKenyon thanks. I think I solved that error, since the code now gets past the part where it was stuck before. However, I now get another error, which I believe is related to flagging in DDFacet (I think all the data are flagged). Here is the log file:
It looks like the data has been almost completely flagged, possibly by CubiCal. You should probably check your flagging before and after that step. CubiCal is also very unhappy about the SNR in many of the directions. I would suggest looking at your image prior to DD calibration to make sure that all 24 of those directions really require DD solutions.
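For reference, a minimal way to inspect the flagged fraction before and after the CubiCal step is CASA's flagdata task in summary mode (assuming casatasks is available; the MS name below is a placeholder):

```python
from casatasks import flagdata

# mode="summary" only reports flag counts; it does not change any flags.
summary = flagdata(vis="my_data.ms", mode="summary")
print(f"Overall flagged: {100 * summary['flagged'] / summary['total']:.1f}%")

# The returned dict also holds per-field (and per-antenna, per-spw, ...) breakdowns.
for field, stats in summary["field"].items():
    print(f"{field}: {100 * stats['flagged'] / stats['total']:.1f}% flagged")
```

Running this before the ddcal worker and again afterwards makes it clear whether that step is responsible for the extra flags.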
@JSKenyon thanks. I reduced the number of facets to 12, but I don't think I really need that many directions: I only have 3 or 4 very bright sources in my field which corrupt everything else. Do you think that reducing the number of facets to 4 or 6 could solve the flagging issue? Or is it a completely independent problem?
Unfortunately I did not implement the DDFacet component of the visibility prediction, so I am not an expert. I think that you would likely need to edit the region file passed to CubiCal such that it only includes the 4 or so problematic sources.
Thanks @JSKenyon. Do you know which region file is passed to CubiCal with the 4 sources, or where it is created? I can edit it and tell CubiCal to use that file instead of automatically creating a new one, right?
Based on your log, it is /stimela_mount/output/de-Abell3667.reg. It is created by CatDagger in the previous step. I would suggest manually creating your own region file using Carta or DS9. You could then modify the model option in the CubiCal step to use your region file (note that it appears twice in the specification of the model). Pinging @bennahugo as he is more knowledgeable about this functionality than I am.
Yup, you may need to increase the local sigma thresholding to the autotagger if you want to use it -- alternatively, manually create a pixel-coordinate region file for your target with astropy / ds9 to pass into CubiCal, per @JSKenyon's suggestion.

The number of facets has no traction on the memory footprint though -- only the number of directions you marked in the region file. I do agree that 12 tags are on the excessive end.
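As a concrete (hypothetical) example of the manual route, the sketch below writes a small DS9-format region file in pixel coordinates for a handful of bright sources. The positions and radii are placeholders to be read off your image in CARTA or DS9, and the output filename is arbitrary:

```python
# Placeholder (x, y) pixel positions of the 3-4 problematic bright sources,
# read off the image by eye in DS9 or CARTA.
bright_sources_pix = [
    (1024.0, 2048.0),
    (3150.5, 980.2),
    (2200.0, 2900.0),
]

# Write a minimal DS9 region file in image (pixel) coordinates.
with open("de-manual.reg", "w") as f:
    f.write("# Region file format: DS9\n")
    f.write("image\n")
    for x, y in bright_sources_pix:
        # The 30-pixel radius is arbitrary -- make it large enough to enclose each source.
        f.write(f"circle({x},{y},30)\n")
```

You would then point the model option in the CubiCal step at this file (in both places it appears in the model specification), as described above.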
@JSKenyon @bennahugo thank you. I will try manually creating a region file with ds9 and telling CubiCal to use that file instead of the one created by CatDagger. I will let you know if it works.
@JSKenyon @bennahugo I created the region file manually with CARTA and gave it as input to CubiCal, but I still get the same error related to the flagged data. I actually think that the code stops at an earlier step, since the point in the log file where the region file is read is never reached. For example, previously the file caracaldE_sub.log was created, but now it is not. Here is my log file.
Can you please check the status of the flagging on the original data, prior to the pipeline being run? I don't think that the pipeline is resetting the flags to their original state, i.e. now that your data is 100% flagged, it will remain that way.
@JSKenyon yes, my data is now 100% flagged even prior to the run of the pipeline. Do you know how I could reset the flagging? I am running caracal starting directly from the ddcal worker; maybe I need to start over from the beginning to get it right? And in that case, giving the right list of tagged sources to CubiCal should solve the flagging error, right?
Sorry to jump in, but CARACal does support flag resetting and rewinding in a number of ways. See https://caracal.readthedocs.io/en/latest/manual/reduction/flag/index.html . The ddcal worker might be the only one with no flag-rewinding option, but you could add a flag worker block to your config to just do the rewinding to whatever flag version you need.
Thanks for jumping in! I am not really a CARACal expert so I appreciate it!
@paoloserra thanks! I will definitely look into that, hoping that giving the manual region file to CubiCal solves the issue.
@paoloserra I get an error saying that there aren't any flag versions for my ms file:
I attach my log file here.
Hi @a-benati,

You can also provide the name of the flag version like:
You can look up the flag versions in the flag table. Note that the

Best regards
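For completeness, outside of the pipeline you can also list and restore flag versions directly with CASA's flagmanager task; the MS and version names below are placeholders (use whatever the list mode reports):

```python
from casatasks import flagmanager

# List the flag versions saved for this measurement set.
flagmanager(vis="my_data.ms", mode="list")

# Restore one of them, e.g. a version taken before the ddcal step.
# "before_ddcal" is a placeholder name -- pick one from the list above.
flagmanager(vis="my_data.ms", mode="restore", versionname="before_ddcal")
```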
Hello,
the ddcal worker fails with:
MemoryError: Estimated memory usage exceeds allowed pecentage of system memory. Memory usage can be reduced by lowering the number of chunks, the dimensions of each chunk or the number of worker processes. This error can suppressed by setting --dist-safe to zero.
I don't understand which parameters should be modified in order to solve this problem. The number of worker processes (`dist_nworker`) is set to 0. I tried changing the `data_chunkhours` parameter to 0.01 instead of the default 0.05 and nothing seems to be different. Here is the log file where the error is encountered:
I found the same problem in #1466, but trying to adjust the parameters `dd_g_timeslots_int` and `dd_dd_timeslots_int` does not seem to improve the situation (I tried with `dd_g_timeslots_int: 16` and `dd_dd_timeslots_int: 16`, and with `dd_g_timeslots_int: 4` and `dd_dd_timeslots_int: 4`). Do you know how I can solve this problem?