debug missing ms directory but json present when ignore_missing=True,… #1381

Open
wants to merge 8 commits into base: master

Collaborator

@Fil8 Fil8 commented Nov 12, 2021

… and check if label_in files of mstransform are where they should be

This is related to issue #1379.

I wanted the pipeline to include all dataids given in the config file in advanced steps such as continuum imaging or line. This was happening only if a folder named dataid+extension (even an empty one) was present in rawdatadir.

Following the existing logic of CARACal, I modified the code so that all dataids are now always included (provided that the needed .MS sub-file is found in rawdatadir). To do so, one must set ignore_missing=True (as before).

I also added 2 checks:

  • in obsconf we now check whether the .json file is there. If it is not, CARACal kindly exits.
  • in transform we now check whether the input file is there (either the label_in file or the original MS file).
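
For readers following along, the two checks above could be sketched roughly as follows. This is only an illustration and not the code in this PR: the function names and the `ConfigurationError` stand-in class are hypothetical, and the paths are passed in by the caller rather than built from CARACal's real naming scheme.

```python
import os


class ConfigurationError(Exception):
    """Hypothetical stand-in for the pipeline's configuration error."""


def require_obsinfo_json(json_path):
    # obsconf-style check: exit cleanly if the obsinfo JSON file is absent.
    if not os.path.isfile(json_path):
        raise ConfigurationError(f"obsinfo file '{json_path}' not found")
    return json_path


def require_input_ms(ms_path):
    # transform-style check: a Measurement Set is a directory on disk,
    # so the test is isdir rather than isfile.
    if not os.path.isdir(ms_path):
        raise ConfigurationError(f"input MS '{ms_path}' not found")
    return ms_path
```

Both helpers return the path on success, so a caller can wrap the path it is about to use without changing the rest of its logic.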

@ratt-priv-ci
Collaborator

Can one of the admins verify this patch?

@paoloserra paoloserra linked an issue Nov 16, 2021 that may be closed by this pull request
@paoloserra
Collaborator

@Fil8

I'm testing the case when a raw MS included in the input list getdata: dataid no longer exists in rawdatadir, but its obsinfo files do (from a previous caracal run) and getdata: ignore_missing: true.

I've found some inconsistencies.

1

If I run transform to split the targets from the raw MS files the worker exits with message:

ERROR: '['gps2.ms']' did not match any files under /home/pserra/Astro/vela2020/rawshort. Check your 'general: msdir/rawdatadir' and/or 'getdata: dataid/extension' settings, or set 'getdata: ignore_missing: true' [ConfigurationError]

In this case we should not suggest setting getdata: ignore_missing: true. It is already set that way; otherwise the pipeline would have stopped earlier (in the getdata worker), before reaching the transform worker.

And anyway, setting getdata: ignore_missing: true does not help because, in this case, transform needs the raw MS in order to split the target. The missing raw MS cannot be ignored (unless you fall back to the logic of caracal master, which ignores the missing raw MS throughout the pipeline).

So maybe change the transform error message to something like:

ERROR: '['gps2.ms']' did not match any files under /home/pserra/Astro/vela2020/rawshort but these MS files are required for this worker to continue. Check your 'general: msdir/rawdatadir' and/or 'getdata: dataid/extension' settings.
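
One way to get both wordings from a single place is sketched below. It is a sketch under assumptions, not CARACal's actual code: `missing_ms_message` and its `can_ignore` parameter are hypothetical names, and the message text is copied from the two messages quoted above.

```python
def missing_ms_message(names, directory, can_ignore):
    """Build the 'did not match any files' message.

    `can_ignore` (hypothetical) says whether the calling worker can proceed
    without these files; only then is the ignore_missing hint useful.
    """
    settings = "'general: msdir/rawdatadir' and/or 'getdata: dataid/extension' settings"
    head = f"'{names}' did not match any files under {directory}"
    if can_ignore:
        return f"{head}. Check your {settings}, or set 'getdata: ignore_missing: true'"
    return (f"{head} but these MS files are required for this worker to continue. "
            f"Check your {settings}.")
```

A worker such as transform, which cannot split targets without the raw MS, would call this with `can_ignore=False`, so the misleading hint is never printed.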

2

Let's assume that the targets have already been split in a previous caracal run before deleting the raw MS from rawdatadir. I should now be able to run transform on those split MS files to, e.g., average the data in frequency. With your new logic caracal should continue, but actually it does not.

Also, the transform error message is the same as above, i.e., transform says that it cannot find the split MS in rawdatadir, although that file was never meant to live there. Indeed, it should have looked for it in msdir, where it does exist.

ERROR: '['gps2-T29R03C05-corr.ms']' did not match any files under /home/pserra/Astro/vela2020/rawshort. Check your 'general: msdir/rawdatadir' and/or 'getdata: dataid/extension' settings, or set 'getdata: ignore_missing: true' [ConfigurationError]

3

So at the moment we seem to be somewhere in between the old logic and the new logic.

The old logic is that as long as there is one existing input MS caracal behaves as if the non-existing MS's had never been included in getdata: dataid. It was simple, but the limitations you point out are significant.

The new logic should be that there could be no existing input MS at all. Each worker should check whether the MS files it needs exist. These files could be the raw ones or other files created in previous caracal runs. This check should be done for every worker independently, looking for the required MS files where they're supposed to live.

Does this make sense?
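
To make the proposed per-worker check concrete, here is a minimal sketch. It assumes each worker can list the MS files it needs together with the directory each one is supposed to live in; the `missing_inputs` helper and the (directory, name) pair format are hypothetical and not part of caracal.

```python
import os


def missing_inputs(required):
    """Group missing MS names by the directory they were expected in.

    `required` is a list of (directory, ms_name) pairs declared by one worker.
    """
    missing = {}
    for directory, ms_name in required:
        # A Measurement Set is a directory, so test with isdir.
        if not os.path.isdir(os.path.join(directory, ms_name)):
            missing.setdefault(directory, []).append(ms_name)
    return missing
```

An empty result means the worker can run. Otherwise the keys say which directory to name in the error message, so a split MS missing from msdir is never reported as missing from rawdatadir.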

@Fil8
Collaborator Author

Fil8 commented Nov 26, 2021

So if I understand correctly, I need to:

  • change the error message in mstransform
  • check in the correct directory, which means rawdatadir for the raw MS and msdir for all other MS products. I thought I had fixed that.

@paoloserra
Collaborator

Yes I think so.

These changes might be needed in other workers, too. The selfcal and line workers can also operate on the raw data if label_in is an empty string. So in general all workers need to look in the right place for the files they need to work on -- either in rawdatadir or in msdir. Error messages should reflect that.
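
A rough sketch of that rule, for illustration only: `input_ms_location` is a hypothetical helper, and the naming of split products (dataid, target and label joined by hyphens) is an assumption modelled on the 'gps2-T29R03C05-corr.ms' example earlier in this thread, not taken from the caracal source.

```python
import os


def input_ms_location(dataid, label_in, rawdatadir, msdir, target=""):
    """Return the path where a worker should look for its input MS."""
    if label_in == "":
        # Empty label_in: the worker operates on the raw MS.
        return os.path.join(rawdatadir, f"{dataid}.ms")
    # Assumed naming for products of previous runs: dataid[-target]-label.ms
    parts = [dataid] + ([target] if target else []) + [label_in]
    return os.path.join(msdir, "-".join(parts) + ".ms")
```

Because the directory is chosen in one place, the same value can be used in the existence check and in the error message, which keeps the two consistent.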


Successfully merging this pull request may close these issues.

OBSID.ms must be in msDir
5 participants