Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to call the number of scans from the MS to the outputvis? #18

Open
Victoria-Samboco opened this issue Nov 21, 2022 · 21 comments
Open

Comments

@Victoria-Samboco
Copy link
Contributor

I'm trying to split the MS using the MS and not a list of scans as I did before, but I'm having an error with the outputvis.

solarkat1:
      info: " A loop to split/extract all scans from the Measurement Set and determine the sun coordinates (RA/DEC)"
      params: 
        ms: '{recipe.ms}'
      recipe:
        inputs:
          ms:
            dtype: MS
            info: "splitms by scans"
        for_loop:
          var: myscan
          over: ms
        steps:
          split_my_ms:
            cab: splitms
            params:
              ms: '1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms' 
              scan: '{recipe.myscan}'
              datacolumn: 'all'
              outputvis: "scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_{recipe.myscan}.ms" 

I want it to return as outputvis something like outputvis = scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_3.ms , ... , but it is returning

split(vis="1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms",outputvis="scansuhf/perscan_ms/15836
2427_sdp_l0.1024ch-J033230_280757-corr_self-scan_1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
.ms",keepmms=True,field="",spw="",
# 2022-11-21 18:37:55      INFO    split::::+
scan="1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms",antenna="",correlation="",timerange="",in
ent="",
# 2022-11-21 18:37:55      INFO    split::::+
array="",uvrange="",observation="",feed="",datacolumn="all",  
# 2022-11-21 18:37:55      INFO    split::::+
keepflags=True,width=1,timebin="0s",combine="")
# 2022-11-21 18:37:55      INFO    MSTransformManager::parseMsSpecParams   Input file name is
1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
# 2022-11-21 18:37:55      INFO    MSTransformManager::parseMsSpecParams   Data column is ALL
# 2022-11-21 18:37:55      INFO    MSTransformManager::parseMsSpecParams   Output file name is
scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_1583662427_sdp_l0.1024ch-J
33230_280757-corr_self.ms.ms
# 2022-11-21 18:37:55      INFO    MSTransformManager::parseDataSelParams  scan selection is
1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
# 2022-11-21 18:37:55      INFO    MSTransformManager::colCheckInfo        Adding DATA column to
output MS from input DATA column
# 2022-11-21 18:37:55      INFO    MSTransformManager::colCheckInfo        Adding CORRECTED_DATA
column to output MS from input CORRECTED_DATA column
# 2022-11-21 18:37:55      INFO    MSTransformManager::colCheckInfo        Adding MODEL_DATA
column to output MS from input MODEL_DATA column
# 2022-11-21 18:37:55      SEVERE  split::::       Scan Expression: Parse error at or near '_'
# 2022-11-21 18:37:55      SEVERE  split::::+      (near char. 11 in string
"1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms")
# 2022-11-21 18:37:55      INFO    split::::       ##### End Task: split                #####
# 2022-11-21 18:37:55      INFO    split::::+      ##########################################
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms INFO: /usr/bin/casa exited with code 0 after
0:00:04
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR: error validating outputs:
error validating outputs:
└── 'boom.solarkat1.split_my_ms.outputvis':
    scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_1583662427_sdp_l0.1024
    ch-J033230_280757-corr_self.ms.ms doesn't exist
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR: ### failed outputs
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR: cab split_my_ms:
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR:   ms =                             
1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR:   datacolumn = all
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR:   scan =                           
1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR:   outputvis =                      
scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_1583662427_sdp_l0.1024ch-J
33230_280757-corr_self.ms.ms 

How can I call the number of scans from the MS without creating a list of it in the recipe?

@Kincaidr
Copy link
Collaborator

Kincaidr commented Nov 21, 2022

Problem is here: scan: '{recipe.myscan}' The scan in the casa split task should be str. Here you are calling scan ='{recipe.myscan}' which is a for-loop over ms:

for_loop:
    var: myscan
    over: ms

You should set it to ' ' for all scans:

   scan -- Scan number range  
        default: ’’ = all

Glad you brought this up because splitting scan by all is not working. for me as expected. If I set to ' ' I am only given 1 MS with all the scan numbers:

scans = list(numpy.unique(t.getcol('SCAN_NUMBER')))=[3, 5, 7, 9, 14, 16]

I recall this working previously when I tried it a while back in the pipeline, so not sure whats gone wrong.

@Victoria-Samboco
Copy link
Contributor Author

Victoria-Samboco commented Nov 22, 2022 via email

@Victoria-Samboco
Copy link
Contributor Author

This one.

def split_ms(ms): #Simple split
    tb.open(ms)
    scans=sorted(set(tb.getcol('SCAN_NUMBER')))
    tb.done()
    for scan in scans:
        scan = str(scan)
        SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
        #split(vis=ms,outputvis='/home/samboco/solarkat/1GC_UHF/msdir/scans/'+SPLITED_ms,scan=scan, datacolumn='all')
        split(vis=ms,outputvis='scansuhf/perscan_ms/'+SPLITED_ms,scan=scan, datacolumn='all')
        print(SPLITED_ms + 'DONE')

@Kincaidr
Copy link
Collaborator

yes we can do it this way, but I still want to know why casa cannot do it by setting scan to all

@IanHeywood
Copy link
Collaborator

scan = 'all' doesn't work with CASA's split task, you just need to set scan = '' to split all scans.

@Kincaidr
Copy link
Collaborator

scan = 'all' doesn't work with CASA's split task, you just need to set scan = '' to split all scans.

Yes I have done this, but the output I get is only 1 MS.

@IanHeywood
Copy link
Collaborator

Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.

scans = [33, 6, 10, 14, 18, 29]
for scan in scans:
    scan = str(scan)
    opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms")
    split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")

@Victoria-Samboco
Copy link
Contributor Author

Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.

scans = [33, 6, 10, 14, 18, 29]
for scan in scans:
    scan = str(scan)
    opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms")
    split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")

Yes, that's what I did here:

splitms_by_scan: #Its working well
      info: " A loop to split/extract all scans from the Measurement Set and determine the sun coordinates (RA/DEC)"
      recipe:
        inputs:
          myscans:
            dtype: List[str]
            info: "splitms by scans"
            default: ["3","5","7","9","11","13","15","17","19","21","25","27","29","31","33","35","37","39","41","43","47","49"]
        for_loop:
          var: myscan
          over: myscans
        steps:
          split_my_ms: 
            cab: splitms
            params:
              ms: '1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms' 
              scan: '{recipe.myscan}'
              datacolumn: 'all'
              outputvis: "scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan{recipe.myscan}.ms" 

Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...

@Kincaidr
Copy link
Collaborator

Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.

scans = [33, 6, 10, 14, 18, 29]
for scan in scans:
    scan = str(scan)
    opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms")
    split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")

Yes this is what we did this before. I thought that if we do scan = ' ' then casa would do this.

@Kincaidr
Copy link
Collaborator

Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...

No it doesn't have to be manually. We can just do scans=sorted(set(tb.getcol('SCAN_NUMBER'))) like you showed before. Then in the casa split cab we set scans = ' '.

@Victoria-Samboco
Copy link
Contributor Author

But there's an issue with tb

Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...

No it doesn't have to be manually. We can just do scans=sorted(set(tb.getcol('SCAN_NUMBER'))) like you showed before. Then in the casa split cab we set scans = ' '.

But there's an issue with tb in python, because it is a casa command. Do you know which module we can import to use tb from python?

@IanHeywood
Copy link
Collaborator

You can use

from casacore.tables import table
tt = table(ms)
scans = sorted(set(tt.getcol('SCAN_NUMBER')))
tt.done()

@Kincaidr
Copy link
Collaborator

You can use

from casacore.tables import table
tt = table(ms)
scans = sorted(set(tt.getcol('SCAN_NUMBER')))
tt.done()

But when using the split command, it wont be recognized in python.

Oleg has added the flavour functionality. So something like this might work:

command: |
      tb.open(ms)
      scans=sorted(set(tb.getcol('SCAN_NUMBER')))
      tb.done()
      for scan in scans:
          scan = str(scan)
          SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
          split(vis=ms,outputvis='scansuhf/perscan_ms/'+SPLITED_ms,scan=scan, datacolumn='all')
          print(SPLITED_ms + 'DONE')
flavour: casa-task
inputs:
  vis:
   dtype: MS
  scan:
    dtype: str
 outputs:
  outputvis:
    dtype: MS
 params:

@Victoria-Samboco
Copy link
Contributor Author

yes, it says that split is not defined

2022-11-22 13:00:02 STIMELA.boom.split_by_scan ERROR: NameError: name 'split' is not defined
2022-11-22 13:00:02 STIMELA.boom.split_by_scan ERROR: find_sun_stimela.split_ms1() threw        
exception: name 'split' is not defined'
2022-11-22 13:00:02 STIMELA.boom INFO: saving full profiling stats to
./obsuhf/logs/log-/stimela.stats

@IanHeywood
Copy link
Collaborator

I'm a bit lost here with the intracies of stimela. If you are in a situation where you can call the CASA split task then why can't you use the tb tool to loop over the scan numbers?

@IanHeywood
Copy link
Collaborator

For my own pipeline processing the first step is to run a script that pulls all the relevant metadata out of the MS and writes it to files. Subsequent processing steps can then just consult the files when they need to know things like relevant scan numbers. Maybe something like that would be easier than trying to repeatedly consult the MS as the processing progresses...

@Kincaidr
Copy link
Collaborator

Ok I see the problem. If we using flavour: casa-task for splitwe have to say command: split as such:

 command: split
      flavour: casa-task
      inputs:
        vis: 
          dtype: MS
          required: true
          default: '{recipe.ms}'
        scan:
          dtype: str
        datacolumn:
          dtype: str 
          default: 'corrected_data'
        outputvis:
          dtype: str
          default: '{recipe.ms}-output'
          required: true 

So before we call this cab we have to do the table stuff in python and then call the cab. So have 2 steps, one for each:

steps:
  scan_numbers: %step1
    cab:
      command: |
          tb.open(ms)
          scans=sorted(set(tb.getcol('SCAN_NUMBER')))
          tb.done()
          for scan in scans:
              scan = str(scan)
              SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
      flavour: python-code
      inputs:
        ms
      outputs:
        scan
      splitted_ms_name:
        SPLITED_ms:
   splitms_scan: %step2
      command: split
      flavour: casa-task
      inputs:
        vis: 
          dtype: MS
          required: true
          default: '{recipe.ms}'
        scan:
          dtype: str
        datacolumn:
          dtype: str 
          default: 'corrected_data'
        outputvis:
          dtype: str
          default: '{recipe.ms}-output'
          required: true 
       params:
         vis: '{recipe.ms}'
         scan: =steps.scan_numbers.scan

@Kincaidr
Copy link
Collaborator

I'm a bit lost here with the intracies of stimela. If you are in a situation where you can call the CASA split task then why can't you use the tb tool to loop over the scan numbers?

Because as for as I know the casa commands and python ones are separate. Currently, in stimela, we can do purely only casa or python but not both.

@IanHeywood
Copy link
Collaborator

Then shouldn't tb work if it's in CASA mode?

@landmanbester
Copy link
Collaborator

Just FYI. It may be easier to do this with dask-ms convert (latest release) since it now supports a taql-where argument eg.

$ dask-ms convert path_to_data.ms -g "FIELD_ID,DATA_DESC_ID,SCAN_NUMBER" -o output.ms --format casa --force --taql-where "SCAN_NUMBER==1"

will split out scan 1 into an MS called output.ms

@Kincaidr
Copy link
Collaborator

Then shouldn't tb work if it's in CASA mode?

Im trying this now, but getting the same as @Victoria-Samboco . Its not reading tb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants