TaQL: integer aggregate function columns have invalid datatype #265

Open
bennahugo opened this issue Apr 11, 2024 · 5 comments

@bennahugo

Hi, I'm putting together a demo of TaQL and am running into a problem with simple GROUPBY / aggregate functions. I'm importing the table system from casatools for CASA version 6.5.6.22.

Here is a simple example:

tt = tb.taql("select ANTENNA1,ANTENNA2,gcount() as samplecount, sqrt(sumsqr(UVW[:2])) as bllength from tart.ms WHERE ANTENNA1!=ANTENNA2 GROUPBY ANTENNA1,ANTENNA2")

This executes, but it fails when I try to do tt.getcol("ANTENNA1") or tt.getcol('samplecount'), with:

2024-04-11 19:25:37	SEVERE	getcol::samplecount	Exception Reported: Unknown casa DataType!

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[88], line 1
----> 1 tt.getcol('samplecount')

File /opt/venvcasa/lib/python3.8/site-packages/casatools/table.py:838, in table.getcol(self, columnname, startrow, nrow, rowincr)
    827 def getcol(self, columnname, startrow=int(0), nrow=int(-1), rowincr=int(1)):
    828     """The entire column (or part of it) is returned. Warning: it might be big!
    829     The functions can only be used if all arrays in the column have the
    830     same shape. That is guaranteed for columns containing scalars or fixed
   (...)
    836     shaped
    837     """
--> 838     return self._swigobj.getcol(columnname, startrow, nrow, rowincr)

File /opt/venvcasa/lib/python3.8/site-packages/casatools/__casac__/table.py:2154, in table.getcol(self, *args, **kwargs)
   2115 def getcol(self, *args, **kwargs):
   2116     """
   2117     getcol(self, _columnname, _startrow, _nrow, _rowincr) -> variant *
   2118 
   (...)
   2152 
   2153     """
-> 2154     return _table.table_getcol(self, *args, **kwargs)

RuntimeError: Unknown casa DataType!

and

2024-04-11 19:34:41	SEVERE	getcol::ANTENNA1	Exception Reported: Unknown casa DataType!

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[90], line 1
----> 1 tt.getcol("ANTENNA1")

File /opt/venvcasa/lib/python3.8/site-packages/casatools/table.py:838, in table.getcol(self, columnname, startrow, nrow, rowincr)
    827 def getcol(self, columnname, startrow=int(0), nrow=int(-1), rowincr=int(1)):
    828     """The entire column (or part of it) is returned. Warning: it might be big!
    829     The functions can only be used if all arrays in the column have the
    830     same shape. That is guaranteed for columns containing scalars or fixed
   (...)
    836     shaped
    837     """
--> 838     return self._swigobj.getcol(columnname, startrow, nrow, rowincr)

File /opt/venvcasa/lib/python3.8/site-packages/casatools/__casac__/table.py:2154, in table.getcol(self, *args, **kwargs)
   2115 def getcol(self, *args, **kwargs):
   2116     """
   2117     getcol(self, _columnname, _startrow, _nrow, _rowincr) -> variant *
   2118 
   (...)
   2152 
   2153     """
-> 2154     return _table.table_getcol(self, *args, **kwargs)

RuntimeError: Unknown casa DataType!

It succeeds for the floating-point aggregate, or whenever I get a floating-point valued column:

tt.getcol('bllength')
array([0.19455482, 0.51647411, 0.75153464, 0.3595906 , 0.86945279,
       1.03569452, 1.20868631, 1.46945354, 1.61672806, 1.95074426,
       2.13140339, 1.0461806 , 1.18919058, 1.70865012, 2.0106987 ,
       2.17048716, 2.48252164, 1.02388213, 1.36084806, 1.68034738,
       1.79704041, 1.9604013 , 2.17944748, 0.67646162, 0.89569323,
       0.53671945, 1.00977189, 0.8411397 , 1.01413148, 1.27489871,
       1.42217324, 1.75618943, 1.93684857, 1.17858796, 1.31766977,...

Funnily enough, though, when no aggregate functions are requested in the select, the integer values are returned correctly, e.g.:

tt = tb.taql("select ANTENNA1,ANTENNA2 from tart.ms WHERE ANTENNA1!=ANTENNA2 GROUPBY ANTENNA1,ANTENNA2")
tt.getcol("ANTENNA1") # or ANTENNA2

returns an integer-valued array:

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,
        2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,
        3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,

Full environment:

casadata==2023.6.26
casafeather==0.0.18
casalogger==1.0.16
casampi==0.5.3
casaplotms==2.1.2
casaplotserver==1.6.1
casashell==6.5.6.22
casatablebrowser==0.0.32
casatasks==6.5.6.22
casatestutils==6.5.6.22
casatools==6.5.6.22
casaviewer==1.8.2
python-casacore==3.5.2

With the python-casacore table import, the same TaQL statements work correctly. I'm not sure what inside the CASA environment makes this fail; perhaps it is something to raise with the CASA team members?

from pyrap.tables import table as tbl
from pyrap.tables import taql
tt = taql("select ANTENNA1,ANTENNA2,gcount() as samplecount, sqrt(sumsqr(UVW[:2])) as bllength from tart.ms WHERE ANTENNA1!=ANTENNA2 GROUPBY ANTENNA1,ANTENNA2")
tt.getcol("ANTENNA1")
tt.getcol("samplecount")

returns

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,
        2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,...

and

array([6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,

respectively, as expected.

@renaudmiel

renaudmiel commented Apr 12, 2024

I have also faced similar annoyances with CASA TaQL in the past.
I reproduced the issue with monolithic CASA 6.6.3.
The (ugly) workaround is to explicitly declare ANTENNA1 as INTEGER in the select:

CASA <7>: qry = f'select ANTENNA1,ANTENNA2,gcount() as samplecount, sqrt(sumsqr(UVW[:2])) as bllength from {my_ms} WHERE ANTENNA1!=ANTENNA2 GROUPBY ANTENNA1,ANTENNA2'
CASA <11>: tb.open(my_ms)
Out[11]: True

CASA <12>: qry_res = tb.taql(qry)

CASA <15>: qry_res.getcol('ANTENNA1')
2024-04-12 01:49:31	SEVERE	getcol::ANTENNA1	Exception Reported: Unknown casa DataType!
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-36af47140870> in <cell line: 1>()
----> 1 qry_res.getcol('ANTENNA1')

/mnt/software/alma/casa_dist/casa/release/casa-6.6.3-22-py3.8.el8/lib/py/lib/python3.8/site-packages/casatools/table.py in getcol(self, columnname, startrow, nrow, rowincr)
    836         shaped
    837         """
--> 838         return self._swigobj.getcol(columnname, startrow, nrow, rowincr)
    839 
    840     def getvarcol(self, columnname, startrow=int(0), nrow=int(-1), rowincr=int(1)):

/mnt/software/alma/casa_dist/casa/release/casa-6.6.3-22-py3.8.el8/lib/py/lib/python3.8/site-packages/casatools/__casac__/table.py in getcol(self, *args, **kwargs)
   2152 
   2153         """
-> 2154         return _table.table_getcol(self, *args, **kwargs)
   2155 
   2156 

RuntimeError: Unknown casa DataType!

CASA <20>: qry = f'select ANTENNA1 as ANT1 INTEGER, ANTENNA2,gcount() as samplecount, sqrt(sumsqr(UVW[:2])) as bllength from {my_ms} WHERE ANTENNA1!=ANTENNA2 GROUPBY ANTENNA1,ANTENNA2'

CASA <21>: qry_res = tb.taql(qry)

CASA <22>: qry_res.colnames
Out[22]: ['ANT1', 'ANTENNA2', 'samplecount', 'bllength']

CASA <23>: qry_res.getcol('ANT1')
Out[23]: 
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,
        2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,
        2,  2,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,
        3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  4,  4,  4,  4,  4,
        4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,
        4,  4,  4,  4,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
        5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  6,  6,  6,  6,  6,
        6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,
        6,  6,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,
        7,  7,  7,  7,  7,  7,  7,  7,  8,  8,  8,  8,  8,  8,  8,  8,  8,
        8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  9,  9,  9,  9,
        9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,
       10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
       10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
       11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
       12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
       13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14,
       14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
       15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
       17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18,
       18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19,
       19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21,
       21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23,
       23, 23, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 26, 26, 26, 26,
       27, 27, 27, 28, 28, 29])
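
The same trick could presumably be applied to every integer-valued result column rather than to ANTENNA1 alone. The sketch below only builds the query string; `typed_select` and the aliases `ANT1`/`ANT2` are hypothetical, and it assumes that TaQL accepts a datatype keyword after the alias of an aggregate such as gcount() in the same way it does for ANTENNA1 in the session above. Whether this also makes getcol('samplecount') work under casatools was not tested in this thread.

```python
def typed_select(table_name, columns, where=None, groupby=None):
    """Build a TaQL SELECT with an optional explicit datatype per result column.

    columns is a list of (expression, alias, dtype) tuples; dtype may be None
    to let TaQL pick the datatype itself.
    """
    parts = []
    for expr, alias, dtype in columns:
        item = f"{expr} as {alias}"
        if dtype is not None:
            item += f" {dtype}"
        parts.append(item)
    query = f"select {', '.join(parts)} from {table_name}"
    if where:
        query += f" WHERE {where}"
    if groupby:
        query += f" GROUPBY {', '.join(groupby)}"
    return query


qry = typed_select(
    "tart.ms",
    [
        ("ANTENNA1", "ANT1", "INTEGER"),
        ("ANTENNA2", "ANT2", "INTEGER"),
        ("gcount()", "samplecount", "INTEGER"),  # assumption: untested in this thread
        ("sqrt(sumsqr(UVW[:2]))", "bllength", None),
    ],
    where="ANTENNA1!=ANTENNA2",
    groupby=["ANTENNA1", "ANTENNA2"],
)
print(qry)
# With casatools one would then run, as in the session above:
#   tb.open("tart.ms"); qry_res = tb.taql(qry); qry_res.getcol("ANT1")
```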

@bennahugo
Author

Thanks a lot for pointing out a workaround, @renaudmiel. I'm going to leave this issue open for now for documentation purposes, because I do think it is a bug in the CASA wrappers around the casacore table system: the TaQL behaviour should be consistent between the two Python wrappers.

@tammojan
Contributor

Too bad that there's a discrepancy. I'm afraid this is indeed for the CASA team, though in the end it might boil down to something in the TaQL system itself.

@bennahugo In case you're looking for inspiration for the TaQL demo, I have an old notebook at https://github.com/tammojan/taql-jupyter/blob/master/LearnTaQL.ipynb . One suggestion: if you're using python-casacore, better to import it as casacore rather than pyrap.
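
As a small illustration of that import suggestion, the sketch below prefers `casacore.tables` and only falls back to the legacy `pyrap.tables` name if the former cannot be imported. `first_importable` is a hypothetical helper, not part of either package.

```python
import importlib


def first_importable(candidates=("casacore.tables", "pyrap.tables")):
    """Return the first module in candidates that can be imported, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


tables = first_importable()
if tables is not None:
    # Both module names are expected to expose the same taql() function.
    taql = tables.taql
```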

@bennahugo
Author

Thanks @tammojan

@gervandiepen
Contributor

gervandiepen commented Apr 12, 2024 via email
