Refactor astroquery.heasarc to use VO protocols #2997

Open
wants to merge 3 commits into
base: main

Conversation

zoghbi-a
Contributor

@zoghbi-a zoghbi-a commented Apr 25, 2024

This is a major refactor of the heasarc module to use the VO interface to the archive. The main motivations are:

  • Allow for complex region and table queries.
  • Expose the TAP service.
  • Clean up the tests.
  • Since the VO interface is the main archive interface, the archive will be able to support this module better.
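
For illustration only, the kind of region query that a TAP service makes possible can be sketched as an ADQL cone search. The helper name `build_cone_query`, the catalog name `example_catalog`, and the `ra`/`dec` column names are assumptions made for this sketch, not part of the module's API.

```python
def build_cone_query(catalog, ra_deg, dec_deg, radius_deg, columns="*"):
    """Return an ADQL string selecting rows within a circle on the sky.

    Hypothetical helper: assumes the table exposes `ra` and `dec`
    columns in degrees (ICRS).
    """
    return (
        f"SELECT {columns} FROM {catalog} "
        f"WHERE CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )


# Build a query for a 0.05 degree circle around an example position.
query = build_cone_query("example_catalog", 174.7571, -37.7385, 0.05)
print(query)
```

A string like this could then be handed to a TAP client, for example `pyvo.dal.TAPService(url).search(query)`, where `url` would be the archive's TAP endpoint.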

The main changes include:

  • The old class has been renamed HeasarcClass -> HeasarcBrowseClass. The initialized instance is also renamed Heasarc -> HeasarcBrowse. The same for the test files.
  • The old HeasarcClass has been removed, and its main methods are included in the new class with a deprecation message.
  • The new HeasarcClass class uses an interface similar to those used in other modules e.g. ipac.irsa.
  • A deprecation message has been added to the methods used for querying the tables and columns.
  • Added the ability to download data from the main HEASARC servers, SciServer, and the cloud.

@pep8speaks

pep8speaks commented Apr 25, 2024

Hello @zoghbi-a! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 98:13: W503 line break before binary operator
Line 119:13: W503 line break before binary operator
Line 483:17: W503 line break before binary operator
Line 503:13: W503 line break before binary operator

Comment last updated at 2025-01-10 21:26:57 UTC

@zoghbi-a zoghbi-a force-pushed the heasarc-refactor branch 2 times, most recently from 43e9fe4 to 7f74228 Compare April 25, 2024 20:42
@zoghbi-a zoghbi-a marked this pull request as ready for review April 25, 2024 20:52
@bsipocz bsipocz added this to the v0.4.8 milestone May 10, 2024
@bsipocz
Member

bsipocz commented May 10, 2024

The old class has been renamed HeasarcClass -> HeasarcBrowseClass. The initialized instance is also renamed Heasarc -> HeasarcBrowse. The same for the test files.

What is the motivation for this? Are there any datasets that are only accessible using the webform, but not the VO backends?

(If the only motivation is to keep what has been here, then it's not necessary; removing everything as part of the refactor is totally fine. Ideally, old user code should keep working, but with such a large backend restructure we also have precedent for breaking it.)

Doing a proper review may take me until I'm back from the interop.

@zoghbi-a
Contributor Author

All the tables should be available through the new API, so it is kept in case some people are using it. If it is ok to remove the old class, I don't see a blocker.

@bsipocz
Member

bsipocz commented May 10, 2024

so it is kept in case some people are using it

A rename doesn't really solve this scenario, as the continued support would have only worked if the name was kept the same, maybe with a deprecation warning.

So, before diving into a review, I would suggest cleaning up the old class. Maybe try to keep as much of the test examples/docs examples working as possible, or maybe working with a deprecation warning (e.g. in case some of the keywords need to be dropped, or renamed).
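
One way to keep old calls working when a keyword is renamed can be sketched with a small decorator. This is a minimal pure-Python sketch; the decorator name `renamed_kwarg` and the keyword names `mission` and `catalog` are assumptions for illustration, not the names the PR settles on.

```python
import functools
import warnings


def renamed_kwarg(old, new):
    """Accept the keyword `old` as a deprecated alias for `new`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"pass either {old!r} or {new!r}, not both")
                warnings.warn(
                    f"{old!r} is deprecated, use {new!r} instead",
                    DeprecationWarning,
                    stacklevel=2,
                )
                # Forward the value under the new keyword name.
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@renamed_kwarg("mission", "catalog")  # hypothetical keyword names
def query_object(name, *, catalog=None):
    return (name, catalog)
```

astropy ships a decorator for the same purpose, `deprecated_renamed_argument` in `astropy.utils.decorators`, which would likely be the better fit inside astroquery itself.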

@zoghbi-a
Contributor Author

The new class implements most of the useful methods of the old class with a deprecation warning. I will then delete the old class and keep the methods and warnings in the new class.
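
The approach described here, keeping an old method on the new class and having it warn and forward, might look roughly like the sketch below. The class name `NewQueryClass` and the placeholder catalog data are hypothetical; the method names are used only to illustrate the pattern.

```python
import warnings


class NewQueryClass:  # hypothetical stand-in for the refactored class
    def list_catalogs(self):
        return ["catalog_a", "catalog_b"]  # placeholder data

    def query_mission_list(self):
        """Deprecated alias kept so existing user code keeps running."""
        warnings.warn(
            "query_mission_list is deprecated; use list_catalogs instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return self.list_catalogs()
```

Existing scripts that call the old method keep working, and users see where to migrate.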

@zoghbi-a zoghbi-a force-pushed the heasarc-refactor branch 2 times, most recently from 354384f to baf95ed Compare May 15, 2024 14:58
Member

@bsipocz bsipocz left a comment

I'm leaving only a WIP review for now that focuses on the API, and will try to come back and look into the code itself later, hopefully this week or next. I think it's more useful to leave what I have now than to hold it back until I have time for the full review.

astroquery/heasarc/__init__.py (outdated, resolved)
CHANGES.rst (outdated, resolved)
astroquery/heasarc/core.py (outdated, resolved)
astroquery/heasarc/core.py (resolved)
astroquery/heasarc/core.py (outdated, resolved)
astroquery/heasarc/core.py (outdated, resolved)
astroquery/heasarc/core.py (resolved)
astroquery/heasarc/core.py (resolved)
astroquery/heasarc/core.py (resolved)
astroquery/heasarc/tests/setup_package.py (resolved)
@zoghbi-a
Contributor Author

@bsipocz, is there a timeline for completing the review?

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 65.88235% with 87 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@23d3471). Learn more about missing BASE report.
Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| astroquery/heasarc/core.py | 65.47% | 87 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2997   +/-   ##
=======================================
  Coverage        ?   67.60%           
=======================================
  Files           ?      229           
  Lines           ?    18473           
  Branches        ?        0           
=======================================
  Hits            ?    12489           
  Misses          ?     5984           
  Partials        ?        0           

Member

@bsipocz bsipocz left a comment

I did an airport review, so I couldn't run the remote tests or check their coverage.
I also couldn't build the documentation, but I noticed that it is full of both code and text typos, so please have a careful look at it.

The module itself looks reasonably good. My primary comments are the same ones I left in the summer: it would be great if method and argument names were more similar to those already existing in other modules (e.g. avoid using table as a kwarg, as it leads to either confusion or code mistakes).

astroquery/heasarc/core.py (outdated, resolved)
astroquery/heasarc/core.py (resolved)
astroquery/heasarc/core.py (outdated, resolved)
astroquery/heasarc/core.py (outdated, resolved)
Comment on lines +33 to +34
VO_URL = conf.VO_URL
TAR_URL = conf.TAR_URL
S3_BUCKET = conf.S3_BUCKET
Member

Do we need these here, or could the properties pick them up (like we do in alma and the other modules that use pyvo-based TAP)?

Contributor Author

Are you suggesting that these should not be configurable, but rather fixed variables?

Contributor Author

I am inclined to leave it like this, as it helps with testing against staging and test servers.
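
The alternative raised in this thread, a property that reads the configuration when accessed instead of copying it into a class attribute at import time, could be sketched as follows. The `Conf` and `Service` classes and the example URLs are hypothetical stand-ins, not the module's code.

```python
class Conf:
    """Hypothetical stand-in for a module configuration object."""
    VO_URL = "https://example.org/tap"


conf = Conf()


class Service:
    def __init__(self):
        self._url_override = None  # set this to point one instance elsewhere

    @property
    def url(self):
        # Read the configuration on every access, so changes made to
        # `conf` after the instance was created are still honoured.
        return self._url_override or conf.VO_URL


service = Service()
conf.VO_URL = "https://staging.example.org/tap"  # e.g. switch to a test server
print(service.url)
```

Under this pattern the value stays configurable, which keeps the staging and test-server use case working, while avoiding a second copy of the setting on the class.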

docs/heasarc/heasarc.rst (outdated, resolved)
docs/heasarc/heasarc.rst (outdated, resolved)
docs/heasarc/heasarc.rst (outdated, resolved)
docs/heasarc/heasarc.rst (outdated, resolved)
docs/heasarc/heasarc.rst (outdated, resolved)
@zoghbi-a zoghbi-a force-pushed the heasarc-refactor branch 2 times, most recently from 5ecfa95 to 717ba57 Compare January 6, 2025 22:40
@zoghbi-a
Copy link
Contributor Author

zoghbi-a commented Jan 9, 2025

1ce3523: rebasing to pick the latest changes in main and squashing all the changes.

Member

@bsipocz bsipocz left a comment

There are a lot of minor comments mostly focusing on consistency, and fixing a couple of typos in the docs, too. Please try to clean them up.

More importantly, I see 26 remote test and doctest failures; those should be fixed before merging.

        self._meta_info = self._meta_info[self._meta_info['value'] > 0]
        return self._meta_info

    def _get_default_columns(self, catalog_name):
Member

Make get_default_columns and get_default_radius consistent: either both private or both public, depending on whether the end user is expected to use them, but not one of each.


        if url is None:
            url = conf.server
    def set_session(self, session):
Member

Make it private then? If it's not something the end user should do routinely, it's better tucked away a bit.


Return
------
a list of column names
Member

Language nitpicking: the returned output is not a list but a column. Please update the docstring (really, just dropping 'list' is enough, as it can mislead someone into expecting a Python list). Do it here and in the description above, too.

            (self._meta['table'] == catalog_name)
            & (self._meta['par'] == '')
        ]
        radius = np.double(meta['value'][0]) * u.arcmin
Member

Why use np.double here and not e.g. np.float instead?
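
As background for this question: `np.double` is an alias of `np.float64`, while the `np.float` alias was removed in NumPy 1.24, so the practical alternatives today are the builtin `float` or `np.float64`. A small check, in which the value `"3.0"` is a made-up metadata entry:

```python
import numpy as np

# np.double is simply another name for the 64-bit float type.
same_type = np.double is np.float64

raw_value = "3.0"  # hypothetical metadata entry stored as text
radius_np = np.double(raw_value)  # NumPy scalar
radius_py = float(raw_value)      # builtin float

# Both conversions give the same number.
print(same_type, radius_np == radius_py)
```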

        """
        return self.list_catalogs(master=False)

    def list_columns(self, catalog_name, full=False):
Member

Why have both get_default_columns and this list_columns method? Practically, the default behaviour here could be what get_default_columns does at the moment, and another keyword could add the more detailed version with the description and unit columns.
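
One possible reading of this suggestion, with a single method whose default output is the default column names and whose `full=True` output adds descriptions and units for every column, is sketched below. The metadata rows and the exact meaning of `full` are assumptions for illustration, not the behaviour the PR implements.

```python
# Hypothetical metadata rows: (name, description, unit, is_default)
COLUMN_META = [
    ("name", "Object name", "", True),
    ("ra", "Right ascension", "deg", True),
    ("exposure", "Exposure time", "s", False),
]


def list_columns(full=False):
    """List columns; with full=True include all columns and their details."""
    if full:
        return [
            {"name": name, "description": desc, "unit": unit}
            for name, desc, unit, _ in COLUMN_META
        ]
    # Default: only the names of the default columns.
    return [name for name, _, _, is_default in COLUMN_META if is_default]


print(list_columns())
```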

The capabilities are currently very limited ... feature requests and contributions welcome!
There main interface for the Heasarc services``heasarc.Heasac`` now uses
Virtual Observatory protocols with the Xamin interface, which offers
more powerful search options than the old Browse interface.
Member

this still stands, no need to mention the old ways, but at least fix the language.

NGC_3783 60902005 174.7571 -37.7385

To query a region around some position, specifying the search radius,
we use `~~astropy.units.Quantity`:
Member

Suggested change
we use `~~astropy.units.Quantity`:
we use `~astropy.units.Quantity`:


If no radius value is given, a default that is appropriate
for each catalog is used. You can see the value of the default
radius values by calling `~~astroquery.heasarc.HeasarcClass.get_default_radius`,
Member

Suggested change
radius values by calling `~~astroquery.heasarc.HeasarcClass.get_default_radius`,
radius values by calling `~astroquery.heasarc.HeasarcClass.get_default_radius`,

passing the name of the catalog.

The list of returned columns can also be given as a comma-separated string to
`~~astroquery.heasarc.HeasarcClass.query_region`:
Member

Suggested change
`~~astroquery.heasarc.HeasarcClass.query_region`:
`~astroquery.heasarc.HeasarcClass.query_region`:

13008 1RXS J075526.1+391111 55536.6453587963 Liu 1.4842785992883953

If no columns are given, the call will return a set of default columns.
If you want all the columns returned, use ``columns='*'```
Member

Suggested change
If you want all the columns returned, use ``columns='*'```
If you want all the columns returned, use ``columns='*'``
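
The column handling described in these doc lines (default columns when nothing is given, a comma-separated string otherwise, and `'*'` for everything) could be normalized along the lines of the sketch below. The helper name `resolve_columns` and the example default columns are hypothetical.

```python
def resolve_columns(columns, default_columns):
    """Turn the user's `columns` argument into a select list string."""
    if columns is None:
        # No columns requested: fall back to the catalog defaults.
        return ", ".join(default_columns)
    if columns.strip() == "*":
        # Wildcard: return every column.
        return "*"
    # Comma-separated string: strip whitespace and drop empty entries.
    return ", ".join(name.strip() for name in columns.split(",") if name.strip())


print(resolve_columns("name,  ra ,dec", ["name", "ra", "dec", "obsid"]))
```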
