Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of different visibility formats #595

Closed
6 tasks done
mpluess opened this issue Jul 19, 2024 · 3 comments · Fixed by #620
Closed
6 tasks done

Improve handling of different visibility formats #595

mpluess opened this issue Jul 19, 2024 · 3 comments · Fixed by #620
Assignees
Labels
enhancement New feature or request prio-high High priority

Comments

@mpluess
Copy link
Member

mpluess commented Jul 19, 2024

Currently, when generating visibilities with OSKAR, they are written to disk in both measurement set (MS) and the .vis binary format, essentially doubling the amount of disk space used. This needs to be fixed, especially for larger runs.
When generating visibilities with RASCIL, data is only saved to disk in .vis format.
Internally, Karabo currently works with two different types of visibilities: OSKAR returns karabo.simulation.visibility.Visibility objects whereas RASCIL returns ska_sdp_datamodels.visibility.Visibility objects. It would make sense to create a generalized visibility class and only return objects of this new class.

For the subsequent dirty imaging, we have three imagers.
OSKAR can only handle karabo.simulation.visibility.Visibility objects. Seems like it can work off MS as well as .vis files, preferring .vis, but this needs to be verified.
RASCIL can handle karabo.simulation.visibility.Visibility as well as ska_sdp_datamodels.visibility.Visibility objects and supports MS as well as .vis files.
WSClean can only handle karabo.simulation.visibility.Visibility objects and requires the MS format.

Image cleaning currently requires the MS format (to do dirty imaging + cleaning in one go) or an existing dirty image (only supported by WSClean, not RASCIL, see #572).

There is existing functionality to convert between .vis and MS:

Proposed solution:

  • OSKAR simulation: only write to disk in MS format. Make format configurable if possible, default MS.
  • RASCIL simulation: change from .vis to MS output. Make format configurable if possible, default MS.
  • Create generalized visibility class, return objects of the new class from all simulations
  • Change dirty imager interface to accept objects of the new visibility class. If possible, support passing path to MS, maybe .vis, maybe more.
  • Change image cleaner interface to accept an object of the new visibility class. If possible, support passing path to MS, maybe .vis, maybe more. See also Casa MS file path args should support Visibility class #616
  • Add capability to karabo.data.obscore.ObsCoreMeta.from_visibility to also work from MS, not just .vis
@mpluess mpluess added enhancement New feature or request prio-high High priority labels Jul 19, 2024
@Lukas113
Copy link
Collaborator

Lukas113 commented Jul 19, 2024

Agree to have as default the MS format since it has more support 👍

I think it's worth to not drop the .vis support wherever possible, if the overhead is not large. E.g. karabo.data.obscore.ObsCoreMeta.from_visibility could support both easily.

About the interface choice to only accept Karabo objects instead of e.g. MS fpath, I'm not 100% sure about. The most user-friendly way I can think of is to accept both. And if it's not an overhead or leads to a mess, I suggest to support both.

@sfiruch
Copy link
Member

sfiruch commented Aug 22, 2024

As discussed in 2024-08-22 meeting, the OSKAR default should be changed to MS.

Later, we should have automatic data type conversion when needed, ideally streaming - without touching the disk. Even later, these choices could be optimized by generating & optimizing the full data flow graph (i.e. lazy evaluation). But that's 2030 territory :)

@sfiruch
Copy link
Member

sfiruch commented Sep 19, 2024

We should keep design/APIs flexible enough to add support for other file formats in the future. MSv3 or DASK-MS are formats that are being discussed, so we'll probably need to support one/two of these in the future.

We're fine if .vis support is not as good, or not well-supported at the moment. But we should design our code with the understanding that we'll need support for more formats in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request prio-high High priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants