Use of string arrays in the schema (especially unicode) #294

sstansill · 2024-10-22T10:13:08Z

For next-generation telescopes (SKA, ngVLA), the zarr-python library likely won't provide a fast enough interface to MSv4 datasets. Instead, libraries written in lower-level languages will be used, and the MSv4 schema should be compatible with these libraries. In particular, the SKAO has used Google's TensorStore to prototype MSv4 support in WSClean.

The problem is that none of the C/C++ Zarr implementations listed at https://zarr.dev/implementations/ support arrays with unicode data types. I therefore propose using fixed-width, null-padded byte strings "<S*" in place of unicode "<U*" data types for arrays (there are 59 instances of unicode dtypes in v4.0.0 of the schema).
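As a rough illustration of the proposed change, here is a minimal NumPy sketch that converts a unicode array to a fixed-width byte array and back. The polarization labels are made-up example values, and the conversion assumes every string is pure ASCII (NumPy raises `UnicodeEncodeError` otherwise).

```python
import numpy as np

# Hypothetical labels; NumPy infers a fixed-width unicode dtype ("U2").
pol_unicode = np.array(["XX", "XY", "YX", "YY"])

# Cast to fixed-width byte strings ("S2"). Assumes ASCII-only content.
pol_bytes = pol_unicode.astype("S2")

# Unicode uses 4 bytes per character (UCS-4); byte strings use 1.
print(pol_unicode.dtype, pol_unicode.itemsize)
print(pol_bytes.dtype, pol_bytes.itemsize)

# Casting back recovers the original strings for Python-side consumers.
restored = pol_bytes.astype("U2")
```

The round trip is lossless here only because the labels are ASCII; names containing non-ASCII characters would need a different encoding decision.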

Additionally, variable- or unknown-length strings ("<U0" and "<S0") should be avoided wherever possible, both to reduce the amount of data stored on disk and to speed up opening a dataset: all coordinates are read eagerly, and variable-length strings are slower to parse. For example, the polarization coordinate should have dtype "<S2". For the coordinates baseline_antenna1_name and baseline_antenna2_name, it may be best to revert to integer arrays. The name corresponding to an antenna index can be any length, which leads to larger metadata and more verbose code, so the long-format antenna names should be reserved for AntennaXds.
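To make the integer-index suggestion concrete, here is a small NumPy sketch. The antenna names, the `S6` width, and the variable names `baseline_antenna1` / `baseline_antenna2` are illustrative assumptions, not part of the schema; the idea is only that names are stored once and baselines refer to them by index.

```python
import numpy as np

# Hypothetical antenna names, stored once (e.g. alongside the antenna data).
antenna_names = np.array([b"SKA001", b"SKA002", b"SKA003"], dtype="S6")

# Baseline coordinates as integer indices into antenna_names.
baseline_antenna1 = np.array([0, 0, 1], dtype=np.int32)
baseline_antenna2 = np.array([1, 2, 2], dtype=np.int32)

# Names are resolved only when a consumer actually needs them.
names1 = antenna_names[baseline_antenna1]
names2 = antenna_names[baseline_antenna2]

# For comparison: the same coordinate stored as fixed-width unicode names.
names1_unicode = names1.astype("U6")
print(baseline_antenna1.nbytes, names1_unicode.nbytes)
```

The index arrays have a fixed size per baseline regardless of how long the antenna names are, which is what keeps the eagerly loaded coordinates small.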

sstansill added MSv4 Review Schema labels Oct 22, 2024
