[python] Account for missing end-rows in shape-upgrader #3538
Context
As tracked in #2407 / [sc-51048], TileDB-SOMA 1.15 introduces a new shape feature, documented at https://cloud.tiledb.com/academy/structure/life-sciences/single-cell/tutorials/shapes/
The shape is a construct managed by the core database. It records the 'bounding box' of a sparse array, regardless of how many cells have (or have not) been written to the array.
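To make the distinction concrete, here is a minimal pure-Python sketch. It is not TileDB-SOMA code. The hypothetical helper `used_extent` approximates a legacy-style extent (one past the largest written coordinate on each axis), while the declared shape is a fixed bound that does not depend on which cells were written. The coordinates are made up for illustration.

```python
from typing import Iterable, Tuple


def used_extent(coords: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """One past the largest written coordinate on each axis (hypothetical helper).

    This approximates a 'used shape': it is derived only from cells actually written,
    so rows or columns with no written cells at the end of an axis are not counted.
    """
    max_row = max_col = -1
    for row, col in coords:
        max_row = max(max_row, row)
        max_col = max(max_col, col)
    return (max_row + 1, max_col + 1)


# A declared 5 x 4 sparse array whose last two rows have no written cells.
declared_shape = (5, 4)
written = [(0, 0), (1, 3), (2, 1)]

print("declared shape:", declared_shape)       # (5, 4)
print("used extent:", used_extent(written))    # (3, 4): rows 3 and 4 are invisible
```

The declared shape stays `(5, 4)` no matter what is written, whereas the extent computed from the data shrinks when trailing rows are empty.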
All TileDB-SOMA experiments created at 1.15 and above will have this new shape on all component dataframes/arrays.
For experiments created by TileDB-SOMA versions < 1.15, we offer `tiledbsoma.io.upgrade_shape`, as also documented at https://cloud.tiledb.com/academy/structure/life-sciences/single-cell/tutorials/shapes/
This leverages the old `used_shape`, which is now deprecated and replaced by the new-and-improved shape. [sc-61530]
Bug
A corner-case bug was discovered recently. Suppose an experiment created by TileDB-SOMA < 1.15 has `nobs` of 1,000,000 and `nvar` of 60,000 for measurement `"RNA"`. In `tiledbsoma.io.upgrade_shape` we want to give the `X` arrays in the `"RNA"` measurement the shape `(1_000_000, 60_000)`. Before this PR, we consulted the old `used_shape` for those `X` arrays. In some corner cases, an `X` array has no occupied cells whatsoever in its last one or more rows. In such a case, the `used_shape` will be, say, `(999_998, 60_000)`. If the user then runs an `ExperimentAxisQuery` that includes the last rows of `obs`, namely `soma_joinid` 999,998 or 999,999, the `ExperimentAxisQuery` will error.

Solution
With this PR, when endowing old experiments with new shapes, we now consult `nobs` and `nvar` for `X` (and likewise for `obsm`, `obsp`, `varm`, and `varp`, mutatis mutandis). In the above example, the shape is now correctly set to `(1_000_000, 60_000)`.

Repair
How to check: `fix-check.py`

How to fix: `cat fix.py`
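The contents of `fix-check.py` and `fix.py` are not reproduced here. As a rough illustration only, the following pure-Python sketch shows the kind of comparison a check could make. All names (`ArrayInfo`, `expected_shape`, `check_shapes`, the array paths, and the 50-column `X_pca` embedding) are hypothetical and are not the TileDB-SOMA API. For each array it derives an expected shape from `nobs` and `nvar`, following the rule described under Solution, and reports arrays whose legacy `used_shape` differs. One assumption is labeled in the code: for `obsm`/`varm`, the second (embedding-width) dimension is taken from `used_shape`, since `nobs`/`nvar` say nothing about it.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Shape = Tuple[int, int]


@dataclass
class ArrayInfo:
    """Hypothetical description of one 2-D array in a pre-1.15 experiment."""
    kind: str          # one of "X", "obsm", "obsp", "varm", "varp"
    used_shape: Shape  # legacy extent derived from written cells


def expected_shape(info: ArrayInfo, nobs: int, nvar: int) -> Shape:
    """Shape derived from nobs/nvar rather than from the legacy used_shape.

    Assumption: for obsm/varm the second (embedding-width) dimension is kept
    from used_shape, since nobs/nvar do not determine it.
    """
    _, used_cols = info.used_shape
    if info.kind == "X":
        return (nobs, nvar)
    if info.kind == "obsp":
        return (nobs, nobs)
    if info.kind == "varp":
        return (nvar, nvar)
    if info.kind == "obsm":
        return (nobs, used_cols)
    if info.kind == "varm":
        return (nvar, used_cols)
    raise ValueError(f"unknown array kind: {info.kind!r}")


def check_shapes(
    arrays: Dict[str, ArrayInfo], nobs: int, nvar: int
) -> Dict[str, Tuple[Shape, Shape]]:
    """Return {name: (legacy used_shape, expected shape)} for every mismatch."""
    mismatches = {}
    for name, info in arrays.items():
        want = expected_shape(info, nobs, nvar)
        if info.used_shape != want:
            mismatches[name] = (info.used_shape, want)
    return mismatches


# The corner case from the Bug section: the last two obs rows of X are empty.
arrays = {
    "RNA/X/data": ArrayInfo("X", (999_998, 60_000)),
    "RNA/obsm/X_pca": ArrayInfo("obsm", (1_000_000, 50)),
}
print(check_shapes(arrays, nobs=1_000_000, nvar=60_000))
# {'RNA/X/data': ((999998, 60000), (1000000, 60000))}
```

Only `X` is flagged: its `used_shape` is two rows short of `nobs`, while the `obsm` array already spans all of `obs`.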