first attempt to support awkward arrays #647

giovp · 2021-11-14T11:14:21Z

First draft at supporting awkward arrays.
As discussed with @ivirshup, this would be useful for Squidpy (@hspitzer, also discussed in #609) and potentially for @Zethson's EHR project.

Here's a walkthrough showcasing some basic functionality we could use for sub observation annotations (e.g. spatial coordinates of rna-molecules/segmentations).

Details

import numpy as np
import scanpy as sc
import squidpy as sq
import matplotlib.pyplot as plt
from numpy.random import default_rng
from sklearn.datasets import make_blobs
import awkward as ak

adata = sq.datasets.visium_hne_adata()
adata = adata[:10, :].copy()
adata.obsm["spatial"] = (
    adata.obsm["spatial"] - np.std(adata.obsm["spatial"], 0)
) / np.mean(adata.obsm["spatial"], 0)

# generate data
obs_list = []
rng = default_rng(42)
for idx in adata.obs_names.values:
    coord, _ = make_blobs(
        n_samples=rng.integers(5, 15),
        cluster_std=0.02,
        centers=adata[idx].obsm["spatial"],
        random_state=42,
    )
    obs_list.append(coord)

sub_obs = ak.Array(obs_list)

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].set_aspect("equal")
ax[0].scatter(x=adata.obsm["spatial"][:, 0], y=adata.obsm["spatial"][:, 1])
for i in range(adata.shape[0]):
    ax[1].scatter(
        x=sub_obs[i, :, 0],
        y=sub_obs[i, :, 1],
    )
    ax[1].axis("equal")

adata.obsm["sub_obs"] = sub_obs  # sub_obs is an awkward array
adata_subset = adata[:5]  # let's subset, it also works for `.copy()`

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].set_aspect("equal")
ax[0].scatter(
    x=adata_subset.obsm["spatial"][:, 0], y=adata_subset.obsm["spatial"][:, 1]
)
for i in range(adata_subset.shape[0]):
    ax[1].scatter(
        x=adata_subset.obsm["sub_obs"][i, :, 0],
        y=adata_subset.obsm["sub_obs"][i, :, 1],
    )
    ax[1].set_aspect("equal")

What fails at the moment is adata.concatenate(), because of errors when reindexing the alternate axis:
https://github.com/theislab/anndata/blob/286bc7f207863964cb861f17c96ab24fe0cf72ac/anndata/_core/merge.py#L478

Awkward arrays have no shape attribute, so when array.shape[0] is needed we can fall back to len(awkward_array), but for array.shape[1] we should (probably) simply skip the check and concatenate. In awkward it would look like this:

ak.concatenate([sub_obs, sub_obs])
>>> <Array [[[1.25, 0.833], ... [0.697, 1.22]]] type='20 * var * var * float64'>

TODO

- wait for "Specs for all elements" #554 to get merged and work on IO
- switch to the v2 API of awkward; v1 reaches EOL in four months and bugs are already no longer being fixed
- ~~address newly added TODOs in the code~~
- Implement outer joins during concatenation: either implement inner/outer join logic or raise a corresponding warning
  - Raise a warning like: "Outer joins on awkward.Arrays will have different return values in the future. Please see github.com/scverse/anndata/issues#XXX for details, and offer input."
- AwkwardArrayView with behavior

Tests

Docs

- ~~add examples~~
- Add tutorial for awkward arrays anndata-tutorials#15
- show which types of awkward arrays are supported (need to be "regular" in the aligned dimensions)
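The outer-join warning from the TODO list could be sketched roughly as follows. This is an illustration, not the code in this PR: `warn_outer_join` is a hypothetical helper, the `FutureWarning` category is an assumption (the thread does not name one), and the issue number is left as the `XXX` placeholder used above.

```python
import warnings

# Message text copied from the TODO item; the issue number is still "XXX" there.
OUTER_JOIN_MESSAGE = (
    "Outer joins on awkward.Arrays will have different return values in the "
    "future. Please see github.com/scverse/anndata/issues#XXX for details, "
    "and offer input."
)


def warn_outer_join(join: str) -> bool:
    """Hypothetical helper: warn when an outer join involves awkward arrays.

    Returns True if a warning was emitted, False otherwise.
    """
    if join == "outer":
        # FutureWarning is an assumption made for this sketch.
        warnings.warn(OUTER_JOIN_MESSAGE, FutureWarning, stacklevel=2)
        return True
    return False
```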

codecov · 2021-11-14T11:16:47Z

Codecov Report

Merging #647 (e524389) into master (283b0c1) will increase coverage by 0.08%.
The diff coverage is 88.54%.

@@            Coverage Diff             @@
##           master     #647      +/-   ##
==========================================
+ Coverage   83.12%   83.21%   +0.08%     
==========================================
  Files          34       34              
  Lines        4416     4503      +87     
==========================================
+ Hits         3671     3747      +76     
- Misses        745      756      +11

Impacted Files	Coverage Δ
anndata/compat/__init__.py	`80.97% <37.50%> (-2.08%)`	⬇️
anndata/_core/views.py	`87.90% <75.00%> (-0.99%)`	⬇️
anndata/utils.py	`83.33% <80.00%> (-0.60%)`	⬇️
anndata/_core/aligned_mapping.py	`94.02% <100.00%> (+0.09%)`	⬆️
anndata/_core/anndata.py	`83.41% <100.00%> (ø)`
anndata/_core/index.py	`92.85% <100.00%> (+0.35%)`	⬆️
anndata/_core/merge.py	`93.76% <100.00%> (+0.31%)`	⬆️
anndata/_io/specs/methods.py	`84.53% <100.00%> (+0.75%)`	⬆️
anndata/tests/helpers.py	`95.63% <100.00%> (+0.19%)`	⬆️

Zethson · 2021-11-14T12:14:37Z

Really cool @giovp and everyone involved. I can already see a couple of use cases for ehrapy and am eager to see this move forward. Do you expect support for awkward arrays to also require modifications to all/many of Scanpy's algorithms, or is this mostly a drop-in replacement?

CC @Imipenem

giovp · 2021-11-14T13:42:12Z

Do you expect support for awkward arrays to also require modifications to all/many of Scanpy's algorithms, or is this mostly a drop-in replacement?

I doubt it; I also can't think of any scanpy function that could use such a representation out of the box. We'd have multiple use cases in squidpy though! Essentially, being able to slice/subset/concatenate and copy anndata while preserving the 0-axis of an awkward array would cover most of the cases I can think of right now.

ivirshup · 2021-11-15T11:22:05Z

@giovp, for concat, AFAICT you can't have a multi-dimensional awkward array, so there won't be an "alternative" axis. Also, I don't think there is a logical way to concatenate an awkward array with any other kind of array, so it should probably just error. I think this could be handled similar to the all dataframe case.

Could you modify gen_adata in anndata.tests.helpers to add awkward arrays into obsm and varm? I think that'll help catch a lot of bugs.

giovp · 2021-11-15T12:32:45Z

@giovp, for concat, AFAICT you can't have a multi-dimensional awkward array, so there won't be an "alternative" axis. Also, I don't think there is a logical way to concatenate an awkward array with any other kind of array, so it should probably just error. I think this could be handled similar to the all dataframe case.

Yeah, you can't have a multi-dimensional awkward array, but I think it would still be good to concatenate them along axis=0, so this should be supported and hence bypass the current alternate-axes check?

Could you modify gen_adata in anndata.tests.helpers to add awkward arrays into obsm and varm? I think that'll help catch a lot of bugs.

I will do that and run the tests locally, sorry about that.


ivirshup · 2021-11-15T13:21:21Z

Yeah, you can't have a multi-dimensional awkward array, but I think it would still be good to concatenate them along axis=0, so this should be supported and hence bypass the current alternate-axes check?

Ahh yeah, this is what I meant. Basically have a case for all elements being awkward arrays.

giovp · 2021-11-15T20:51:33Z

Ok, at this stage concat works both for inner and outer on obsm, and subsetting works for varm.

Example

import numpy as np
import scanpy as sc
import squidpy as sq
import matplotlib.pyplot as plt
from numpy.random import default_rng
from sklearn.datasets import make_blobs
import awkward as ak
import pandas as pd
from cycler import cycler

sc.set_figure_params()

adata = sq.datasets.visium_hne_adata()
varm = adata.obsm["spatial"][15:30, :]
adata = adata[:10, :15].copy()
adata.obsm["spatial"] = (
    adata.obsm["spatial"] - np.std(adata.obsm["spatial"], 0)
) / np.mean(adata.obsm["spatial"], 0)

adata.varm["spatial"] = varm.copy()
adata.varm["spatial"] = (
    adata.varm["spatial"] - np.std(adata.varm["spatial"], 0)
) / np.mean(adata.varm["spatial"], 0)

obs_list = []
var_list = []
rng = default_rng(42)
for idx in adata.obs_names.values:
    coord, _ = make_blobs(
        n_samples=rng.integers(5, 15),
        cluster_std=0.02,
        centers=adata[idx].obsm["spatial"],
        random_state=42,
    )
    obs_list.append(coord)

for idx in adata.var_names.values:
    coord, _ = make_blobs(
        n_samples=rng.integers(5, 15),
        cluster_std=0.02,
        centers=adata[:, idx].varm["spatial"],
        random_state=42,
    )
    var_list.append(coord)

sub_obs = ak.Array(obs_list)
sub_var = ak.Array(var_list)

def plot_points(adata, main: np.ndarray, sub: ak.Array, axis: int, cmap_name):

    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    ax[1].set_prop_cycle(
        cycler("color", plt.get_cmap(cmap_name)(np.linspace(0, 1, len(main))))
    )
    ax[0].axis("equal")
    ax[0].scatter(
        x=main[:, 0], y=main[:, 1], c="grey", edgecolors="black", s=100, linewidths=1
    )
    for i in range(adata.shape[axis]):
        ax[1].scatter(
            x=sub[i, :, 0],
            y=sub[i, :, 1],
            edgecolors="black",
            alpha=0.7,
        )
        ax[1].axis("equal")

    return

plot_points(adata, adata.obsm["spatial"], sub_obs, 0, "winter")
plot_points(adata, adata.varm["spatial"], sub_var, 1, "cool")

adata.obsm

adata.varm

Slicing and copying also works

adata.obsm["sub_obs"] = sub_obs  # sub_obs is an awkward array
adata.varm["sub_var"] = sub_var
adata_subset = adata[:5, :7]  # let's subset

If I run the tests locally with pytest anndata/tests, no tests fail. I'm not sure where I should start looking?

ivirshup · 2021-11-15T21:02:20Z

I think you just need to add awkward array to the testing dependencies

ivirshup · 2021-11-15T21:05:17Z

Also, for gen_adata to return an object with awkward arrays in it, you've got to generate them and place them here:

https://github.com/theislab/anndata/blob/286bc7f207863964cb861f17c96ab24fe0cf72ac/anndata/tests/helpers.py#L111-L121

giovp · 2021-11-16T12:46:47Z

Also, for gen_adata to return an object with awkward arrays in it, you've got to generate them and place them here:

🤦 🤦 🤦

So I'm at a stage where I could keep fixing and adding tests for awkward arrays for ages, and I'm happy to do so, but I would like to get a sense of how welcome this PR is. As it stands, we'd have to include awkward array as an (optional) dependency, and it would be a non-negligible addition to the code base (although I expected worse; singledispatch made it quite slim).
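To illustrate why `singledispatch` keeps this kind of addition small, here is a standard-library-only sketch. `RaggedArray`, `DenseArray` and `axis_len` are hypothetical stand-ins invented for this example; they are not names from anndata or awkward.

```python
from functools import singledispatch


class DenseArray:
    """Hypothetical stand-in for a NumPy-like array exposing .shape."""

    def __init__(self, n_rows: int, n_cols: int):
        self.shape = (n_rows, n_cols)


class RaggedArray:
    """Hypothetical stand-in for an awkward array: no .shape, only len()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)


@singledispatch
def axis_len(array, axis: int) -> int:
    # Default path: anything that exposes a .shape tuple.
    return array.shape[axis]


@axis_len.register
def _(array: RaggedArray, axis: int) -> int:
    # Ragged data only has a well-defined length along the outer axis.
    if axis != 0:
        raise ValueError("only axis 0 has a fixed length for ragged data")
    return len(array)
```

Supporting a new container type then means registering one more overload instead of adding `isinstance` branches at every call site.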

A major thing that could be impactful is that there is no .view() or .copy() notion in awkward array, see here and here.

Overall, I think we could use it right away in Squidpy, mostly as a way to enable anndata slicing and indexing while retaining sub-obs or sub-var info. We would not really use it for arithmetic or anything like that, although it would be cool to come up with function ideas, and it's also cool that it supports numba and jax jitting.

So, to summarize, shall I go ahead? @ivirshup @hspitzer @Zethson @michalk8 ?

ivirshup · 2021-11-17T10:38:52Z

I'm all for it. Having more record-like data has been requested a number of times.

I would note that there are some things in the awkward array api that will change (scikit-hep/awkward#1151), but that's mostly down the line stuff.

A major thing that could be impactful is that there is no .view() or .copy() notion in awkward array

I believe it does have copy, just not as a method. We might be able to get them to add that?


ivirshup · 2023-02-02T17:42:44Z

@grst tests are passing!

I think we need to open some issues on behavior we want to change, like the unions from outer concatenation. I would also like to take a look over the coverage, and see what we're missing.

How's the tutorial going?

grst · 2023-02-02T20:02:53Z

How's the tutorial going?

I can finish that at the beginning of next week.


jpivarski · 2023-02-07T20:20:37Z

Congratulations!!! This is not an ordinary PR. :)

giovp added 3 commits November 14, 2021 11:54

first attempt to support awkward arrays

5604eac

remove comments

7dbe908

better comment

c0bbf5a

ivirshup reviewed Nov 15, 2021

anndata/_core/anndata.py (outdated)

giovp added 4 commits November 15, 2021 15:31

add type to gen_adata

0281324

first attempt at concat

624a529

remove comment

05c6c75

add outer concat

3d359de

giovp added 4 commits November 15, 2021 22:06

add awkward to test dep

9bf0cb9

add awk arr to data gen

974040c

fix test base

13c4d59

init test for concat

74ae9e3

giovp added 5 commits November 19, 2021 18:22

fix concatenate tests

1d0e629

create mock class for awkward array

aeba549

remove space

88a5c83

import ak when needed

15b3d1a

relative import of awk array

7e6beaa

giovp commented Nov 29, 2021

anndata/compat/__init__.py

fix optional dep import

77d5b6c

ivirshup reviewed Feb 1, 2023

anndata/tests/test_awkward.py

Use warning instead of logging

cf4ad03

ivirshup reviewed Feb 1, 2023

anndata/_core/aligned_mapping.py

grst and others added 8 commits February 2, 2023 13:07

extend todo comment about views

46d553f

Fix IO, and to_memory for views of awkward arrays

e8eeb54

Removed a number of test cases that we're not targeting

5ab0708

This fixed a number of tests because we had a 1d awkward array being generated, and we currently don't support 1d arrays in obsm well. Tracked in #652.

Implement outer indexing on axis 0 of an awkward array

5b39691

Fix gen_awkward when one of the dimensions has size 0

45a9958

Fix equality function for awkward arrays. Was throwing an error when the arrays weren't broadcastable.

94aa4ef

Modify outer concatenation test to accept current behaviour of awkward array

99853d5

Merge branch 'master' into val_shape

cd2abdd

ivirshup self-requested a review February 2, 2023 17:38

ivirshup mentioned this pull request Feb 2, 2023

Prevent Unions during outer concatenation with awkward arrays #898

Open

ivirshup added 3 commits February 2, 2023 22:19

Add tests for mixed type concatenation with awkward arrays

96bfe31

Add warning about outer joins

4a6d119

Call ak._util.arrays_approx_equal instead of rolling our own

4243ccc

ivirshup reviewed Feb 3, 2023

anndata/_core/views.py (outdated)

update awkward to 2.0.7 (unfortunately: errors)

5ad915a

ivirshup reviewed Feb 6, 2023

anndata/_core/merge.py

grst and others added 4 commits February 6, 2023 18:26

remove unnecessary checks from AwkwardArrayView

07246cc

Workaround scikit-hep/awkward#2209

fb137af

[pre-commit.ci] auto fixes from pre-commit.com hooks

6e32637

for more information, see https://pre-commit.ci

Removed extra layer of nesting from on-disk format for awkward arrays

3883bb0

ivirshup approved these changes Feb 7, 2023


ivirshup merged commit a9e634c into master Feb 7, 2023

ivirshup deleted the val_shape branch February 7, 2023 19:47

ivirshup mentioned this pull request Feb 7, 2023

Storing sub-obs (variable length per observation data) #609

Closed
