[r] [WIP] Port `to_seurat()` to `SOMAExperiment` #1261

mojaveazure · 2023-04-18T19:30:27Z

Allow creating Seurat objects directly from SOMA experiments; this includes support for reading multimodal SOMA objects into a multimodal Seurat object

Implemented SOMA methods:

SOMAExperiment$to_seurat(): loads an entire experiment as a Seurat object, with the option to selectively suppress layers in every ms/X, dimensional reduction information, and nearest-neighbor graphs

Future work (tracked in #1192):

This PR does not address extended assays (e.g., Seurat::SCTAssay, Signac::ChromatinAssay)
This PR does not address support for spatially-resolved datasets

codecov-commenter · 2023-04-27T19:27:53Z

Codecov Report

Patch coverage has no change and project coverage change: +26.03 🎉

Comparison is base (db2782b) 65.30% compared to head (5eedbaf) 91.34%.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1261       +/-   ##
===========================================
+ Coverage   65.30%   91.34%   +26.03%     
===========================================
  Files          98       30       -68     
  Lines        7959     2449     -5510     
===========================================
- Hits         5198     2237     -2961     
+ Misses       2761      212     -2549

Flag	Coverage Δ
python	`91.34% <ø> (ø)`
r	`?`

johnkerl

It feels weird to comment on a draft PR but I was asked to give some early feedback. @mojaveazure I'm sure you'll let us know when this PR is ready for review. Thanks for working on this! :)

johnkerl · 2023-05-03T16:44:30Z

apis/r/R/SOMAExperiment.R

+    #' }
+    #' By default, will try to determine \code{varm_layers} from
+    #' \code{obsm_layers} and load in all loadings for all dimensional
+    #' reductions for all measurements described in \code{X_layers}


IDK if this is R-tastic but I would personally LOVE periods at the ends of complete sentences

johnkerl · 2023-05-03T16:46:18Z

apis/r/R/SOMAExperiment.R

+      var_column_names = NULL,
+      obsm_layers = NULL,
+      varm_layers = NULL,
+      obsp_layers = NULL


Although they're rare, I believe varp has been known to exist in the wild. Even if it doesn't arise naturally in Seurat data, IIUC it can arise in data from other provenances. So, as a multi-format-supporting package: if varp does get into a SOMA experiment somehow, what should we do with it?

johnkerl · 2023-05-03T16:56:24Z

apis/r/R/SOMAExperiment.R

+          }
+          skip_reducs <- TRUE
+        }
+        if (!skip_reducs) {


can we spell out skip_reductions?

johnkerl · 2023-05-03T16:56:56Z

apis/r/R/SOMAExperimentAxisQuery.R

@@ -269,75 +268,26 @@ SOMAExperimentAxisQuery <- R6::R6Class(
        skip_reducs <- TRUE


Suggested change

-skip_reducs <- TRUE
+skip_reductions <- TRUE

johnkerl · 2023-05-03T16:57:48Z

apis/r/R/utils-seurat.R

+#' \itemize{
+#'  \item \dQuote{\code{embeddings}}: convert AnnData-style \code{obsm} names
+#'  \item \dQuote{\code{loadings}}: convert AnnData-style \code{varm} names
+#' }


please drop in an explicit example
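
One possible explicit example, as a sketch only. The function below is a hypothetical stand-in, not the package's internal helper, and it assumes the common conventions that AnnData embeddings are named like `X_pca` and that PCA loadings are stored under `PCs`.

```r
# Hypothetical sketch: convert AnnData-style reduction names to Seurat-style
# names. Assumes "X_" prefixes for embeddings and "PCs"-style loading names.
anndata_to_seurat_reduction_sketch <- function(x, type = c("embeddings", "loadings")) {
  if (is.null(x)) {
    return(NULL)
  }
  stopifnot(is.character(x))
  type <- match.arg(type)
  switch(
    EXPR = type,
    # obsm names: drop the leading "X_" and lowercase the rest
    embeddings = tolower(sub(pattern = "^X_", replacement = "", x = x)),
    # varm names: drop a trailing "s", lowercase, then expand known short forms
    loadings = {
      out <- tolower(sub(pattern = "s$", replacement = "", x = x))
      out[out == "pc"] <- "pca"
      out[out == "ic"] <- "ica"
      out
    }
  )
}

embedding_names <- anndata_to_seurat_reduction_sketch(c("X_pca", "X_umap"), type = "embeddings")
loading_names <- anndata_to_seurat_reduction_sketch("PCs", type = "loadings")
```

Under those assumptions, `X_pca` and `X_umap` map to `pca` and `umap`, and `PCs` maps to `pca`.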

johnkerl · 2023-05-03T16:57:59Z

apis/r/R/utils-seurat.R

+#'
+#' @noRd
+#'
+.anndata_to_seurat_reduc <- function(x, type = c('embeddings', 'loadings')) {


Suggested change

-.anndata_to_seurat_reduc <- function(x, type = c('embeddings', 'loadings')) {
+.anndata_to_seurat_reduction <- function(x, type = c('embeddings', 'loadings')) {

johnkerl · 2023-05-03T16:58:50Z

apis/r/R/utils-seurat.R

+    object = vector(mode = 'list', length = length(obsp_layers)),
+    nm = obsp_layers
+  )
+  for (grph in obsp_layers) {


is graph a reserved word? any reason not to put the a in there?

aaronwolen

This is looking great. I left a few comments, but one issue that needs to be addressed is making the conversion via an empty query, because it will always perform 2 reads from obs and var (once just to retrieve the joinids and then again to fetch the data). I love the effort to reuse code, but in this case you might need to factor out the common components of SOMAExperiment$to_seurat and SOMAExperimentAxisQuery$to_seurat rather than trying to use the latter to implement the former.

aaronwolen · 2023-05-03T14:44:08Z

apis/r/man/roxygen/templates/param-obs-column-names.R

@@ -0,0 +1,2 @@
+#' @param obs_column_names Names of columns in \code{obs} to add as
+#' cell-level meta data; by default, loads all columns


Suggested change

-#' cell-level meta data; by default, loads all columns
+#' cell-level metadata; by default, loads all columns

aaronwolen · 2023-05-03T14:46:12Z

apis/r/R/SOMAExperiment.R

+    #' @param X_layers A named list of named character vectors describing the
+    #' measurements to load and the layers within those measurements to read in;
+    #' for example: \preformatted{
+    #' list(
+    #'   RNA = c(counts = "counts", data = "logcounts"),
+    #'   ADT = c(counts = "counts")
+    #' )
+    #' }


This is great.

aaronwolen · 2023-05-03T15:37:03Z

apis/r/R/SOMAExperiment.R

+        "'obs_index' must be a single character value" = is.null(obs_index) ||
+          is_scalar_character(obs_index),


This pattern is common enough that is_scalar_character_or_null() might be useful.
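
A minimal sketch of what such a helper could look like. Both functions below are hypothetical (suffixed `_sketch`); the package's own `is_scalar_character()` may be defined differently, so a simple base-R version is assumed here.

```r
# Hypothetical stand-in for the package's is_scalar_character()
is_scalar_character_sketch <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x)
}

# Proposed combined check: NULL is allowed, otherwise a single string
is_scalar_character_or_null_sketch <- function(x) {
  is.null(x) || is_scalar_character_sketch(x)
}

# Usage inside stopifnot(), mirroring the assertion in the diff above
obs_index <- NULL
stopifnot(
  "'obs_index' must be a single character value" =
    is_scalar_character_or_null_sketch(obs_index)
)
```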

aaronwolen · 2023-05-03T16:53:56Z

apis/r/R/SOMAExperiment.R

+      if (!all(names(X_layers) %in% self$ms$names())) {
+        msg <- paste(
+          "The following measurements could not be found in this experiment:",
+          string_collapse(setdiff(x = names(X_layers), y = self$ms$names()))
+        )
+        stop(paste(strwrap(msg), collapse = '\n'), call. = FALSE)
+      }


Could we replace this block with assert_subset()?
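
For illustration, a subset assertion of this kind might look roughly like the sketch below. `assert_subset_sketch()` and its `what` argument are assumptions made for this example; the signature of the package's actual `assert_subset()` is not shown in this thread.

```r
# Hypothetical sketch: error if any element of `x` is not found in `y`
assert_subset_sketch <- function(x, y, what = "value") {
  stopifnot(is.atomic(x), is.atomic(y), is.character(what), length(what) == 1L)
  not_found <- setdiff(x, y)
  if (length(not_found) > 0L) {
    msg <- sprintf(
      "The following %s(s) could not be found: %s",
      what,
      paste0("'", not_found, "'", collapse = ", ")
    )
    stop(paste(strwrap(msg), collapse = "\n"), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage: the first call passes silently, the second raises an error
measurements <- c("RNA", "ADT")
assert_subset_sketch("RNA", measurements, what = "measurement")
err <- tryCatch(
  assert_subset_sketch(c("RNA", "ATAC"), measurements, what = "measurement"),
  error = conditionMessage
)
```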

aaronwolen · 2023-05-03T16:57:00Z

apis/r/R/SOMAExperiment.R

+      stopifnot(
+        "'X_layers' must be named list" = is_named_list(
+          X_layers,
+          allow_empty = FALSE
+        ),
+        "'obs_index' must be a single character value" = is.null(obs_index) ||
+          is_scalar_character(obs_index),
+        "'var_index' must be a named character vector" = is_character_or_null(var_index),
+        "'var_column_names' must be a named list" = is.null(var_column_names) ||
+          is_named_list(var_column_names, allow_empty = FALSE),
+        "'obsm_layers' must be a named list" = is.null(obsm_layers) ||
+          is_scalar_logical(obsm_layers) ||
+          is_named_list(obsm_layers, allow_empty = FALSE),
+        "'varm_layers' must be a named list" = is.null(varm_layers) ||
+          is_scalar_logical(varm_layers) ||
+          is_named_list(varm_layers, allow_empty = FALSE),
+        "'obsp_layers' must be a named list" = is.null(obsp_layers) ||
+          is_scalar_logical(obsp_layers) ||
+          is_named_list(obsp_layers, allow_empty = FALSE)
+      )


Since these are ostensibly the same assertions as in SOMAExperimentAxisQuery$to_seurat(), can we create a utility to centralize them, to simplify the code and guarantee we're being consistent?
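
One way such a shared utility could be shaped, as a sketch under assumptions: all three functions below are hypothetical names, and `is_named_list_sketch()` is a simplified base-R stand-in for the package's `is_named_list()`.

```r
# Hypothetical stand-in for the package's is_named_list()
is_named_list_sketch <- function(x, allow_empty = TRUE) {
  if (!is.list(x)) {
    return(FALSE)
  }
  if (length(x) == 0L) {
    return(isTRUE(allow_empty))
  }
  nms <- names(x)
  !is.null(nms) && !anyNA(nms) && all(nzchar(nms))
}

# A layers argument may be NULL, a single TRUE/FALSE, or a non-empty named list
is_layers_arg_sketch <- function(x) {
  is.null(x) ||
    (is.logical(x) && length(x) == 1L && !is.na(x)) ||
    is_named_list_sketch(x, allow_empty = FALSE)
}

# Validate any number of layer arguments, passed by name
validate_layer_args_sketch <- function(...) {
  args <- list(...)
  for (nm in names(args)) {
    if (!is_layers_arg_sketch(args[[nm]])) {
      stop(sprintf("'%s' must be a named list", nm), call. = FALSE)
    }
  }
  invisible(TRUE)
}

# Usage: valid inputs pass silently; an unnamed character vector is rejected
validate_layer_args_sketch(
  obsm_layers = NULL,
  varm_layers = FALSE,
  obsp_layers = list(RNA = "RNA_snn")
)
err <- tryCatch(
  validate_layer_args_sketch(obsm_layers = "X_pca"),
  error = conditionMessage
)
```

Both `to_seurat()` methods could then call the same validator, so the checks and their messages cannot drift apart.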

aaronwolen · 2023-05-03T17:58:34Z

apis/r/R/SOMAExperiment.R

+      object <- query$to_seurat(
+        X_layers = X_layers[[active]],
+        obs_index = obs_index,
+        var_index = var_index[[active]],
+        obs_column_names = obs_column_names,
+        var_column_names = var_column_names[[active]],
+        obsm_layers = obsm_layers[[active]],
+        varm_layers =  varm_layers[[active]],
+        obsp_layers = obsp_layers[[active]]
+      )


I don't think we should be making this conversion via a query because it will always perform 2 reads (once to retrieve the joinids and then again to actually read the data) when only 1 is necessary here.
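
The double read can be illustrated with a toy model. Everything below is a mock written for this example: `new_counting_table()` stands in for obs or var and only counts reads; it does not use the tiledbsoma API.

```r
# Mock table that counts how many times it is read (stand-in for obs/var)
new_counting_table <- function(df) {
  reads <- 0L
  list(
    read = function(columns = names(df)) {
      reads <<- reads + 1L
      df[, columns, drop = FALSE]
    },
    n_reads = function() reads
  )
}

# Query-style path: one read for the join IDs, then a second for the data
load_via_query <- function(tbl) {
  joinids <- tbl$read(columns = "soma_joinid")$soma_joinid
  data <- tbl$read()
  data[data$soma_joinid %in% joinids, , drop = FALSE]
}

# Direct path: a single full read is enough when nothing is filtered
load_direct <- function(tbl) {
  tbl$read()
}

obs <- data.frame(soma_joinid = 0:2, cell_type = c("B", "T", "NK"))
via_query <- new_counting_table(obs)
direct <- new_counting_table(obs)
from_query <- load_via_query(via_query)
from_direct <- load_direct(direct)
```

In this mock, both paths return the same rows, but the query-style path touches the table twice and the direct path once.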

mojaveazure added the r-api label Apr 18, 2023

mojaveazure mentioned this pull request Apr 19, 2023

[r][python] Create an AnnData -> SOMA -> Seurat test #1263

Merged

mojaveazure force-pushed the ph/feat/to-seurat-experiment branch from aaf1bbc to d78f6aa Compare April 20, 2023 19:25

mojaveazure mentioned this pull request Apr 26, 2023

[r] Seurat <> SOMA interop future work #1192

Open

7 tasks

mojaveazure force-pushed the ph/feat/to-seurat-experiment branch from d78f6aa to 8e23f58 Compare April 27, 2023 19:08

johnkerl reviewed May 3, 2023

aaronwolen requested changes May 3, 2023

mojaveazure and others added 6 commits May 22, 2023 14:43

Port to_seurat() to SOMAExperiment

45e0ffc

Allow creating `Seurat` object directly from SOMA experiments

Use template for docs

75be023

Update apis/r/R/SOMAExperiment.R

ee07503

Co-authored-by: John Kerl <[email protected]>

Update apis/r/R/SOMAExperiment.R

c701deb

Co-authored-by: John Kerl <[email protected]>

Update apis/r/R/SOMAExperiment.R

c27bab4

Co-authored-by: John Kerl <[email protected]>

Update apis/r/R/SOMAExperiment.R

5eedbaf

Co-authored-by: John Kerl <[email protected]>

mojaveazure force-pushed the ph/feat/to-seurat-experiment branch from 2987c81 to 5eedbaf Compare May 23, 2023 15:11

Update tests with new statefulness

4513521

mojaveazure mentioned this pull request May 25, 2023

[r] Restore SOMASparseNDArray$read_sparse_matrix() #1414

Merged
