Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r] [WIP] Port to_seurat() to SOMAExperiment #1261

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

mojaveazure
Copy link
Member

Allow creating Seurat object directly from SOMA experiments; this includes support reading in multimodal SOMA objects into a multimodal Seurat object

Implemented SOMA methods:

  • SOMAExperiment$to_seurat(): allows loading in entire Seurat objects, including partially suppressing layers in every ms/X, dimensional reduction information, and nearest-neighbor graphs

Future work (tracked in #1192):

  • This PR does not address extended assays (eg. Seurat::SCTAssay, Signac::ChromatinAssay)
  • This PR does not address support for spatially-resolved datasets

@codecov-commenter
Copy link

codecov-commenter commented Apr 27, 2023

Codecov Report

Patch coverage has no change and project coverage change: +26.03 🎉

Comparison is base (db2782b) 65.30% compared to head (5eedbaf) 91.34%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1261       +/-   ##
===========================================
+ Coverage   65.30%   91.34%   +26.03%     
===========================================
  Files          98       30       -68     
  Lines        7959     2449     -5510     
===========================================
- Hits         5198     2237     -2961     
+ Misses       2761      212     -2549     
Flag Coverage Δ
python 91.34% <ø> (ø)
r ?

Flags with carried forward coverage won't be shown. Click here to find out more.

see 68 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels weird to comment on a draft PR but I was asked to give some early feedback. @mojaveazure I'm sure you'll let us know when this PR is ready for review. Thanks for working on this! :)

apis/r/R/SOMAExperiment.R Outdated Show resolved Hide resolved
apis/r/R/SOMAExperiment.R Outdated Show resolved Hide resolved
#' }
#' By default, will try to determine \code{varm_layers} from
#' \code{obsm_layers} and load in all loadings for all dimensional
#' reductions for all measurements described in \code{X_layers}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK if this is R-tastic but I would personally LOVE periods at the ends of complete sentences

apis/r/R/SOMAExperiment.R Outdated Show resolved Hide resolved
var_column_names = NULL,
obsm_layers = NULL,
varm_layers = NULL,
obsp_layers = NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although they're rare I believe varp has been known to exist in the wild. Even if it doesn't arise naturally in Seurat data, IIUC it can arise in data from other provenances, so, as a multi-format-supporting package, if varp does get into a soma experiment somehow -- what should we do with it?

}
skip_reducs <- TRUE
}
if (!skip_reducs) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we spell out skip_reductions?

@@ -269,75 +268,26 @@ SOMAExperimentAxisQuery <- R6::R6Class(
skip_reducs <- TRUE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
skip_reducs <- TRUE
skip_reductions <- TRUE

#' \itemize{
#' \item \dQuote{\code{embeddings}}: convert AnnData-style \code{obsm} names
#' \item \dQuote{\code{loadings}}: convert AnnData-style \code{varm} names
#' }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please drop in an explicit example

#'
#' @noRd
#'
.anndata_to_seurat_reduc <- function(x, type = c('embeddings', 'loadings')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.anndata_to_seurat_reduc <- function(x, type = c('embeddings', 'loadings')) {
.anndata_to_seurat_reduction <- function(x, type = c('embeddings', 'loadings')) {

object = vector(mode = 'list', length = length(obsp_layers)),
nm = obsp_layers
)
for (grph in obsp_layers) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is graph a reserved word? any reason not to put the a in there?

Copy link
Member

@aaronwolen aaronwolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great. I left a few comments but one issue that needs to be addressed making the conversion via an empty query because it will always perform 2 reads from obs and var (once just to retrieve the joinids and then again to fetch the data). I love the effort to reuse code but in this case you might need to factor out the common components of SOMAExperiment$to_seurat and SOMAExperimentAxisQuery$to_seurat rather than trying to use the latter to implement the former.

@@ -0,0 +1,2 @@
#' @param obs_column_names Names of columns in \code{obs} to add as
#' cell-level meta data; by default, loads all columns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' cell-level meta data; by default, loads all columns
#' cell-level metadata; by default, loads all columns

Comment on lines +17 to +24
#' @param X_layers A named list of named character vectors describing the
#' measurements to load and the layers within those measurements to read in;
#' for example: \preformatted{
#' list(
#' RNA = c(counts = "counts", data = "logcounts"),
#' ADT = c(counts = "counts")
#' )
#' }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great.

Comment on lines +92 to +93
"'obs_index' must be a single character value" = is.null(obs_index) ||
is_scalar_character(obs_index),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is common enough that is_scalar_character_or_null() might be useful.

Comment on lines +108 to +114
if (!all(names(X_layers) %in% self$ms$names())) {
msg <- paste(
"The following measurements could not be found in this experiment:",
string_collapse(setdiff(x = names(X_layers), y = self$ms$names()))
)
stop(paste(strwrap(msg), collapse = '\n'), call. = FALSE)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we replace this block with assert_subset()?

Comment on lines 87 to 106
stopifnot(
"'X_layers' must be named list" = is_named_list(
X_layers,
allow_empty = FALSE
),
"'obs_index' must be a single character value" = is.null(obs_index) ||
is_scalar_character(obs_index),
"'var_index' must be a named character vector" = is_character_or_null(var_index),
"'var_column_names' must be a named list" = is.null(var_column_names) ||
is_named_list(var_column_names, allow_empty = FALSE),
"'obsm_layers' must be a named list" = is.null(obsm_layers) ||
is_scalar_logical(obsm_layers) ||
is_named_list(obsm_layers, allow_empty = FALSE),
"'varm_layers' must be a named list" = is.null(varm_layers) ||
is_scalar_logical(varm_layers) ||
is_named_list(varm_layers, allow_empty = FALSE),
"'obsp_layers' must be a named list" = is.null(obsp_layers) ||
is_scalar_logical(obsp_layers) ||
is_named_list(obsp_layers, allow_empty = FALSE)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are ostensibly the same assertions in SOMAExperimentAxisQuery$to_seurat() can we create a utility to centralize them to simplify the code and gurantee we're being consistent.

Comment on lines +210 to +219
object <- query$to_seurat(
X_layers = X_layers[[active]],
obs_index = obs_index,
var_index = var_index[[active]],
obs_column_names = obs_column_names,
var_column_names = var_column_names[[active]],
obsm_layers = obsm_layers[[active]],
varm_layers = varm_layers[[active]],
obsp_layers = obsp_layers[[active]]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be making this conversion via a query because it will always perform 2 reads (once to retrieve the joinids and then again to actually read the data) when only 1 is necessary here.

mojaveazure and others added 6 commits May 22, 2023 14:43
@mojaveazure mojaveazure force-pushed the ph/feat/to-seurat-experiment branch from 2987c81 to 5eedbaf Compare May 23, 2023 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants