-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[r] [WIP] Port to_seurat()
to SOMAExperiment
#1261
base: main
Are you sure you want to change the base?
Conversation
aaf1bbc
to
d78f6aa
Compare
d78f6aa
to
8e23f58
Compare
Codecov ReportPatch coverage has no change and project coverage change:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## main #1261 +/- ##
===========================================
+ Coverage 65.30% 91.34% +26.03%
===========================================
Files 98 30 -68
Lines 7959 2449 -5510
===========================================
- Hits 5198 2237 -2961
+ Misses 2761 212 -2549
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels weird to comment on a draft PR but I was asked to give some early feedback. @mojaveazure I'm sure you'll let us know when this PR is ready for review. Thanks for working on this! :)
#' } | ||
#' By default, will try to determine \code{varm_layers} from | ||
#' \code{obsm_layers} and load in all loadings for all dimensional | ||
#' reductions for all measurements described in \code{X_layers} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK if this is R-tastic but I would personally LOVE periods at the ends of complete sentences
var_column_names = NULL, | ||
obsm_layers = NULL, | ||
varm_layers = NULL, | ||
obsp_layers = NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although they're rare I believe varp
has been known to exist in the wild. Even if it doesn't arise naturally in Seurat data, IIUC it can arise in data from other provenances, so, as a multi-format-supporting package, if varp
does get into a soma experiment somehow -- what should we do with it?
} | ||
skip_reducs <- TRUE | ||
} | ||
if (!skip_reducs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we spell out skip_reductions
?
@@ -269,75 +268,26 @@ SOMAExperimentAxisQuery <- R6::R6Class( | |||
skip_reducs <- TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip_reducs <- TRUE | |
skip_reductions <- TRUE |
#' \itemize{ | ||
#' \item \dQuote{\code{embeddings}}: convert AnnData-style \code{obsm} names | ||
#' \item \dQuote{\code{loadings}}: convert AnnData-style \code{varm} names | ||
#' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please drop in an explicit example
#' | ||
#' @noRd | ||
#' | ||
.anndata_to_seurat_reduc <- function(x, type = c('embeddings', 'loadings')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.anndata_to_seurat_reduc <- function(x, type = c('embeddings', 'loadings')) { | |
.anndata_to_seurat_reduction <- function(x, type = c('embeddings', 'loadings')) { |
object = vector(mode = 'list', length = length(obsp_layers)), | ||
nm = obsp_layers | ||
) | ||
for (grph in obsp_layers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is graph
a reserved word? any reason not to put the a
in there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great. I left a few comments but one issue that needs to be addressed making the conversion via an empty query because it will always perform 2 reads from obs and var (once just to retrieve the joinids and then again to fetch the data). I love the effort to reuse code but in this case you might need to factor out the common components of SOMAExperiment$to_seurat and SOMAExperimentAxisQuery$to_seurat rather than trying to use the latter to implement the former.
@@ -0,0 +1,2 @@ | |||
#' @param obs_column_names Names of columns in \code{obs} to add as | |||
#' cell-level meta data; by default, loads all columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#' cell-level meta data; by default, loads all columns | |
#' cell-level metadata; by default, loads all columns |
#' @param X_layers A named list of named character vectors describing the | ||
#' measurements to load and the layers within those measurements to read in; | ||
#' for example: \preformatted{ | ||
#' list( | ||
#' RNA = c(counts = "counts", data = "logcounts"), | ||
#' ADT = c(counts = "counts") | ||
#' ) | ||
#' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great.
"'obs_index' must be a single character value" = is.null(obs_index) || | ||
is_scalar_character(obs_index), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is common enough that is_scalar_character_or_null()
might be useful.
if (!all(names(X_layers) %in% self$ms$names())) { | ||
msg <- paste( | ||
"The following measurements could not be found in this experiment:", | ||
string_collapse(setdiff(x = names(X_layers), y = self$ms$names())) | ||
) | ||
stop(paste(strwrap(msg), collapse = '\n'), call. = FALSE) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we replace this block with assert_subset()
?
apis/r/R/SOMAExperiment.R
Outdated
stopifnot( | ||
"'X_layers' must be named list" = is_named_list( | ||
X_layers, | ||
allow_empty = FALSE | ||
), | ||
"'obs_index' must be a single character value" = is.null(obs_index) || | ||
is_scalar_character(obs_index), | ||
"'var_index' must be a named character vector" = is_character_or_null(var_index), | ||
"'var_column_names' must be a named list" = is.null(var_column_names) || | ||
is_named_list(var_column_names, allow_empty = FALSE), | ||
"'obsm_layers' must be a named list" = is.null(obsm_layers) || | ||
is_scalar_logical(obsm_layers) || | ||
is_named_list(obsm_layers, allow_empty = FALSE), | ||
"'varm_layers' must be a named list" = is.null(varm_layers) || | ||
is_scalar_logical(varm_layers) || | ||
is_named_list(varm_layers, allow_empty = FALSE), | ||
"'obsp_layers' must be a named list" = is.null(obsp_layers) || | ||
is_scalar_logical(obsp_layers) || | ||
is_named_list(obsp_layers, allow_empty = FALSE) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are ostensibly the same assertions in SOMAExperimentAxisQuery$to_seurat()
can we create a utility to centralize them to simplify the code and gurantee we're being consistent.
object <- query$to_seurat( | ||
X_layers = X_layers[[active]], | ||
obs_index = obs_index, | ||
var_index = var_index[[active]], | ||
obs_column_names = obs_column_names, | ||
var_column_names = var_column_names[[active]], | ||
obsm_layers = obsm_layers[[active]], | ||
varm_layers = varm_layers[[active]], | ||
obsp_layers = obsp_layers[[active]] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be making this conversion via a query because it will always perform 2 reads (once to retrieve the joinids and then again to actually read the data) when only 1 is necessary here.
Allow creating `Seurat` object directly from SOMA experiments
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: John Kerl <[email protected]>
2987c81
to
5eedbaf
Compare
Allow creating
Seurat
object directly from SOMA experiments; this includes support reading in multimodal SOMA objects into a multimodalSeurat
objectImplemented SOMA methods:
SOMAExperiment$to_seurat()
: allows loading in entireSeurat
objects, including partially suppressing layers in everyms/X
, dimensional reduction information, and nearest-neighbor graphsFuture work (tracked in #1192):
Seurat::SCTAssay
,Signac::ChromatinAssay
)