-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[r] Add Seurat -> SOMA Ingestor #1146
Conversation
Codecov ReportPatch coverage has no change and project coverage change:
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## main #1146 +/- ##
===========================================
- Coverage 64.10% 53.54% -10.57%
===========================================
Files 93 66 -27
Lines 7174 5310 -1864
===========================================
- Hits 4599 2843 -1756
+ Misses 2575 2467 -108
Flags with carried forward coverage won't be shown. Click here to find out more. see 37 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
c890064
to
02273f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a huge step in the right direction.
I have some small niggles on style, give them a minute of though. Not vetoing anything over them though.
(And thanks for adding the logging in the last commit !)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. The padding step is to go from the "tighter" Seurat / sparse representation to something faithful to SOMA and its shape parameters?
Yes. In Seurat, not all matrices (namely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some testing for write_soma.Seurat()
I created 5 different SOMAExperiement
with different datasets from Tabula Muris Senis
And then performed equality checks between the following SOMA parts and their corresponding equivalents from the Seurat object:
obs
var
obsm["X_pca"]
obsm["X_tsne"]
obsm["X_umap"]
X["counts"]
(only the first 100 cells due to [r] Non-iteratedsoma_reader
should not return partial results #1111)X["data"]
(only the first 100 cells due to [r] Non-iteratedsoma_reader
should not return partial results #1111)
All data was created properly into the SOMAExperiement
and all equality checks passed!
R code used for this
library("tiledbsoma")
library("Seurat")
stopifnot_df_equal <- function(df1, df2) {
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)
stopifnot(nrow(df1) == nrow(df2))
shared_cols <- intersect(colnames(df1), colnames(df2))
df1 <- df1[,shared_cols]
df2 <- df2[,shared_cols]
bool_matrix <- df1 != df2
bool_matrix[is.na(bool_matrix)] <- F
stopifnot(sum(colSums(bool_matrix)) == 0)
}
# Get Seurat objects
seurat_object <- readRDS("local.rds") # file from datasets here https://cellxgene.cziscience.com/collections/0b9d8a04-bb9d-44da-aa27-705bb65b54eb
seurat_obs <- seurat_object@meta.data
seurat_var <- seurat_object@assays$RNA@meta.features
seurat_pca <- seurat_object@reductions$pca@cell.embeddings
seurat_umap <- seurat_object@reductions$umap@cell.embeddings
seurat_tsne <- seurat_object@reductions$tsne@cell.embeddings
seurat_x_counts <- seurat_object@assays$RNA@counts
seurat_x_data <- seurat_object@assays$RNA@data
# Make soma object
unlink("from_seurat_soma", recursive = T)
write_soma(seurat_object, "from_seurat_soma")
# Get soma objects
soma_seurat <- SOMAExperimentOpen("from_seurat_soma/")
soma_obs <- as.data.frame(soma_seurat$get("obs")$read())
soma_var <- as.data.frame(soma_seurat$get("ms")$get("RNA")$get("var")$read())
soma_pca <- soma_seurat$get("ms")$get("RNA")$get("obsm")$get("X_pca")$read_dense_matrix()
soma_umap <- soma_seurat$get("ms")$get("RNA")$get("obsm")$get("X_umap")$read_dense_matrix()
soma_tsne <- soma_seurat$get("ms")$get("RNA")$get("obsm")$get("X_tsne")$read_dense_matrix()
soma_x_counts <- soma_seurat$get("ms")$get("RNA")$get("X")$get("counts")$read_sparse_matrix()
soma_x_data <- soma_seurat$get("ms")$get("RNA")$get("X")$get("data")$read_sparse_matrix()
# Perform equality comparisons
stopifnot_df_equal(seurat_obs, soma_obs)
stopifnot_df_equal(seurat_var, soma_var)
stopifnot(seurat_pca == soma_pca)
stopifnot(seurat_umap == soma_umap)
stopifnot(seurat_tsne == soma_tsne)
for (i in 1:100) {
stopifnot(seurat_x_counts[,i] == soma_x_counts[i,])
stopifnot(seurat_x_data[,i] == soma_x_data[i,])
}
Nice -- regarding #1111 please see #1194 and this idiom it uses (but which is already supported, you should be able to use it now): ## pass with budget of 45mb
ctx <- tiledbsoma::SOMATileDBContext$new(c(soma.init_buffer_bytes="45000000"))
sdf2 <- SOMADataFrameOpen(uri, tiledbsoma_ctx = ctx)
expect_silent(sdf2$read()) where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great and fills a gaping hole in our functionality. I made a few comments inline but there's a larger issue I think we should address: I'm just realizing we're currently missing R6 methods for adding new dataframes and ndarrays to a SOMACollection as described in the spec (e.g., add_new_dataframe()
). This is functionality that you've already implemented in this PR so let's use that code to add these missing R6 methods, which write_soma()
and friends can invoke.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great and fills a gaping hole in our functionality. I made a few comments inline but there's a larger issue I think we should address: I'm just realizing we're currently missing R6 methods for adding new dataframes and ndarrays to a SOMACollection as described in the spec (e.g., add_new_dataframe()
). This is functionality that you've already implemented in this PR so let's use that code to add these missing R6 methods, which write_soma()
and friends can invoke.
6ca7fa0
to
86902a8
Compare
Remove `overwrite` from `write_soma.Seurat()`
Co-authored-by: Aaron Wolen <[email protected]>
Co-authored-by: Aaron Wolen <[email protected]>
Co-authored-by: Aaron Wolen <[email protected]>
Co-authored-by: Aaron Wolen <[email protected]>
4255147
to
861efa7
Compare
8062031
to
861efa7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Last thing can you bump the version and update NEWS?
Bump develop version [ci-skip]
Add support for creating SOMA experiments from
Seurat
objects; all done using SeuratObject as a suggested dependencyNew function:
write_soma()
: generic function that takes an objectx
, a URI, and configuration and generates a complete SOMA experiment from itImplemented SOMA methods:
write_soma.Seurat()
: create a SOMA experiment from aSeurat
objectwrite_soma.Assay()
,write_soma.DimReduc()
,write_soma.Graph()
: helpers to write Seurat sub-objects to a SOMA experimentwrite_soma.data.frame()
: helper to write adata.frame
into a SOMA experimentwrite_soma.matrix()
,write_soma.Matrix()
,write_soma.TsparseMatrix()
: helpers to write various matrix formats into a SOMA experimentFuture work (tracking in #1192):
Seurat::SCTAssay
,Signac::ChromatinAssay
)resolves #942