[r] Add Seurat -> SOMA Ingestor #1146

Merged
mojaveazure merged 52 commits into main from ph/feat/from-seurat
Apr 20, 2023

Conversation

mojaveazure
Member

@mojaveazure mojaveazure commented Mar 20, 2023

Add support for creating SOMA experiments from Seurat objects; all done using SeuratObject as a suggested dependency

New function:

  • write_soma(): generic function that takes an object x, a URI, and configuration and generates a complete SOMA experiment from it

Implemented SOMA methods:

  • write_soma.Seurat(): create a SOMA experiment from a Seurat object
  • write_soma.Assay(), write_soma.DimReduc(), write_soma.Graph(): helpers to write Seurat sub-objects to a SOMA experiment
  • write_soma.data.frame(): helper to write a data.frame into a SOMA experiment
  • write_soma.matrix(), write_soma.Matrix(), write_soma.TsparseMatrix(): helpers to write various matrix formats into a SOMA experiment
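
The generic-plus-methods layout listed above follows R's S3 dispatch. A minimal sketch of that pattern, assuming nothing about the PR's internals: write_demo() and its methods are hypothetical stand-ins that only report what they would write, and they do not touch tiledbsoma.

```r
# Hypothetical stand-in for a write_soma()-style generic; the name, arguments,
# and return values are illustrative only
write_demo <- function(x, uri, ...) {
  UseMethod("write_demo")
}

# Method chosen when x is a data.frame
write_demo.data.frame <- function(x, uri, ...) {
  sprintf("data frame with %d rows -> %s", nrow(x), uri)
}

# Method chosen when x is a base matrix
write_demo.matrix <- function(x, uri, ...) {
  sprintf("%d x %d matrix -> %s", nrow(x), ncol(x), uri)
}

# Fallback for classes without a dedicated method
write_demo.default <- function(x, uri, ...) {
  stop("no write_demo() method for class ", class(x)[1L])
}

df_msg <- write_demo(data.frame(a = 1:3), "demo/obs")
mat_msg <- write_demo(matrix(0, 2, 4), "demo/X")
```

A method for a composite object (such as a Seurat object) could then call the same generic on each of its parts, which is presumably why the sub-object and matrix helpers are exposed as methods.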

Future work (tracking in #1192):

  • This PR does not address extended assays (e.g., Seurat::SCTAssay, Signac::ChromatinAssay)
  • This PR does not address support for spatially-resolved datasets

resolves #942

@codecov-commenter

codecov-commenter commented Mar 20, 2023

Codecov Report

Patch coverage has no change and project coverage change: -10.57 ⚠️

Comparison is base (ff13865) 64.10% compared to head (46b1d54) 53.54%.

❗ Current head 46b1d54 differs from pull request most recent head 861efa7. Consider uploading reports for the commit 861efa7 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1146       +/-   ##
===========================================
- Coverage   64.10%   53.54%   -10.57%     
===========================================
  Files          93       66       -27     
  Lines        7174     5310     -1864     
===========================================
- Hits         4599     2843     -1756     
+ Misses       2575     2467      -108     
Flag Coverage Δ
python ?
r 53.54% <ø> (+2.31%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 37 files with indirect coverage changes

@mojaveazure mojaveazure force-pushed the ph/feat/from-seurat branch from c890064 to 02273f8 Compare March 22, 2023 21:05
@mojaveazure mojaveazure changed the title [R] [WIP] Add Seurat -> SOMA Ingestor [R] Add Seurat -> SOMA Ingestor Mar 24, 2023
@mojaveazure mojaveazure marked this pull request as ready for review March 25, 2023 00:06
apis/r/R/utils-uris.R (outdated, resolved)
apis/r/R/utils.R (outdated, resolved)
Contributor

@eddelbuettel eddelbuettel left a comment

Looks like a huge step in the right direction.

I have some small niggles on style; give them a minute of thought. Not vetoing anything over them though.

(And thanks for adding the logging in the last commit !)

Contributor

@eddelbuettel eddelbuettel left a comment

Looks good. The padding step is to go from the "tighter" Seurat / sparse representation to something faithful to SOMA and its shape parameters?

apis/r/R/utils-uris.R (outdated, resolved)
@mojaveazure
Member Author

> Looks good. The padding step is to go from the "tighter" Seurat / sparse representation to something faithful to SOMA and its shape parameters?

Yes. In Seurat, not every matrix has to cover the same var (scale.data, for example, may hold only a subset of features), whereas SOMA requires every X layer to share the same obs and var; pad_matrix() lets us expand those matrices to the full [var, obs] domain.
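
The padding idea can be sketched on toy data. This is a minimal illustration, not the PR's pad_matrix(): the helper name pad_to_domain() and its arguments are hypothetical, and it assumes the Matrix package and a sparse input that carries row and column names.

```r
library(Matrix)

# Hypothetical helper (NOT the PR's pad_matrix()): expand a sparse matrix to a
# full set of row/column names; added entries stay empty (implicit zeros)
pad_to_domain <- function(x, rownames_full, colnames_full) {
  stopifnot(
    all(rownames(x) %in% rownames_full),
    all(colnames(x) %in% colnames_full)
  )
  x <- as(x, "TsparseMatrix")  # triplet form: 0-based @i, @j and values @x
  sparseMatrix(
    # Map each stored entry from its old position to its position in the
    # full domain
    i = match(rownames(x), rownames_full)[x@i + 1L],
    j = match(colnames(x), colnames_full)[x@j + 1L],
    x = x@x,
    dims = c(length(rownames_full), length(colnames_full)),
    dimnames = list(rownames_full, colnames_full)
  )
}

# Toy matrix covering only features g2 and g4 across three cells
small <- sparseMatrix(
  i = c(1, 2, 2), j = c(1, 2, 3), x = c(1, 2, 3),
  dims = c(2, 3),
  dimnames = list(c("g2", "g4"), c("c1", "c2", "c3"))
)

all_features <- paste0("g", 1:4)
padded <- pad_to_domain(small, all_features, colnames(small))
dim(padded)  # 4 3
```

Working in triplet form keeps the operation proportional to the number of stored entries, so the padded matrix never has to be materialized densely.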

Contributor

@eddelbuettel eddelbuettel left a comment

👍

Member

@pablo-gar pablo-gar left a comment

I did some testing for write_soma.Seurat()

I created 5 different SOMAExperiment objects from Tabula Muris Senis datasets.

I then ran equality checks between the following SOMA parts and their equivalents in the Seurat object:

  • obs
  • var
  • obsm (X_pca, X_umap, X_tsne)
  • X layers counts and data (first 100 cells)

All data was written correctly to the SOMAExperiment and all equality checks passed!


R code used for this

library("tiledbsoma")
library("Seurat")

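# Stop unless two data-frame-like objects agree on all shared columns
stopifnot_df_equal <- function(df1, df2) {

  df1 <- as.data.frame(df1)
  df2 <- as.data.frame(df2)

  stopifnot(nrow(df1) == nrow(df2))

  # Compare only the columns present in both objects
  shared_cols <- intersect(colnames(df1), colnames(df2))

  # drop = FALSE keeps a data.frame even when only one column is shared
  df1 <- df1[, shared_cols, drop = FALSE]
  df2 <- df2[, shared_cols, drop = FALSE]

  # Element-wise mismatches; comparisons involving NA are not counted
  bool_matrix <- df1 != df2
  bool_matrix[is.na(bool_matrix)] <- FALSE

  stopifnot(!any(bool_matrix))

}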

# Get Seurat objects 
seurat_object <- readRDS("local.rds") # file from datasets here https://cellxgene.cziscience.com/collections/0b9d8a04-bb9d-44da-aa27-705bb65b54eb
seurat_obs <- seurat_object@meta.data
seurat_var <- seurat_object@assays$RNA@meta.features
seurat_pca <- seurat_object@reductions$pca@cell.embeddings
seurat_umap <- seurat_object@reductions$umap@cell.embeddings
seurat_tsne <- seurat_object@reductions$tsne@cell.embeddings
seurat_x_counts <- seurat_object@assays$RNA@counts
seurat_x_data <- seurat_object@assays$RNA@data

# Make soma object
unlink("from_seurat_soma", recursive = TRUE) # remove output from any previous run
write_soma(seurat_object, "from_seurat_soma")

# Get soma objects
soma_seurat <- SOMAExperimentOpen("from_seurat_soma/")

soma_obs <- as.data.frame(soma_seurat$get("obs")$read())
soma_var <- as.data.frame(soma_seurat$get("ms")$get("RNA")$get("var")$read())
soma_pca <- soma_seurat$get("ms")$get("RNA")$get("obsm")$get("X_pca")$read_dense_matrix()
soma_umap <- soma_seurat$get("ms")$get("RNA")$get("obsm")$get("X_umap")$read_dense_matrix()
soma_tsne <- soma_seurat$get("ms")$get("RNA")$get("obsm")$get("X_tsne")$read_dense_matrix()
soma_x_counts <- soma_seurat$get("ms")$get("RNA")$get("X")$get("counts")$read_sparse_matrix()
soma_x_data <- soma_seurat$get("ms")$get("RNA")$get("X")$get("data")$read_sparse_matrix()

# Perform equality comparisons
stopifnot_df_equal(seurat_obs, soma_obs)
stopifnot_df_equal(seurat_var, soma_var)
stopifnot(seurat_pca == soma_pca)
stopifnot(seurat_umap == soma_umap)
stopifnot(seurat_tsne == soma_tsne)

# Check the first 100 cells: Seurat stores features x cells, SOMA X is obs x var
for (i in 1:100) {
  stopifnot(seurat_x_counts[,i] == soma_x_counts[i,])
  stopifnot(seurat_x_data[,i] == soma_x_data[i,])
}

@eddelbuettel
Contributor

Nice -- regarding #1111, please see #1194 and the idiom it uses (it is already supported, so you should be able to use it now):

## pass with budget of 45mb
ctx <- tiledbsoma::SOMATileDBContext$new(c(soma.init_buffer_bytes="45000000"))
sdf2 <- SOMADataFrameOpen(uri, tiledbsoma_ctx = ctx)
expect_silent(sdf2$read())

where uri is the path to the data set that fails with the smaller default budget.

@aaronwolen aaronwolen requested a review from johnkerl March 30, 2023 14:17
Member

@aaronwolen aaronwolen left a comment

This is great and fills a gaping hole in our functionality. I made a few comments inline but there's a larger issue I think we should address: I'm just realizing we're currently missing R6 methods for adding new dataframes and ndarrays to a SOMACollection as described in the spec (e.g., add_new_dataframe()). This is functionality that you've already implemented in this PR so let's use that code to add these missing R6 methods, which write_soma() and friends can invoke.
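
To make the suggestion concrete, here is a rough sketch of the shape such a method could take. It is not the tiledbsoma API: DemoCollection is a hypothetical in-memory class, its add_new_dataframe() only registers an existing data.frame under a key, and the signature is an assumption. A real implementation would create the SOMA object at a URI. The sketch assumes the R6 package.

```r
library(R6)

# Hypothetical in-memory stand-in for a collection class; illustrative only
DemoCollection <- R6Class(
  "DemoCollection",
  public = list(
    members = NULL,

    initialize = function() {
      self$members <- list()
    },

    # Register a data.frame under `key`, refusing to overwrite existing members
    add_new_dataframe = function(key, df) {
      stopifnot(is.character(key), length(key) == 1L, is.data.frame(df))
      if (key %in% names(self$members)) {
        stop("member '", key, "' already exists")
      }
      self$members[[key]] <- df
      invisible(df)
    },

    # Retrieve a member by key
    get = function(key) {
      self$members[[key]]
    }
  )
)

coll <- DemoCollection$new()
coll$add_new_dataframe("obs", data.frame(cell = c("c1", "c2")))
n_obs <- nrow(coll$get("obs"))
```

With methods like this on the collection class, a writer function could delegate creation and registration to the collection instead of doing both itself.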

apis/r/R/write_soma.R (9 review threads, outdated, resolved)

@mojaveazure mojaveazure force-pushed the ph/feat/from-seurat branch from 6ca7fa0 to 86902a8 Compare April 6, 2023 18:15
@mojaveazure mojaveazure requested a review from aaronwolen April 6, 2023 23:34
@johnkerl johnkerl changed the title [R] Add Seurat -> SOMA Ingestor [r] Add Seurat -> SOMA Ingestor Apr 7, 2023
apis/r/R/write_seurat.R (2 review threads, outdated, resolved)
Member

@aaronwolen aaronwolen left a comment

🚀

Last thing: can you bump the version and update NEWS?

Bump develop version

[ci-skip]
@mojaveazure mojaveazure merged commit d314176 into main Apr 20, 2023
@mojaveazure mojaveazure deleted the ph/feat/from-seurat branch April 20, 2023 19:24
Successfully merging this pull request may close these issues.

[r] Implement from_seurat
6 participants