Put non-count data in CDS objects #7
I am actually guessing that this is because of NAs in your expression matrix rather than a slot issue. Can you check that sum(is.na(exprs(npcs_cds))) is equal to zero?
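That check can be run directly on the CDS (a sketch; assumes monocle is loaded and the object is named npcs_cds as in this thread):

```r
library(monocle)  # provides the exprs() accessor via Biobase

# Count missing values in the expression matrix; this should be 0
n_na <- sum(is.na(exprs(npcs_cds)))
n_na
```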
Thanks for the quick reply. I just checked and it is zero. I've run quite a bit of analysis on this data already in scanpy, so I'm pretty sure it's not a data issue, though of course something could have happened during the conversion to R. I've previously tried using pre-processed data with monocle and ran into the same issues, unfortunately. Have you run garnett without first doing the pre-processing?
Hi again. I've gotten garnett to work with count data input, but I was wondering if you could let me know how best to use pre-processed data. I have a strong batch effect in my data and I'm pretty sure the classifier would be better trained without this effect in the data. Thanks in advance for your help!
I have found some instructions on how to load pre-processed data into a CDS object. However, when I set things up that way, I run into errors.
Is this intentional? Is garnett currently not compatible with pre-processed data? Here is the code I'm running:
Kind regards
You should still run estimateSizeFactors, which I think should still work. I will note that Garnett has only been tested with count data... it has some internal normalization steps to help it be compatible across datasets, and these may hurt performance with a pre-normalized dataset depending on how it was normalized. I would expect this mostly to be an issue when using pre-trained classifiers or when classifying other count data. But again, it's gonna depend on your preprocessing steps, I think. Hopefully it will at least run once you have size factors (check that they're not NAs - you can set NAs to 1 if you're confident depth across cells is consistent). Would be curious to hear how it works!
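That suggestion might look like the following (a sketch; assumes a monocle CDS named npcs_cds, and note that replacing NA size factors with 1 is only appropriate if sequencing depth is consistent across cells):

```r
library(monocle)

npcs_cds <- estimateSizeFactors(npcs_cds)

# Size factors are stored in the phenotype data; they can come out
# as NA for pre-normalized input
sf <- pData(npcs_cds)$Size_Factor
if (any(is.na(sf))) {
  # Fallback assumption: comparable depth across cells
  pData(npcs_cds)$Size_Factor[is.na(sf)] <- 1
}
```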
Thanks for the response! If I run estimateSizeFactors, the size factors come out as NA; is it OK to just set them to 1, and where are they stored? So far the classification unfortunately hasn't worked particularly well for my data with several different iterations of marker gene sets. One reason is probably a lack of pre-processing (apart from cell/gene QC and normalization).
I think it would be fine to set the size factors to 1, yes. They are stored at pData(cds)$Size_Factor, so you can just set pData(cds)$Size_Factor <- 1. Curious if this helps your classification, let me know!
I just tried what you suggested and ran it.
Hi,
I'm afraid this is more of a question than a bug report. Thus, I am deviating from the template.
I haven't been able to find out how to load pre-processed data into cds objects without getting errors in downstream analysis. I have patient data integrated with MNN, on which I would like to run garnett. I have run this pre-processing in scanpy, and so am transferring the data from Python into R to use garnett. Following the monocle and garnett tutorials, I generate the CDS object with:
Where obs_mon and var_mon are pandas dataframes converted to R dataframes, and data_man_mon is a numpy array converted to an R dataframe. As the data is pre-processed, I immediately load my marker gene file and check the markers, and then get the following error:
Error: Must run estimateSizeFactors() on cds before calling check_markers
After running npcs_cds <- estimateSizeFactors(npcs_cds), I get the following error:
I assume these errors have to do with certain slots in the cds object not being filled (e.g., reduced dimensions, size factors, dispersions, etc.). It would be great if you could tell me which slots are missing and how I can assign data to these slots. As I have processed the data in scanpy, I should be able to easily transfer this across.