-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: Polishing Ideas for QC code #17
Conversation
This still requires some more checks and polishing, but I'm happy to work with @kweav on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kweav This is really great -- please note that a lot of the code is untouched by me -- I ran this through the styler R package so its automatically styled. The main thing I did here was suggest that we can send the data to the Rmd report directly. The plotting functions look fantastic.
Before we can really use this function I'd like to hear from you on whether you think it still addresses the spirit of your original PR -- are the plots and everything still getting us what we need?
Thanks again for taking the lead on this!
if (!("gimap_dataset" %in% class(gimap_dataset))) stop("This function only works with gimap_dataset objects which can be made with the setup_data() function.") | ||
|
||
# for Rmd report | ||
list_of_qc_things <- list(counts_cdf = NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this totally made sense as a first attempt! It makes sense that you may have been following the structure in the setup_data.R code where I had written a structured list where data was set up.
Is there a reason you thought we needed to follow that same setup here instead of going to the Rmd directly?
Again, thanks so much for getting this first setup ready. Your digging in on the biology was really superb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that the Rmarkdown/report was a secondary goal and would be an option that people could choose whether or not to render. I didn't plan for the overall list_of_qc_thing
to get returned and passed on to the next function (of finding log fold change), rather only returning the filter that I saved within the gimap dataset for the next function. This list instead served as a way to save ggplot objects and strings (of where plots were saved) or values of certain parameters for the report and pass it to a separate generate report function. That way, all the info was stored as it was used/produced in real time of running QC, and if someone chose to make the report, it was just filling in info rather than rerunning things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't want the report to be optional, it makes sense to do everything in the rmarkdown as long as the end filter is available for the next step.
@@ -40,7 +42,7 @@ setup_data <- function(counts = NULL, pg_ids = NULL, pg_metadata = NULL, sample_ | |||
# If they don't give sample metadata, then we will make up a row id | |||
if (is.null(sample_metadata)) sample_metadata <- data.frame(id = 1:nrow(counts)) | |||
if (!is.data.frame(sample_metadata)) stop("metadata can only be in the form of a data.frame") | |||
if (nrow(sample_metadata) != ncol(counts)) stop("the number of rows in the sample metadata is not equal to the number of columns in the counts" ) | |||
if (nrow(sample_metadata) != ncol(counts)) stop("the number of rows in the sample metadata is not equal to the number of columns in the counts") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also ran this through styler
R package -- so you will see a lot of little things like spacing pop up -- not me going through manually to do this, but you can feel free to use styler on your own if you want (I am too lazy to style on my own 😄 )
params = list(cell_line = NULL), plots_dir = "./qc_plots", overwrite = FALSE, output_format = NULL, output_file = "QC_Report", output_dir = "./", | ||
wide_ar = 0.75) { | ||
|
||
store_results <- function(result_obj, lvoi, list_of_qc_things, plots_dir, params){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't need to store the results, which is the strategy I like, then we can remove this function -- one less thing to maintain. But let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed I only added this function because I was storing the plots as well as the locations where the plots were stored and was trying to reduce receptiveness.
values_to = "count_normalized") | ||
|
||
counts_cdf <- ggplot(long_form, aes(x=count_normalized, color = sample)) + | ||
everything(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all just from styler again.
rep = as.factor(c("RepA", "RepA", "RepA", "RepB", "RepC")) | ||
) | ||
``` | ||
|
||
We'll need to provide `example_counts`, `example_pg_id` and `examplemetadata` to `setup_data()`. We can provide `sample_metadata`, but it is not required at the moment. (but it likely will be???) | ||
We'll need to provide `example_counts`, `example_pg_id` and `example_pg_metadata` to `setup_data()`. We can provide `sample_metadata`, but it is not required at the moment. (but it likely will be???) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spacing makes it easier to read is all. Very minor suggestion.
remove repeated `count_norm` and unused `long_form`
Toward the goal of moving this forward, I'm going to merge this so we can take a deeper look at the changes on #16 |
Description
This is building on the excellent work that @kweav started over on #16 Here I'm working on simplifying that code so that when the Berger Lab is in the maintenance phase of this project they will be able to find bugs and squash them as needed.
Type of change
What I've done is try to simplify the code. Now this is having the gimap_dataset sent directly to the Rmd report for processing.
The qc plot functions are almost completely unchanged. I'm just changing how the handling of the data is working so that we have as few objects around as possible to troubleshoot.
How Has This Been Tested?
I ran this locally and it worked -- what I need @kweav to do is to make sure the report still has all the same components she originally intended.
Checklist: