Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT: Polishing Ideas for QC code #17

Merged
merged 6 commits into from
Mar 22, 2024
Merged

DRAFT: Polishing Ideas for QC code #17

merged 6 commits into from
Mar 22, 2024

Conversation

cansavvy
Copy link
Collaborator

@cansavvy cansavvy commented Mar 8, 2024

Description

This is building on the excellent work that @kweav started over on #16 Here I'm working on simplifying that code so that when the Berger Lab is in the maintenance phase of this project they will be able to find bugs and squash them as needed.

Type of change

What I've done is try to simplify the code. Now this is having the gimap_dataset sent directly to the Rmd report for processing.

The qc plot functions are almost completely unchanged. I'm just changing how the handling of the data is working so that we have as few objects around as possible to troubleshoot.

  • This change requires a documentation update

How Has This Been Tested?

I ran this locally and it worked -- what I need @kweav to do is to make sure the report still has all the same components she originally intended.

Checklist:

  • My code follows the style guidelines of this project -- I ran it through styler package.
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@cansavvy cansavvy changed the base branch from main to ki_qc March 8, 2024 15:02
@cansavvy cansavvy changed the title Cansavvy/refactor Polishing Ideas for QC code Mar 8, 2024
@cansavvy cansavvy marked this pull request as draft March 8, 2024 15:06
@cansavvy cansavvy changed the title Polishing Ideas for QC code DRAFT: Polishing Ideas for QC code Mar 8, 2024
@cansavvy
Copy link
Collaborator Author

cansavvy commented Mar 8, 2024

This still requires some more checks and polishing, but I'm happy to work with @kweav on this.

@cansavvy cansavvy requested a review from kweav March 12, 2024 12:31
@cansavvy cansavvy marked this pull request as ready for review March 18, 2024 19:05
Copy link
Collaborator Author

@cansavvy cansavvy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kweav This is really great -- please note that a lot of the code is untouched by me -- I ran this through the styler R package so its automatically styled. The main thing I did here was suggest that we can send the data to the Rmd report directly. The plotting functions look fantastic.

Before we can really use this function I'd like to hear from you on whether you think it still addresses the spirit of your original PR -- are the plots and everything still getting us what we need?

Thanks again for taking the lead on this!

if (!("gimap_dataset" %in% class(gimap_dataset))) stop("This function only works with gimap_dataset objects which can be made with the setup_data() function.")

# for Rmd report
list_of_qc_things <- list(counts_cdf = NULL,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this totally made sense as a first attempt! It makes sense that you may have been following the structure in the setup_data.R code where I had written a structured list where data was set up.

Is there a reason you thought we needed to follow that same setup here instead of going to the Rmd directly?

Again, thanks so much for getting this first setup ready. Your digging in on the biology was really superb.

Copy link
Collaborator

@kweav kweav Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that the Rmarkdown/report was a secondary goal and would be an option that people could choose whether or not to render. I didn't plan for the overall list_of_qc_thing to get returned and passed on to the next function (of finding log fold change), rather only returning the filter that I saved within the gimap dataset for the next function. This list instead served as a way to save ggplot objects and strings (of where plots were saved) or values of certain parameters for the report and pass it to a separate generate report function. That way, all the info was stored as it was used/produced in real time of running QC, and if someone chose to make the report, it was just filling in info rather than rerunning things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want the report to be optional, it makes sense to do everything in the rmarkdown as long as the end filter is available for the next step.

@@ -40,7 +42,7 @@ setup_data <- function(counts = NULL, pg_ids = NULL, pg_metadata = NULL, sample_
# If they don't give sample metadata, then we will make up a row id
if (is.null(sample_metadata)) sample_metadata <- data.frame(id = 1:nrow(counts))
if (!is.data.frame(sample_metadata)) stop("metadata can only be in the form of a data.frame")
if (nrow(sample_metadata) != ncol(counts)) stop("the number of rows in the sample metadata is not equal to the number of columns in the counts" )
if (nrow(sample_metadata) != ncol(counts)) stop("the number of rows in the sample metadata is not equal to the number of columns in the counts")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also ran this through styler R package -- so you will see a lot of little things like spacing pop up -- not me going through manually to do this, but you can feel free to use styler on your own if you want (I am too lazy to style on my own 😄 )

params = list(cell_line = NULL), plots_dir = "./qc_plots", overwrite = FALSE, output_format = NULL, output_file = "QC_Report", output_dir = "./",
wide_ar = 0.75) {

store_results <- function(result_obj, lvoi, list_of_qc_things, plots_dir, params){
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need to store the results, which is the strategy I like, then we can remove this function -- one less thing to maintain. But let me know your thoughts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed I only added this function because I was storing the plots as well as the locations where the plots were stored and was trying to reduce receptiveness.

values_to = "count_normalized")

counts_cdf <- ggplot(long_form, aes(x=count_normalized, color = sample)) +
everything(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all just from styler again.

rep = as.factor(c("RepA", "RepA", "RepA", "RepB", "RepC"))
)
```

We'll need to provide `example_counts`, `example_pg_id` and `examplemetadata` to `setup_data()`. We can provide `sample_metadata`, but it is not required at the moment. (but it likely will be???)
We'll need to provide `example_counts`, `example_pg_id` and `example_pg_metadata` to `setup_data()`. We can provide `sample_metadata`, but it is not required at the moment. (but it likely will be???)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spacing makes it easier to read is all. Very minor suggestion.

kweav and others added 2 commits March 18, 2024 15:30
remove repeated `count_norm` and unused `long_form`
@cansavvy
Copy link
Collaborator Author

Toward the goal of moving this forward, I'm going to merge this so we can take a deeper look at the changes on #16

@cansavvy cansavvy merged commit c734375 into ki_qc Mar 22, 2024
@cansavvy cansavvy deleted the cansavvy/refactor branch March 22, 2024 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants