Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bam to counts conversion #4

Merged
merged 42 commits into from
Oct 28, 2024
Merged

bam to counts conversion #4

merged 42 commits into from
Oct 28, 2024

Conversation

cansavvy
Copy link
Collaborator

@cansavvy cansavvy commented Oct 24, 2024

Description

This is adding the bam to counts conversion that's done by the custom scripts counter_efficient.Rand combine_counts.py in the original pipeline.

The only thing I find confusing is that the counts I'm getting (after calculating weights) do not turn out as whole numbers. And I'm not sure why they do in the original data. <\del>

Figured it out! See below. It's lining up with the expected counts from the original pipeline configuration. 🎉

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

I haven't developed unit tests for this however the examples documented in the code are working examples and are how I have tested these functions.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@cansavvy
Copy link
Collaborator Author

Screenshot 2024-10-24 at 3 20 56 PM

I figured it out. Here's a plot to prove it. I'm putting proof of concept plots in the inst/extdata/rmd folder here so we have them for our records.

@@ -1,22 +1,14 @@
Type: Package
Package: gimap
Title: Calculate Genetic Interactions for CRISPR Targets
Package: pgmap
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also did some package docs updates

@marissafujimoto
Copy link
Collaborator

@cansavvy Looks great overall! Much cleaner compared to the original pipeline! Ship it.

My biggest thought when reading this (and I think it's out of scope for this PR) is whether we can do this count without loading the bams into memory. Instead maybe iteratively building up the counts and using O(# paired guides) memory by streaming the bam files.

Another thought I had was whether the bam / alignment step is necessary given that mageck argues that you should go directly from fastq and not count partial alignments, though I'm not sure if this should be different for paired guides or how much of our example data is perfectly aligned: https://sourceforge.net/p/mageck/wiki/advanced_tutorial/#tutorial-1-allow-mismatches-for-read-mapping

@cansavvy
Copy link
Collaborator Author

cansavvy commented Oct 25, 2024

@cansavvy Looks great overall! Much cleaner compared to the original pipeline! Ship it.

I'll ship it after I make the tests pass. 👍

My biggest thought when reading this (and I think it's out of scope for this PR) is whether we can do this count without loading the bams into memory. Instead maybe iteratively building up the counts and using O(# paired guides) memory by streaming the bam files.

I ran this locally without a problem (took maybe 30 sec, 10 sec a sample) but certainly saving compute time and memory is good. And we should determine how big are the biggest files these assays might have to think more specifically about bench marking.

Another thought I had was whether the bam / alignment step is necessary given that mageck argues that you should go directly from fastq and not count partial alignments, though I'm not sure if this should be different for paired guides or how much of our example data is perfectly aligned: https://sourceforge.net/p/mageck/wiki/advanced_tutorial/#tutorial-1-allow-mismatches-for-read-mapping

Interesting thanks for sharing! I'll dig into this tomorrow!

@cansavvy
Copy link
Collaborator Author

Still some troubles with the Rsamtools dependency but I'll move on

@cansavvy cansavvy merged commit 3892ba1 into main Oct 28, 2024
3 of 11 checks passed
@cansavvy cansavvy deleted the cansavvy/bam_to_counts branch October 28, 2024 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants