bam to counts conversion #4
@@ -1,22 +1,14 @@
Type: Package
Package: gimap
Title: Calculate Genetic Interactions for CRISPR Targets
Package: pgmap
Also made some updates to the package docs.
@cansavvy Looks great overall! Much cleaner than the original pipeline! Ship it. My biggest thought when reading this (and I think it's out of scope for this PR) is whether we can do this count without loading the BAMs into memory, for example by streaming the BAM files and building up the counts iteratively, using O(# paired guides) memory. Another thought I had was whether the BAM/alignment step is necessary at all, given that MAGeCK argues that you should count directly from FASTQ and not count partial alignments. I'm not sure whether this should be different for paired guides, or how much of our example data aligns perfectly: https://sourceforge.net/p/mageck/wiki/advanced_tutorial/#tutorial-1-allow-mismatches-for-read-mapping
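As a rough illustration of the streaming idea, here is a minimal Python sketch (not the package's code, which is R). It assumes each mate's alignments have already been reduced to hypothetical `(read_id, guide_id)` records, emitted in the same read order for both mates, with `None` standing in for an unaligned read. Only a `Counter` keyed by guide pair is held in memory, so memory scales with the number of distinct paired guides rather than the number of reads. The function name `count_guide_pairs` is hypothetical.

```python
from collections import Counter
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

# Hypothetical record shape: (read_id, guide_id); guide_id is None if unaligned.
Record = Tuple[str, Optional[str]]


def count_guide_pairs(mate1: Iterable[Record], mate2: Iterable[Record]) -> Counter:
    """Stream two read-ordered record sources and tally (guide1, guide2) pairs.

    Assumes both sources list the same reads in the same order. Records are
    consumed one pair at a time, so only the Counter grows with the input.
    """
    counts: Counter = Counter()
    for rec1, rec2 in zip_longest(mate1, mate2):
        # One source ran out before the other: the inputs are not paired up.
        if rec1 is None or rec2 is None:
            raise ValueError("mate streams have different lengths")
        (id1, g1), (id2, g2) = rec1, rec2
        if id1 != id2:
            raise ValueError(f"read order mismatch: {id1} != {id2}")
        # Skip pairs where either mate did not align to a guide.
        if g1 is None or g2 is None:
            continue
        counts[(g1, g2)] += 1
    return counts


if __name__ == "__main__":
    mate1 = [("r1", "gA"), ("r2", "gB"), ("r3", "gA")]
    mate2 = [("r1", "gC"), ("r2", "gD"), ("r3", "gC")]
    print(count_guide_pairs(iter(mate1), iter(mate2)))
```

In practice the records could be produced lazily, e.g. by iterating an unsorted (read-order) BAM with pysam's `AlignmentFile` and yielding `(read.query_name, read.reference_name)`; filtering of secondary or supplementary alignments is not handled in this sketch.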
I'll ship it after I make the tests pass. 👍
I ran this locally without a problem (it took maybe 30 seconds, about 10 seconds per sample), but saving compute time and memory is certainly good. We should also determine how large the biggest files for these assays might be, so we can think more specifically about benchmarking.
Interesting, thanks for sharing! I'll dig into this tomorrow!
Still some trouble with the Rsamtools dependency, but I'll move on.
Description
This PR adds the BAM-to-counts conversion that is done by the custom scripts counter_efficient.R and combine_counts.py in the original pipeline.
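The combine step can be pictured with a similar hypothetical sketch. It is not a port of `combine_counts.py`, whose exact behavior is not shown here: per-sample pair counts are merged into one wide table with one column per sample, and zeros are filled in for pairs a sample never observed. The function name `combine_counts` and the input shape (a dict mapping sample name to a `Counter` of guide pairs) are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def combine_counts(
    per_sample: Dict[str, Counter],
) -> Tuple[List[str], List[Tuple[Pair, List[int]]]]:
    """Merge per-sample pair counts into (sample columns, rows of counts)."""
    # Fix a deterministic column order.
    samples = sorted(per_sample)
    # Union of every guide pair seen in any sample (iterating a Counter yields keys).
    all_pairs = sorted(set().union(*per_sample.values()))
    # One row per pair; a sample that never saw the pair contributes 0.
    rows = [(pair, [per_sample[s].get(pair, 0) for s in samples]) for pair in all_pairs]
    return samples, rows


if __name__ == "__main__":
    cols, table = combine_counts({
        "s2": Counter({("gA", "gC"): 1}),
        "s1": Counter({("gA", "gC"): 2, ("gB", "gD"): 3}),
    })
    print(cols)
    for pair, values in table:
        print(pair, values)
```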
Originally, the one thing I found confusing was that the counts I was getting (after calculating weights) did not turn out as whole numbers, and I wasn't sure why they do in the original data. Figured it out! See below. The counts now line up with the expected counts from the original pipeline configuration. 🎉
Type of change
How Has This Been Tested?
I haven't developed unit tests for this; however, the examples documented in the code are working examples, and they are how I have tested these functions.
Checklist: