bam to counts conversion #4

cansavvy · 2024-10-24T18:49:04Z

Description

This adds the bam to counts conversion that's done by the custom scripts counter_efficient.R and combine_counts.py in the original pipeline.

~~The only thing I find confusing is that the counts I'm getting (after calculating weights) do not turn out as whole numbers. And I'm not sure why they do in the original data.~~

Figured it out! See below. It's lining up with the expected counts from the original pipeline configuration. 🎉

Type of change

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

I haven't developed unit tests for this yet; however, the examples documented in the code are working examples, and they are how I tested these functions.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

cansavvy · 2024-10-24T19:22:15Z

I figured it out. Here's a plot to prove it. I'm putting proof of concept plots in the inst/extdata/rmd folder here so we have them for our records.

cansavvy · 2024-10-24T19:22:33Z

DESCRIPTION

@@ -1,22 +1,14 @@
 Type: Package
-Package: gimap
-Title: Calculate Genetic Interactions for CRISPR Targets
+Package: pgmap


Also did some package docs updates

marissafujimoto · 2024-10-24T23:41:06Z

@cansavvy Looks great overall! Much cleaner compared to the original pipeline! Ship it.

My biggest thought when reading this (and I think it's out of scope for this PR) is whether we can do this count without loading the bams into memory. Instead maybe iteratively building up the counts and using O(# paired guides) memory by streaming the bam files.
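A minimal sketch of the streaming idea, written in Python purely for illustration rather than in the package's R. It consumes alignment records one at a time and keeps only a counter keyed by guide pair, so memory grows with the number of distinct paired guides (plus any reads still waiting for their mate) rather than with the size of the bam files. The record layout `(read_name, mate_number, guide_name)` and the function name `count_guide_pairs` are assumptions made for this example, not part of pgmap or the original pipeline; in practice the records would come from a bam reader that yields alignments lazily.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (an assumption for this sketch):
# (read_name, mate_number, guide_name), where mate_number is 1 or 2.
Record = Tuple[str, int, str]


def count_guide_pairs(records: Iterable[Record]) -> Counter:
    """Count (guide_1, guide_2) pairs from a stream of alignment records."""
    counts: Counter = Counter()
    # Reads whose mate has not been seen yet: read_name -> (mate_number, guide_name)
    pending: Dict[str, Tuple[int, str]] = {}

    for read_name, mate, guide in records:
        other = pending.pop(read_name, None)
        if other is None:
            # First mate of this read: hold it until its partner arrives.
            pending[read_name] = (mate, guide)
            continue

        other_mate, other_guide = other
        if other_mate == mate:
            # Same mate seen twice; this sketch simply keeps the latest one.
            pending[read_name] = (mate, guide)
            continue

        # Order the pair as (mate 1 guide, mate 2 guide) regardless of arrival order.
        pair = (other_guide, guide) if other_mate < mate else (guide, other_guide)
        counts[pair] += 1

    # Reads that never found a mate are dropped, not counted.
    return counts


# Toy records standing in for a lazily read bam file.
records = [
    ("r1", 1, "gA"), ("r1", 2, "gB"),
    ("r2", 2, "gB"), ("r2", 1, "gA"),  # mates arrive in reverse order
    ("r3", 1, "gC"),                   # unpaired read
]
counts = count_guide_pairs(iter(records))
print(counts)
```

If the input is sorted by read name, the `pending` dictionary holds at most one read at a time; with coordinate-sorted input it can grow, which is the main trade-off to benchmark.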

Another thought I had was whether the bam / alignment step is necessary given that mageck argues that you should go directly from fastq and not count partial alignments, though I'm not sure if this should be different for paired guides or how much of our example data is perfectly aligned: https://sourceforge.net/p/mageck/wiki/advanced_tutorial/#tutorial-1-allow-mismatches-for-read-mapping

cansavvy · 2024-10-25T01:37:13Z

> @cansavvy Looks great overall! Much cleaner compared to the original pipeline! Ship it.

I'll ship it after I make the tests pass. 👍

> My biggest thought when reading this (and I think it's out of scope for this PR) is whether we can do this count without loading the bams into memory. Instead maybe iteratively building up the counts and using O(# paired guides) memory by streaming the bam files.

I ran this locally without a problem (it took maybe 30 sec total, about 10 sec per sample), but saving compute time and memory is certainly good. We should also determine how big the largest files from these assays might be so we can think more specifically about benchmarking.

> Another thought I had was whether the bam / alignment step is necessary given that mageck argues that you should go directly from fastq and not count partial alignments, though I'm not sure if this should be different for paired guides or how much of our example data is perfectly aligned: https://sourceforge.net/p/mageck/wiki/advanced_tutorial/#tutorial-1-allow-mismatches-for-read-mapping

Interesting, thanks for sharing! I'll dig into this tomorrow!

cansavvy · 2024-10-28T18:10:21Z

Still having some trouble with the Rsamtools dependency, but I'll move on.

cansavvy added 5 commits October 24, 2024 14:44

add changes

7a6a06b

document

53bf35b

fix example

65357d2

Some fixes include paired filtered out

07880e1

Proof of concept validation Rmd

ae84a3e

cansavvy requested a review from marissafujimoto October 24, 2024 19:26

Add a basic unit test

e00e752

cansavvy added 19 commits October 25, 2024 09:52

Add timer

4d23d76

Missing comma

9b310f2

Add missing dependencies

c3fadc6

rearrange anypaired steps

34a7e61

Update dependencies for real

6dcb773

missing a comma

329e279

Install from github maybe?

acb5856

update remotes spec

3fe4eff

Add Rsamtools

9bdc6aa

Add biocmanager

d3a3101

3.15

3e33b4b

Delete old files

4b5a5f7

update some handling

3fc068e

Updates

0e77040

updates

6c0b887

Updates

e786e53

Remotes

70737cc

Get rid of "--as_cran"

b27495b

Install tidyr

9de88b1

cansavvy added 17 commits October 28, 2024 10:58

It doesn't like the zip thing

625e95b

set mirror

9d5e52e

Fix Rsamtools

b5535aa

Try this?

9df2d4d

install tidyr

25c6ca9

Fix install step

42760b1

Try this instead

9f18004

Specify cran mirror

9495f92

Try to get Ubuntu tests working

92310f9

Rhtslib

9de05ae

Try this

f1b5041

3.15

c1e861d

Add devtools too

cff0937

alter gha

ee81b30

Fix spacing

7c37407

Update

64b9403

Add bioc friendly GHA

0a64475

cansavvy merged commit 3892ba1 into main Oct 28, 2024
3 of 11 checks passed

cansavvy deleted the cansavvy/bam_to_counts branch October 28, 2024 18:11
