Experimenting with CRISPR calculations #77

Open: cansavvy (Collaborator) wants to merge 11 commits into base: main

Description

After a Basecamp conversation, we realized that normalization might not be happening the way we thought.

@ahberger thought we were calculating CRISPRs using:

logFC adjusted = (log2FC - log2FC_negctls) / |log2FC_posctls|

But the original code has this as the calculation:

https://github.com/FredHutch/GI_mapping/blob/e117710977fd4c92b62ff3f552254a6a3076a6d4/workflow/scripts/03-filter_and_calculate_LFC.Rmd#L450

d.lfc_annot_adj <- d.lfc_annot %>%
  group_by(rep) %>%
  mutate(lfc_adj1 = lfc_plasmid_vs_late - median(lfc_plasmid_vs_late[norm_ctrl_flag == "negative_control"]),
         lfc_adj2 = lfc_adj1 / (median(lfc_adj1[norm_ctrl_flag == "negative_control"]) -
                                  median(lfc_adj1[norm_ctrl_flag == "positive_control"]))) 
...

And then one more median subtraction later.

...
  group_by(rep) %>%
  mutate(lfc_adj3 = lfc_adj2 - median(lfc_adj2[unexpressed_ctrl_flag == TRUE]))
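
To make the three steps easier to compare with the expected formula, here is a short algebraic reading of the code above. The symbols are introduced only for this sketch and are not in the repository: $x$ is `lfc_plasmid_vs_late`, $m_{\text{neg}}$ and $m_{\text{pos}}$ are the per-replicate medians of $x$ over the negative and positive controls, and $u$ is the per-replicate median of `lfc_adj2` over the unexpressed controls.

```latex
\begin{align*}
\text{lfc\_adj1} &= x - m_{\text{neg}} \\
\text{lfc\_adj2} &= \frac{\text{lfc\_adj1}}{0 - (m_{\text{pos}} - m_{\text{neg}})}
                  = \frac{x - m_{\text{neg}}}{m_{\text{neg}} - m_{\text{pos}}} \\
\text{lfc\_adj3} &= \text{lfc\_adj2} - u
\end{align*}
```

Read this way, `lfc_adj2` has a negative-control median of 0 and a positive-control median of -1, because the median of `lfc_adj1` over the negative controls is 0 by construction. After the last subtraction those medians become $-u$ and $-1 - u$, so `lfc_adj3` would only sit at 0 and -1 if the unexpressed controls happen to have a median of 0 in `lfc_adj2`. This may be one reason plotted values drift from the expected range, though it depends on which column is plotted.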

This is what we have been basing our CRISPR calculations on, and it has given results very similar to those in the results folder on the cluster: grp/bergerlab_shared/Projects/paralog_pgRNA/pgPEN_library/GI_mapping/results

But when I plot the results found there (which, by all indications, come from the code we have; see https://github.com/FredHutch/GI_mapping/blob/e117710977fd4c92b62ff3f552254a6a3076a6d4/workflow/scripts/03-filter_and_calculate_LFC.Rmd#L8 ), the data do not adhere to the expected negative controls = 0 and positive controls = -1:

[Image: norm_plot]

The code on this branch therefore tries to better meet these expectations by calculating CRISPR scores using the following:

logFC adjusted = (log2FC - log2FC_negctls) / |log2FC_posctls|

This replaces the original calculation and results in:

[Image: crispr_scores]

Note, however, that this version of the code does not produce exactly -1 for the positive controls:

  rep              norm_ctrl_flag   median_crispr
   <chr>            <fct>                    <dbl>
 1 Day05_RepA_early negative_control         0    
 2 Day05_RepA_early positive_control         3.25 
 3 Day05_RepA_early single_targeting         2.86 
 4 Day05_RepA_early double_targeting         4.47 
 5 Day22_RepA_late  negative_control         0    
 6 Day22_RepA_late  positive_control        -2.18 
 7 Day22_RepA_late  single_targeting        -0.826
 8 Day22_RepA_late  double_targeting        -1.86 
 9 Day22_RepB_late  negative_control         0    
10 Day22_RepB_late  positive_control        -2.07 
11 Day22_RepB_late  single_targeting        -0.793
12 Day22_RepB_late  double_targeting        -1.66 
13 Day22_RepC_late  negative_control         0    
14 Day22_RepC_late  positive_control        -2.13 
15 Day22_RepC_late  single_targeting        -0.785
16 Day22_RepC_late  double_targeting        -1.75 
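
One possible explanation for positive-control medians that are not -1, sketched with made-up numbers: if the denominator is the absolute value of the raw positive-control median (an assumption about how the branch implements the formula), the positive controls only land at -1 when the negative-control median is already 0. The function name `score_abs_pos` and all values below are hypothetical.

```r
# Hypothetical helper mirroring:
#   logFC adjusted = (log2FC - log2FC_negctls) / |log2FC_posctls|
# Assumes both control values are medians of the raw log2FC.
score_abs_pos <- function(lfc, neg_median, pos_median) {
  (lfc - neg_median) / abs(pos_median)
}

# Score of the positive-control median itself, using made-up medians.
# Case 1: negative controls already centered at 0.
centered <- score_abs_pos(lfc = -2, neg_median = 0, pos_median = -2)

# Case 2: negative controls sit at +1, so the numerator spans a wider
# distance than the denominator accounts for.
shifted <- score_abs_pos(lfc = -2, neg_median = 1, pos_median = -2)

print(c(centered = centered, shifted = shifted))
```

In the second case the numerator covers the full distance between the two control medians while the denominator covers only the distance from 0 to the positive-control median, so the score overshoots -1.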

cansavvy commented Dec 20, 2024

Overall readability score: 44.82 (🟢 +0.12)

File Readability
README.md 60.48 (🟢 +0.47)

🟢 - Shows an increase in readability
🔴 - Shows a decrease in readability

File        Readability   FRE        GF         ARI       CLI      DCRS
README.md   60.48         50.57      10.65      13.3      11.66    6.39
            🟢 +0.47      🟢 +0.31   🟢 +0.12   🟢 +0.1   🟢 +0    🟢 +0.02

Averages:

          Readability   FRE        GF         ARI        CLI      DCRS
Average   44.82         34.48      11.87      14.18      14.21    8.27
          🟢 +0.12      🟢 +0.08   🟢 +0.03   🟢 +0.02   🟢 +0    🟢 +0

Metric targets:

Metric                   Range                                                    Ideal score
Flesch Reading Ease      100 (very easy read) to 0 (extremely difficult read)     60
Gunning Fog              6 (very easy read) to 17 (extremely difficult read)      8 or less
Auto. Read. Index        6 (very easy read) to 14 (extremely difficult read)      8 or less
Coleman Liau Index       6 (very easy read) to 17 (extremely difficult read)      8 or less
Dale-Chall Readability   4.9 (very easy read) to 9.9 (extremely difficult read)   6.9 or less

cansavvy commented

Following an older version of the code, I used:

crispr_score = (lfc - negative_control) / ( negative_control - positive_control)

Now negative controls are 0 and positive controls are -1, as expected. I will interrogate this more later, but I think we're more on track. I also have a function to do the plotting and will add it as part of unit testing.

With the new calculations we are getting closer. The results don't look like the paper yet, but at least our normalization now lands in the right range.
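
As a rough illustration of the calculation above and of the kind of check a unit test could make, here is a minimal base R sketch. The function name `calc_crispr_score`, the toy data frame, and its values are all hypothetical and are not taken from the repository; the real pipeline uses `group_by(rep)` rather than the explicit loop shown here.

```r
# Hypothetical helper implementing:
#   crispr_score = (lfc - negative_control) / (negative_control - positive_control)
# where both control terms are medians within one replicate.
calc_crispr_score <- function(lfc, ctrl_flag) {
  neg_median <- median(lfc[ctrl_flag == "negative_control"], na.rm = TRUE)
  pos_median <- median(lfc[ctrl_flag == "positive_control"], na.rm = TRUE)
  (lfc - neg_median) / (neg_median - pos_median)
}

# Made-up data: two replicates, three negative and three positive controls each.
toy <- data.frame(
  rep = rep(c("RepA", "RepB"), each = 6),
  norm_ctrl_flag = rep(
    rep(c("negative_control", "positive_control"), each = 3),
    times = 2
  ),
  lfc = c(0.9, 1.0, 1.1, -2.2, -2.0, -1.8,
          0.4, 0.5, 0.6, -3.1, -3.0, -2.9)
)

# Score each replicate separately, mirroring group_by(rep) in the pipeline.
toy$crispr_score <- NA_real_
for (r in unique(toy$rep)) {
  idx <- toy$rep == r
  toy$crispr_score[idx] <- calc_crispr_score(toy$lfc[idx], toy$norm_ctrl_flag[idx])
}

# Median score per replicate and control type; this is what a test would inspect.
meds <- aggregate(crispr_score ~ rep + norm_ctrl_flag, data = toy, FUN = median)
print(meds)
```

A unit test could then assert that the negative-control medians are 0 and the positive-control medians are -1 in every replicate, within floating-point tolerance.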
