Substitutions and N's outside quantification window counted in alleles frequency table #356
Comments
Hi @GreenSeaBug, Thanks for using CRISPResso, and sorry about the confusion with the allele display of N's and substitutions. The allele plot will show substitutions and N's outside of the quantification window as different alleles, but they won't make the corresponding reads count as 'modified' or 'edited'. The allele plot is only for visualization, and if we were to collapse substitutions or N's to only show a single unedited allele it would not be an accurate representation of the data. If you open the text file associated with the allele plot (e.g. If you'd like, you can annotate all the unmodified alleles using the command If you still think there is a problem, could you upload the allele table and provide the command you used to run CRISPResso, as well as the alleles you believe are problematic?
Thank you for the reply. That all makes sense. However, it seems the data in the .txt file do not match what is shown in the alleles visualisation plot. For example, in the table below it says that 83.68% are edited with -2, but in the visualisation plot it shows 88.86% are unedited (perfectly match the reference in the quantification window). Does this seem strange or am I missing something obvious? Also, is there any way to exclude from the analysis reads with substitutions or N's outside the quantification window?
The mismatch of numbers (88.86% vs 83.86%) is because alleles with the same visual sequence have been collapsed to a single allele for plotting. That is, 88.86% of reads have the sequence shown in the plot, but the alleles couldn't be collapsed in the table because they have differences (SNPs or N's) that are outside of the plotting window. For example, imagine a sample with the reads in the allele frequency table:
If the plotting window were the 2nd to 5th bases, the first two alleles would be collapsed so the alleles plotted would be:
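The collapsing described above can be illustrated with a minimal Python sketch. The sequences, read counts, and the `collapse_by_window` helper are all hypothetical (this is not CRISPResso code or output); the plotting window is taken to be the 2nd to 5th bases, i.e. the 0-based slice `1:5`.

```python
def collapse_by_window(alleles, start, end):
    """Sum read counts of alleles that look identical within seq[start:end]."""
    collapsed = {}
    for seq, n_reads in alleles:
        key = seq[start:end]  # only the bases visible in the plot
        collapsed[key] = collapsed.get(key, 0) + n_reads
    return collapsed

# Hypothetical allele frequency table: (aligned sequence, read count).
alleles = [
    ("ACGTAC", 80),  # matches the (hypothetical) reference
    ("ACGTAN", 10),  # N at the last base, outside the plotting window
    ("AC-TAC", 10),  # deletion inside the plotting window
]

plotted = collapse_by_window(alleles, 1, 5)
total = sum(n for _, n in alleles)
for window_seq, n_reads in plotted.items():
    print(window_seq, f"{100 * n_reads / total:.1f}%")
```

In this toy example the first two alleles stay separate in the table (80 and 10 reads) but appear as a single plotted allele with 90 reads, which is the kind of percentage mismatch discussed above.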
I'm not sure what you mean by excluding the reads with substitutions or N's. Do you mean that they would be collapsed in the allele plots so the N or substitution would visually be replaced by a base in the reference sequence? If so, I'd be wary of doing that because it doesn't represent the underlying data. If you want reads with substitutions or N's to not make the read 'Modified' you can use the flag
OK, that makes sense and explains the discrepancy in percentages. However, I don't think it explains why the table says those 83.86% are edited with a -2 bp deletion, while the plot says those 88.86% are unedited WT. What do you think? As for excluding reads with substitutions or N's: no, I do not want to collapse those reads in the allele plot to visually replace the substitutions and N's. I agree that would not be a good idea. Nor do I want to prevent reads with substitutions or N's within the quantification window from being classified as modified. Rather, I am wondering if it is possible to exclude these reads from the analysis entirely, that is, filter them out. In my case, and I would think in a lot of cases, they are just sequencing errors or reads derived from chimeric amplicons that are an artefact of PCR.
I assumed the plot showing 88% unedited was away from your quantification window - is that not the case? You can exclude reads with N by filtering them out before CRISPResso analysis, and passing CRISPResso your filtered reads. Here's a script to filter reads based on the presence of a specific sequence: filterReadsOnSequencePresence.py - try running with
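For readers who only need the general idea of pre-filtering, here is a minimal Python sketch that drops reads containing N before analysis. It is not the `filterReadsOnSequencePresence.py` script mentioned above and does not reproduce its options; the function name `filter_reads_without_n` is hypothetical, and the sketch assumes plain-text FASTQ with exactly four lines per record.

```python
import io

def filter_reads_without_n(fastq_in, fastq_out):
    """Copy 4-line FASTQ records whose sequence has no 'N'.

    Returns a (kept, dropped) tuple of record counts.
    """
    kept = dropped = 0
    while True:
        record = [fastq_in.readline() for _ in range(4)]
        if not record[0]:  # end of file
            break
        if "N" in record[1].upper():  # record[1] is the sequence line
            dropped += 1
        else:
            fastq_out.writelines(record)
            kept += 1
    return kept, dropped

# Tiny in-memory demo with made-up reads.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACNT\n+\nIIII\n")
out = io.StringIO()
counts = filter_reads_without_n(demo, out)
print(counts)
```

With real data the same function could be given file handles opened on the input and output FASTQ files, and the filtered file would then be passed to CRISPResso. For paired-end data both mates would need to be filtered together, which this sketch does not handle.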
Yes, the part of the plot that I showed is away from the quantification window. Here is the quantification window... As you can see, nothing is modified. So I don't understand why the table says almost everything is modified (mostly -2 bp deletion). Thank you for the script for the N's! Is there a way to also exclude reads with substitutions outside the quantification window?
Is the plot above the entire quantification window? If you look at the entire quantification window you should be able to see the 2 bp deletion. If you'd prefer not to post here you can email me at [email protected]. For filtering, if you run with '--write_detailed_allele_table' CRISPResso will add a column to the 'Alleles_frequency_table.zip' file for "all_substitution_positions". You can filter for only alleles where this column is empty ("[]").
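The filtering step suggested above might look like the following Python sketch. It assumes the table inside 'Alleles_frequency_table.zip' has been extracted to tab-delimited text with a header row; only the `all_substitution_positions` column name comes from this thread, while the other column names, the `[5]` list format, and the helper name `alleles_without_substitutions` are illustrative assumptions.

```python
import csv
import io

def alleles_without_substitutions(table_file):
    """Return rows whose all_substitution_positions column is the empty list '[]'."""
    reader = csv.DictReader(table_file, delimiter="\t")
    return [
        row for row in reader
        if row["all_substitution_positions"].strip() == "[]"
    ]

# Made-up table contents for demonstration only.
demo_table = io.StringIO(
    "Aligned_Sequence\tall_substitution_positions\t#Reads\n"
    "ACGTAC\t[]\t80\n"
    "ACGTAT\t[5]\t10\n"
)
no_sub_rows = alleles_without_substitutions(demo_table)
print([row["Aligned_Sequence"] for row in no_sub_rows])
```

Note that this filters the allele table after the run, so percentages would need to be recomputed from the remaining rows, and it removes alleles with a substitution anywhere rather than only outside the quantification window.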
The plot above includes more than the entire quantification window. I have -w set to the default, 1, so the quantification window is 2 bp. As you can see, there are no edits on either side of the quantification window centre. There is no -2 bp deletion. So it seems like a complete mismatch with the allele frequency table .txt file. OK, thank you for the tip on substitutions.
Hello,
I have -w 1 and -wc -3, yet substitutions and N's outside the quantification window seem to be counted as edited in the allele frequency table. For example, in the image below, none of the sequences have indels within the 2 bp quantification window, and the substitutions / N's are at least 19 bp away from the cut site...
What is going on here? Is there a way to exclude those reads from the analysis?
Everything else in the analysis worked as expected.
Thanks for your help.