Unexpectedly short protein sequences #90
Comments
The second problematic case discovered is due to #55 -- which should be fixed when we switch to interbase coordinates for gathering locus reads.
Possible fix: include protein sequence length as the 3rd sorting criterion in https://github.com/hammerlab/isovar/blob/master/isovar/protein_sequences.py#L164
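A minimal sketch of what that tie-breaker could look like. The first two criteria of the real sort key are not shown in this thread, so `n_supporting_reads` and `n_mismatches` below are stand-ins, and `Candidate` is a hypothetical record rather than isovar's actual class. The read count of 10 is made up for illustration; the two peptides are the ones from the report.

```python
from collections import namedtuple

# Hypothetical record; field names are illustrative, not isovar's attributes.
Candidate = namedtuple(
    "Candidate", ["amino_acids", "n_supporting_reads", "n_mismatches"])


def sort_key(candidate):
    # Assumed ordering: more supporting reads first, then fewer mismatches,
    # then (the proposed 3rd criterion) longer protein sequence first.
    return (
        -candidate.n_supporting_reads,
        candidate.n_mismatches,
        -len(candidate.amino_acids),
    )


def sort_candidates(candidates):
    return sorted(candidates, key=sort_key)


# Two candidates tied on the first two criteria (illustrative counts).
candidates = [
    Candidate("MYPTAALIVNLRPNTF", 10, 0),
    Candidate("AAGAVEWMYPTAALIVNLRPNTF", 10, 0),
]
best = sort_candidates(candidates)[0]
print(best.amino_acids)
```

With length as the last element of the key, a tie in the earlier criteria no longer depends on input order, and the longer sequence is ranked first.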
@scottdbrown -- do you think including protein sequence length as part of the sorting criteria would fix your issue? Or, is it altogether unexpected for one of the returned sequences to be a subsequence of another?
@scottdbrown -- you redacted the cDNA sequence lengths for the two translation keys but do you mind posting those here?
Here are the lengths of the redacted sections:
@iskandr -- Yes, I think sorting by length would fix the issue - the protein sequence that was output was the start of the shorter of the two transcripts, which was entirely contained within the longer transcript. The reads covered the region upstream of the shorter transcript (which is part of the longer transcript).
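For checking whether one returned sequence is fully contained in another, a plain substring test is enough. This is only an illustrative sketch; `contained_in_another` and `drop_contained` are hypothetical helpers and not part of isovar.

```python
def contained_in_another(seq, all_seqs):
    # True if seq is a proper substring of some other sequence in the list.
    return any(seq != other and seq in other for other in all_seqs)


def drop_contained(seqs):
    # Keep only sequences that are not contained in a longer one.
    return [s for s in seqs if not contained_in_another(s, seqs)]


# The two peptides from the report.
peptides = ["MYPTAALIVNLRPNTF", "AAGAVEWMYPTAALIVNLRPNTF"]
kept = drop_contained(peptides)
print(kept)
```

Whether contained sequences should be dropped, merged, or just ranked lower is a separate design question; the check only makes the situation visible.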
Can you post the full cDNA seqs? I'm curious why these coding sequences
didn't just get merged.
Thanks!
On Mon, Nov 27, 2017 at 4:02 PM, Scott Brown wrote:
Here are the lengths of the redacted sections:
2017-11-27 11:25:43,425 - isovar.variant_sequence_in_reading_frame:105 - INFO - cdna_predix='[REDACTED 36 nts]', cdna_alt='C', cdna_suffix='[REDACTED 35 nts]', reference_prefix='[REDACTED 36 nts]', reference_suffix='[REDACTED 36 nts]', n_trimmed=0
2017-11-27 11:25:43,425 - isovar.variant_sequence_in_reading_frame:354 - INFO - Iter #1/3: VariantSequenceInReadingFrame(cdna_sequence='[REDACTED 72 nts]', offset_to_first_complete_codon=2, variant_cdna_interval_start=36, variant_cdna_interval_end=37, reference_cdna_sequence_before_variant='[REDACTED 36 nts]', reference_cdna_sequence_after_variant='[REDACTED 36 nts]', number_mismatches_before_variant=0, number_mismatches_after_variant=0)
2017-11-27 11:25:43,425 - isovar.variant_sequence_in_reading_frame:105 - INFO - cdna_predix='[REDACTED 36 nts]', cdna_alt='C', cdna_suffix='[REDACTED 35 nts]', reference_prefix='[REDACTED 36 nts]', reference_suffix='[REDACTED 36 nts]', n_trimmed=0
2017-11-27 11:25:43,425 - isovar.variant_sequence_in_reading_frame:354 - INFO - Iter #1/3: VariantSequenceInReadingFrame(cdna_sequence='[REDACTED 72 nts]', offset_to_first_complete_codon=23, variant_cdna_interval_start=36, variant_cdna_interval_end=37, reference_cdna_sequence_before_variant='[REDACTED 36 nts]', reference_cdna_sequence_after_variant='[REDACTED 36 nts]', number_mismatches_before_variant=0, number_mismatches_after_variant=0)
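The two offsets in these log lines line up with the two peptide lengths in the report. A small arithmetic check, assuming translation runs from `offset_to_first_complete_codon` to the end of the 72 nt cDNA sequence with no stop codon in between (`n_complete_codons` is a hypothetical helper):

```python
def n_complete_codons(cdna_length, offset_to_first_complete_codon):
    # Number of whole codons from the first complete codon to the end.
    return (cdna_length - offset_to_first_complete_codon) // 3


# Values taken from the two log lines above.
longer = n_complete_codons(72, 2)
shorter = n_complete_codons(72, 23)

# Peptides from the original report.
assert longer == len("AAGAVEWMYPTAALIVNLRPNTF")
assert shorter == len("MYPTAALIVNLRPNTF")

# The start of the shorter frame sits a whole number of codons
# downstream of the longer one, so both are in the same reading frame.
assert (23 - 2) % 3 == 0
print(longer, shorter, (23 - 2) // 3)
```

The 7 codon difference matches the extra `AAGAVEW` prefix on the longer peptide, which is consistent with the shorter translation starting at the `M` of the shorter transcript.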
Sorry, due to privacy concerns with this sample, I'm not able to share any germline sequence.
Is there any other information I can provide that would help? I apologize for the inconvenience of not being able to provide the actual sequence.
To get around this issue, I manually created the mutation of interest in a non-protected sequence file, and have attached all relevant files for you to hopefully be able to recreate the issue.
Input and output files can be found here: https://github.com/scottdbrown/isovar_COLO829_test
Issue reported by email from Scott Brown:
...
STDOUT from isovar invocation:
It does seem like both AAGAVEWMYPTAALIVNLRPNTF and MYPTAALIVNLRPNTF have the same number of reads; maybe there's no logic for when there's a tie in coverage?