Isoseq collapse : collapsing extra 5p exons results in incorrect isoforms. #717

Open
Fougere87 opened this issue Sep 17, 2024 · 2 comments

@Fougere87

Operating system
UNIX

Package name
isoseq

Describe the bug
We performed single-cell RNA sequencing using 10x 5p capture, MAS Iso-Seq library preparation, and sequencing on the Revio system. Everything works fine, but for some genes the dominant isoform has one extra exon that does not exist. This extra exon is included in the isoform even if it is present in only one read (out of more than 15,000).
After filtering, this isoform is discarded (considered as RTS) and the counts for the gene are near 0...
Maybe it would be helpful to add a threshold: a minimal coverage required to keep an extra exon, below which the reads are collapsed to the FSM isoform instead.
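
To make the suggested threshold concrete, here is a minimal sketch of the kind of rule I have in mind. It is not part of isoseq; the function name `keep_extra_5p_exon` and the defaults `min_reads` and `min_fraction` are hypothetical and only illustrate the idea of requiring both an absolute and a relative level of support before an extra 5p exon is allowed to define the isoform.

```python
def keep_extra_5p_exon(reads_with_exon: int,
                       reads_total: int,
                       min_reads: int = 3,
                       min_fraction: float = 0.01) -> bool:
    """Decide whether an extra 5p exon has enough read support.

    Hypothetical rule (not an isoseq option): the exon is kept only if it is
    seen in at least `min_reads` reads AND in at least `min_fraction` of all
    reads assigned to the collapsed isoform.
    """
    if reads_total <= 0:
        raise ValueError("reads_total must be positive")
    fraction = reads_with_exon / reads_total
    return reads_with_exon >= min_reads and fraction >= min_fraction


# The case described above: one supporting read among more than 15,000.
print(keep_extra_5p_exon(reads_with_exon=1, reads_total=15000))
```

With a rule like this, the single read carrying the extra exon would not be enough to override the canonical structure; the actual cutoffs would need tuning per dataset.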

@armintoepfer
Member

How long is that extra exon? There are thresholds for how much "extra" we accept on each flank during clustering. Since clustering is reference-free, isoseq has no notion of exons, only of partial matches.

@Fougere87
Author

Hi Armin,

Thank you for getting in touch, and sorry for the late answer :-) I was at a conference at the beginning of this week and quite busy.

The extra exon is 126 nt long and in 5p position. I checked, and only one molecule/read has this exon/intron structure (which is otherwise a full splice match for the Wt1 transcript) together with this extra exon, all with the same boundaries.

However, all molecules that lack this extra exon and have the canonical exon/intron structure were assigned to this isoform.

I counted in my dataset: I have >106k isoforms with more exons than the reference, accounting for around 12.5% of our deduplicated reads. I have not yet calculated which ones have an extra 5p exon, though, so it might be a lot fewer.

I wonder how we could check, for all clusters, that coverage is reasonably uniform along the isoform, to ensure that just one or two reads with one extra exon (compared to thousands of canonical reads) do not create a new, artifactual and undescribed isoform.

Regarding "There are thresholds for how much "extra" we accept on each flank during clustering": do you mean exons or nt?
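
As a rough illustration of such a uniformity check, here is a small sketch. It assumes that the splice junctions of every read in a cluster have already been extracted from the alignments as exact `(donor, acceptor)` coordinate pairs; the function `low_support_junctions` and the `min_ratio` cutoff are hypothetical names of mine, not anything isoseq provides. The idea is to count how many reads support each junction and to flag junctions whose support is far below the typical (median) support in the cluster.

```python
from collections import Counter
from statistics import median

Junction = tuple[int, int]  # (donor, acceptor) genomic coordinates


def low_support_junctions(read_junction_chains: list[list[Junction]],
                          min_ratio: float = 0.01) -> list[Junction]:
    """Return junctions supported by far fewer reads than is typical.

    Hypothetical check: a junction is flagged when its read count is below
    `min_ratio` times the median junction support within the cluster.
    """
    support = Counter()
    for chain in read_junction_chains:
        support.update(set(chain))  # count each junction once per read
    if not support:
        return []
    typical = median(support.values())
    return sorted(j for j, n in support.items() if n < min_ratio * typical)


# Toy cluster: 1000 canonical reads plus one read with an extra 5p junction.
canonical = [(100, 200), (300, 400)]
with_extra = [(20, 50), (100, 200), (300, 400)]
cluster = [canonical] * 1000 + [with_extra]
print(low_support_junctions(cluster))
```

In this toy cluster only the junction introduced by the extra 5p exon is flagged, which is the situation I see for Wt1. Real data would also need some tolerance for fuzzy junction coordinates, which this sketch ignores.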

All the best,

Chloé MAYERE
