Pangolin v4.2 stuck on "Using UShER as inference engine." #500

Closed
VinceLiAB opened this issue Jan 12, 2023 · 12 comments

@VinceLiAB

Hi,

Pangolin v4.2 analysis seems to be stuck on the step "Using UShER as inference engine." I have tried analyzing different sets of data ranging from 20 to 90 samples, and they all stop at the same step.

No error messages are given and I didn't encounter this issue prior to the update.

Thank you.

@AngieHinrichs
Member

Sorry to hear that @VinceLiAB. If you run 'usher --version' what is the output?

@VinceLiAB
Author

The usher version was definitely the culprit. I updated to v0.6.1 from v0.6.0 and it is working again. Thanks for the quick response!

@AngieHinrichs
Member

Great, glad it's working for you now!

@wm75
Contributor

wm75 commented Jan 13, 2023

@AngieHinrichs v0.6.1 still seems to have a problem with small test input.
Simply running pangolin pangolin/data/reference.fasta causes usher-sampled to hang.

This is why the tests for the bioconda recipe update are failing.
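A hang like this blocks a test run indefinitely unless the command is wrapped in a timeout. The following is a minimal sketch of such a guard, not something taken from the bioconda recipe; the helper name `run_with_timeout` and the timeout values are assumptions for illustration.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns ("ok", returncode) on completion, or ("timeout", None) if the
    process had to be killed because it exceeded the limit.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return ("timeout", None)
    return ("ok", proc.returncode)


# Stand-ins so the sketch runs without pangolin installed:
# one command that sleeps (simulating a hang) and one that exits at once.
HANGING_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]
QUICK_CMD = [sys.executable, "-c", "print('done')"]
```

With pangolin installed, the same helper could wrap the command from this comment, for example `run_with_timeout(["pangolin", "pangolin/data/reference.fasta"], 600)`, where the 600-second limit is an arbitrary choice.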


@AngieHinrichs
Member

AngieHinrichs commented Jan 13, 2023

Oof, thanks @wm75! I tested with tests/test-data/sequence1.fasta, which has a single sequence, but I did not test with reference.fasta! That input leads to a VCF file with no data lines (no mutations), which might be triggering some corner case in usher-sampled. @yceh can you please take a look? Here is the header-only VCF file that is causing usher-sampled to hang:

##fileformat=VCFv4.2
##reference=/data/tmp/tmp1tcryjgq/sequences.withref.fa:outgroup_A
##source=faToVcf /data/tmp/tmp1tcryjgq/sequences.withref.fa /data/tmp/tmp1tcryjgq/sequences.aln.vcf
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT  5a7f5aa9677f248abcb2bedf90d7f3e2
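A header-only VCF like the one above can be detected before it is handed to the placement step. This is a minimal sketch assuming the VCF text is already in memory; the function name `count_vcf_records` and the toy sample name `sample1` are hypothetical, and this is not how pangolin or usher-sampled handles the case.

```python
def count_vcf_records(vcf_text):
    """Count VCF data lines: non-empty lines that do not start with '#'."""
    return sum(
        1
        for line in vcf_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


# Toy header-only VCF (hypothetical sample name), mirroring the shape shown above.
HEADER_ONLY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n"
)

# The same header followed by one made-up data line.
ONE_RECORD_VCF = HEADER_ONLY_VCF + "chr1\t12\t.\tT\tC\t.\t.\t.\tGT\t1\n"
```

A caller could skip placement, or fall back to some other handling, whenever `count_vcf_records` returns 0.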

@wm75
Contributor

wm75 commented Jan 13, 2023

Ah, that makes a lot of sense, thanks! Unfortunately, the bioconda test cannot use the test-data sequence because that's not getting installed.
We could introduce a SNP at runtime though for now if that's all that's needed to make the test work, then revert the patch when you have an usher fix.

@wm75
Contributor

wm75 commented Jan 13, 2023

> We could introduce a SNP at runtime though for now if that's all that's needed to make the test work, then revert the patch when you have an usher fix.

Yes, it is sufficient to change just a single base in the sequence before using it as a test input!

@AngieHinrichs
Member

AngieHinrichs commented Jan 13, 2023

Ah, nice idea with the patch! Something like this should work:

sed -e 's/ACATGGTTTAGTCAGCGTGG/ACATGGTTTAGCCAGCGTGG/' pangolin/data/reference.fasta > $tempFasta

[Edit: NM I see you found your own 😁]
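The same single-base change can also be made by position rather than by motif. A line-based substitution such as the sed command only matches a motif that sits entirely on one line, so a position-based edit is one way to sidestep FASTA line wrapping. The sketch below assumes a single-record FASTA; the function names and the toy record are hypothetical and are not the patch used in the bioconda recipe.

```python
def introduce_snp(sequence, position, new_base):
    """Return sequence with the base at 0-based position replaced by new_base."""
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    if sequence[position].upper() == new_base.upper():
        raise ValueError("new_base equals the existing base, so nothing would change")
    return sequence[:position] + new_base + sequence[position + 1:]


def mutate_fasta(fasta_text, position, new_base):
    """Apply introduce_snp to a single-record FASTA, joining wrapped sequence lines."""
    lines = fasta_text.splitlines()
    header, sequence = lines[0], "".join(lines[1:])
    return header + "\n" + introduce_snp(sequence, position, new_base) + "\n"


# Toy record reusing the 20-base motif from the sed command, wrapped over two lines.
TOY_FASTA = ">toy\nACATGGTTTA\nGTCAGCGTGG\n"
```

On the toy record, changing position 11 from T to C reproduces the substitution in the sed command; the output is written as one unwrapped sequence line.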

@AngieHinrichs
Member

@yceh and @yatisht have already fixed it and released usher v0.6.2: https://github.com/yatisht/usher/releases/tag/v0.6.2
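Since the fix landed in usher v0.6.2, a wrapper script could refuse to run with an older version. This is a minimal sketch, assuming only that the output of `usher --version` contains a dotted three-part version number; the exact output format is not shown in this thread, and the names `parse_version` and `usher_is_fixed` are hypothetical.

```python
import re

# First usher release reported in this thread to contain the fix.
MIN_USHER = (0, 6, 2)


def parse_version(text):
    """Extract the first X.Y.Z version in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no X.Y.Z version found in: " + repr(text))
    return tuple(int(part) for part in match.groups())


def usher_is_fixed(version_output, minimum=MIN_USHER):
    """Return True if the reported version is at least the minimum."""
    return parse_version(version_output) >= minimum
```

Comparing integer tuples rather than strings keeps the ordering correct for versions such as 0.10.0.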

@wm75
Contributor

wm75 commented Jan 16, 2023

Thanks @AngieHinrichs @yceh @yatisht!
The bioconda packages for usher 0.6.2 and for pangolin 4.2 built against usher 0.6.2 are now available.

pangolin 4.2 will also appear on usegalaxy.eu later today, together with pangolin 4.1.3 pinned to the same core dependencies, i.e. both Galaxy tool versions will use:

  • scorpio (Version 0.3.17)
  • pangolin-data (Version 1.17)
  • constellations (Version 0.1.10)
  • usher (Version 0.6.2)
  • gofasta (Version 1.1.0)
  • ucsc-fatovcf (Version 426)
  • minimap2 (Version 2.24)

This way comparisons between usher and usher-sampled should be relatively simple.

@AngieHinrichs
Member

Great, thanks so much @wm75!

> comparisons between usher and usher-sampled

Just for the record, results should be overall very consistent but not identical, especially when sequences have Ns in lineage-defining positions. usher may place a sequence on a node that starts a lineage even if the sequence has only Ns at the defining mutations (the mutations on the node that starts the lineage). usher-sampled doesn't match all-Ns on the node at the end of the path; it places the sequence on the parent of that node, so in cases like that the sample will be assigned the parental lineage by usher-sampled. Also, usher would find some redundant equally parsimonious placements (EPPs) while usher-sampled is more stringent, so in cases where multiple EPPs would cause different assignments and pangolin takes a vote, the outcomes can be different. [Next on my list: get rid of the voting; with amplicon dropout issues it's looking like a bad idea now, see #492.]
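To see where the two modes disagree on a given dataset, the two lineage reports can be diffed per sample. This is a rough sketch assuming each report is a CSV with `taxon` and `lineage` columns; the function names and the toy rows are invented for illustration and are not real results.

```python
import csv
import io


def load_lineages(report_csv_text):
    """Map each taxon to its assigned lineage from CSV text with taxon/lineage columns."""
    reader = csv.DictReader(io.StringIO(report_csv_text))
    return {row["taxon"]: row["lineage"] for row in reader}


def diff_assignments(usher_csv, sampled_csv):
    """Return {taxon: (usher_lineage, sampled_lineage)} for samples that differ."""
    usher = load_lineages(usher_csv)
    sampled = load_lineages(sampled_csv)
    shared = sorted(usher.keys() & sampled.keys())
    return {t: (usher[t], sampled[t]) for t in shared if usher[t] != sampled[t]}


# Invented toy reports: s1 gets a child lineage from one run and its parent from the other.
TOY_USHER = "taxon,lineage\ns1,BA.2.12\ns2,B.1.1.7\n"
TOY_SAMPLED = "taxon,lineage\ns1,BA.2\ns2,B.1.1.7\n"
```

Samples present in only one report are ignored here; a stricter comparison might flag them as well.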
