inconsistent estimations in heterozygous plant genomes #8

Open
dcopetti opened this issue Jan 24, 2022 · 2 comments

@dcopetti

Hello,
I am looking for some advice on how to run findGSE and how to interpret the data when dealing with heterozygosity and plant genomes.
I summarized the runs on 4 species in this file:
FindGSE_tests_220124.pdf
The results show inaccuracy in genome size estimation as well as inconsistency in the resulting values when parameters change.

Briefly:

  • the estimates differ depending on whether exp_hom=NN is set or not (all 4 cases)
  • as expected, setting exp_hom=NN at the mode of the homo peak yields no estimate (except for Cgil, perhaps because the het peak is buried?)
  • if exp_hom=NN is larger than the mode of the homo peak, the estimate does not change
  • at different exp_hom=NN values, some estimates vary a lot, others very little.

by species:

  • Caus: findGSE seems to work well, and the estimate is concordant with the HiFi assembly
  • Lmul: the estimate varies a lot, and the correct value results when using exp_hom=NN LOWER than the homo peak
  • Cgig: the estimate is always below 1 Gb (expected: 1.4 Gb), and some runs fail
  • Cgil: the estimates vary 4-fold, though with 38 Gb of raw HiFi data and a homo peak at 87, the genome could be ~438 Mb.
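
The back-of-the-envelope figure for Cgil can be reproduced with a minimal sketch, assuming genome size is approximated as total sequenced bases divided by the depth of the homozygous peak. The function name `naive_genome_size` is hypothetical and is not part of findGSE; the calculation ignores sequencing errors, organellar reads, and the small difference between base counts and k-mer counts.

```python
def naive_genome_size(total_bases: float, hom_peak_depth: float) -> float:
    """Rough genome size in bases: sequenced bases / homozygous peak depth.

    Hypothetical helper for illustration only; not a findGSE function.
    """
    if hom_peak_depth <= 0:
        raise ValueError("hom_peak_depth must be positive")
    return total_bases / hom_peak_depth


# Figures quoted for Cgil: 38 Gb of raw HiFi data, homozygous peak at 87.
size_bp = naive_genome_size(38e9, 87)
print(f"{size_bp / 1e6:.0f} Mb")
```

With the rounded 38 Gb input this gives about 437 Mb, in line with the ~438 Mb quoted for Cgil; the small gap presumably comes from rounding of the data volume.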

The documentation says that exp_hom=NN should lie between the homo peak and twice its value (hom_peak < exp_hom < 2*hom_peak), and I do see consistent estimates within that range. The only thing is, sometimes the values are correct (Caus), other times they are off (Cgig, Lmul).
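
The range condition can be written as a small check, sketched here under the assumption that both bounds are strict, as the inequality above reads. The helper name `exp_hom_in_documented_range` is hypothetical and not part of findGSE.

```python
def exp_hom_in_documented_range(exp_hom: float, hom_peak: float) -> bool:
    """Return True if hom_peak < exp_hom < 2 * hom_peak (strict bounds).

    Hypothetical helper for illustration only; not a findGSE function.
    """
    return hom_peak < exp_hom < 2 * hom_peak


# Example with a homozygous peak at 87 (the Cgil value mentioned earlier).
hom_peak = 87
for candidate in (87, 100, 174):
    ok = exp_hom_in_documented_range(candidate, hom_peak)
    print(candidate, "inside range" if ok else "outside range")
```

A value placed exactly on the mode of the peak (87 here) or on its double (174) falls outside the range, while anything strictly in between (100 here) is accepted.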
Can you please give some guidelines on how to use the tool to get a reliable and consistent estimate when the flow cytometry value is not known?
Thanks!

@dcopetti
Author

dcopetti commented Mar 3, 2022

Hello, just checking in on this: do you have any comments on my results above?
Thanks

@HeQSun
Collaborator

HeQSun commented Dec 4, 2023

Hi @dcopetti,

I just saw your message. I hope you have found a way to process your data.

If not, let me know.

Best,
Hequan
