Issue at Performing statistical tests stage #3
Comments
I was thinking about this issue: would it be worthwhile to use the entire genome as the host coverage predictor, rather than the scaffolds, when you have contiguous genomes? I know coverage can vary over the genome/scaffolds, but it would help with my previous problem.
Hi, I'm not sure how I missed this post. I just now saw it and I'm sorry about that! Yes, it sounds like the host had 0 data points, so PropagAtE was trying to compare the prophage to nothing. I should add an exception to avoid the error. For your second post you make a good argument, but PropagAtE was created for metagenomic data in which the entire host genome cannot be accurately identified. Even after binning a MAG, there will likely be multiple scaffolds that are contamination and may alter the coverage results. It is more accurate to consider only the parent scaffold of the prophage. I will consider adding an option to specify an entire MAG, though I wouldn't count on that being implemented.
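A minimal sketch of the kind of guard described above, assuming per-base coverage is held in plain Python lists. The function name `host_prophage_ratio` and the choice to return `None` are hypothetical illustrations, not PropagAtE's actual code; only `statistics.mean` and the `cov_h` name come from the traceback in this issue.

```python
import statistics


def host_prophage_ratio(cov_p, cov_h):
    """Return the prophage:host mean coverage ratio, or None if undefined.

    cov_p: per-base coverage values inside the prophage region
    cov_h: per-base coverage values of the flanking host region
    (hypothetical helper; names other than cov_h are assumptions)
    """
    # An empty host list is what triggers
    # "StatisticsError: mean requires at least one data point".
    if not cov_p or not cov_h:
        return None
    avg_h = statistics.mean(cov_h)
    # Zero host coverage would make the ratio undefined as well.
    if avg_h == 0:
        return None
    return statistics.mean(cov_p) / avg_h
```

A caller could then skip, or flag, any prophage for which the function returns `None` instead of letting the whole run stop at the statistics stage.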
Thanks for the reply! Since making the post I've come around to the idea that relying on prophages without flanking host regions is bad, or at least risky. This paper highlights the issue of 'mis-binning' prophages when bacterial MAGs are closely related: https://doi.org/10.1038/s41587-020-0718-6
I agree. Great group to rely on who wrote that paper. I have a manuscript coming out in ~1 week regarding viral binning, including a couple points on binning prophages. Self promotion :) but also it may be of interest when available.
Honestly, really looking forward to reading it! I am always on the lookout for ideas for improving the binning of prophages in MAGs.
I will be submitting a paper soon where I put a lot of effort into retrieving prophages from bins, most of them without host-flanking regions. Hopefully I can convince you and Simon Roux that my assumptions are correct. It was quite a bit of work, and even then there is always the haunting risk of false positives. I did my best to confirm they were true active prophages with viral sequencing, but only 72% of active prophages were found in viral samples (what I consider likely true active prophages). Even then, I think it was due to the benefit of having longitudinal sequencing. Moving forward I think your approach is better!
Thanks again for putting out PropagAtE!
Steven Sutcliffe
Maurice Lab
McGill
On Dec 2, 2021, at 4:02 PM, Kris Kieft wrote:
> I agree. Great group to rely on who wrote that paper. I have a manuscript coming out in ~1 week regarding viral binning, including a couple points on binning prophages. Self promotion :) but also it may be of interest when available.
At the statistical stage I am getting the error:
"statistics.StatisticsError: mean requires at least one data point"
It comes from line 758
avg_h = statistics.mean(cov_h)
I think it's because I am working with a dataset you probably hadn't expected. I've concatenated multiple prophages, and this leads to prophages that span the entire scaffold. So I guess that would have host-coverage of 0?
Is this the issue?
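The situation described above can be reproduced with a small sketch. It assumes coverage is stored as a per-base list and that host coverage is whatever remains of the scaffold outside the prophage coordinates; `split_coverage`, `scaffold_cov`, and the coverage values are hypothetical, and only `cov_h` and `statistics.mean` appear in the traceback.

```python
import statistics


def split_coverage(scaffold_cov, start, end):
    """Split per-base coverage into prophage and host parts.

    Coordinates are 0-based with an exclusive end (an assumption
    made for this illustration).
    """
    cov_p = scaffold_cov[start:end]
    cov_h = scaffold_cov[:start] + scaffold_cov[end:]
    return cov_p, cov_h


# Made-up coverage for a scaffold that is entirely prophage.
scaffold_cov = [12, 15, 14, 13, 16, 15]
cov_p, cov_h = split_coverage(scaffold_cov, 0, len(scaffold_cov))

message = None
try:
    avg_h = statistics.mean(cov_h)  # cov_h is empty here
except statistics.StatisticsError as err:
    message = str(err)
```

When the prophage spans the whole scaffold, no host positions remain, so `cov_h` is an empty list rather than a list of zeros. The error therefore comes from having no host data points at all, not from a host coverage of 0.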