Processing PropagAtE output #11

Open
asierFernandezP opened this issue Jun 5, 2023 · 1 comment

@asierFernandezP

asierFernandezP commented Jun 5, 2023

Hi,

I am currently applying PropagAtE (default parameters) to gut metagenomic data, and I am struggling to understand which filters to apply to the output .tsv files. In these files, PropagAtE predicts a state (dormant/active) even for prophages with a very low breadth of coverage. Should I apply a cutoff based on the 'prophage_cov_breadth' column?

In supplementary Table S3B of the PropagAtE paper, the values you report for this column are very high for the CRC and HeQ datasets but lower for the other datasets. Still, you considered those prophages present in your analyses. Am I misunderstanding the meaning of this column? Would you recommend any post-filtering of prophages after running PropagAtE with default parameters? I run a large number of potential prophage sequences against the reads of each metagenomic sample, so I expect only a few of them to be present in any given sample.

Thank you!

@KrisKieft
Member

There is no single answer to this. The breadth of coverage varies mainly because of the size of the input dataset: datasets with more reads will naturally have a higher average breadth of coverage per sequence. You may have to adjust the cutoff depending on the size of your input dataset and on how much potential error you are willing to accept. The higher you set the cutoff, the more confidence you can have in a prediction. The default is set rather low to be inclusive, since many phages are present at low abundance.
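To make this concrete, here is a minimal Python sketch. It is not part of PropagAtE. `filter_by_breadth` keeps rows of an output .tsv whose `prophage_cov_breadth` is at or above a cutoff you choose. The cutoff value, the scale of the column (fraction vs. percent), and the `prophage` column used in the example rows are all assumptions. `expected_breadth` uses the idealized Lander-Waterman model, breadth ≈ 1 − exp(−c) with mean depth c = N·L/G, to illustrate why the same sequence reaches a higher breadth in a larger dataset. Real metagenomic coverage is much less uniform than this model assumes, so treat it as an intuition aid, not as a way to pick the cutoff.

```python
import csv
import io
import math
from typing import Dict, List, TextIO


def expected_breadth(n_reads: int, read_len: int, seq_len: int) -> float:
    """Idealized Lander-Waterman estimate of the fraction of a sequence
    covered by at least one read: 1 - exp(-c), where c = N * L / G is the
    mean depth. Assumes reads land uniformly at random (an assumption for
    illustration, not how PropagAtE computes breadth)."""
    mean_depth = n_reads * read_len / seq_len
    return 1.0 - math.exp(-mean_depth)


def filter_by_breadth(handle: TextIO, cutoff: float) -> List[Dict[str, str]]:
    """Keep rows of a tab-separated PropagAtE output whose
    'prophage_cov_breadth' is >= cutoff. The cutoff must be on the same
    scale as the column; fraction vs. percent is not checked here."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or "prophage_cov_breadth" not in reader.fieldnames:
        raise ValueError("input has no 'prophage_cov_breadth' column")
    kept = []
    for row in reader:
        try:
            breadth = float(row["prophage_cov_breadth"])
        except (TypeError, ValueError):
            continue  # skip rows with an empty or non-numeric value
        if breadth >= cutoff:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up rows for illustration; the 'prophage' column name is assumed.
    example = (
        "prophage\tprophage_cov_breadth\n"
        "phiA\t0.85\n"
        "phiB\t0.12\n"
    )
    kept = filter_by_breadth(io.StringIO(example), cutoff=0.5)
    print([row["prophage"] for row in kept])  # ['phiA']

    # Same hypothetical 40 kb prophage at two sequencing depths:
    # more reads give a higher expected breadth.
    for n_reads in (100, 1000):
        print(n_reads, round(expected_breadth(n_reads, 150, 40_000), 3))
```

In practice you would open your own output file (e.g. `with open(path) as fh: filter_by_breadth(fh, cutoff)`) and try a few cutoffs to see how many predictions survive at each sequencing depth.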
