Biofilter Group Annotation #15

Open
rvenkatesh99 opened this issue Nov 15, 2024 · 2 comments
Labels
2.4.4 bug Something isn't working

Comments

@rvenkatesh99

This issue was found on Biofilter 2.4.2.

I ran pathway annotation on a list of 55889 genes. Biofilter ignored 22674 unrecognized and 114 ambiguous identifiers and returned output for 18482 unique genes. I then re-ran the 37407 genes that had been removed (55889 - 18482) and got additional pathway annotations for 18 genes. The input therefore changed between runs, but the second input was a subset of the original.

Input Files:
~group/personal/rasika/Biofilter_ROSMAP/RNAseq/ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt
~group/personal/rasika/Biofilter_ROSMAP/RNAseq/ROSMAP_RNAseq_removedbiofilt

The biofilter commands I ran for 2.4.2:
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_ENSEMBL_gene_pathways --overwrite

biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_removedbiofilt.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_removedbiofilt --overwrite

Expected behavior
I would have expected those 18 genes to be annotated in the original run, since they were part of my original gene set. We would also have expected the same behavior for those 18 genes regardless of whether the --annotate or --filter flag was used.

These are the 18 genes in both lists that should have been annotated the first time around:
[Image not reproduced: list of the 18 genes]

I did a quick retest on Biofilter 2.4.3 and could not reproduce the issue:
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt --gene-identifier-type ensembl_gid --annotate gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_ENSEMBL_gene_pathways_2.4.3 --overwrite

biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_removedbiofilt_2.4.3.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_removedbiofilt_2.4.3 --overwrite

Based on some recent reruns, Biofilter 2.4.3 may have resolved the issue, but these files can still be used for integration testing.

@AndreRico
Collaborator

AndreRico commented Nov 15, 2024

Hi @rvenkatesh99! Thanks for sharing this case. I started a new issue test at: https://github.com/RitchieLab/biofilter/tree/development/tests/issues/b15_biofilter_group_annotation

AndreRico added the "bug" (Something isn't working) and "2.4.4" labels on Nov 15, 2024.
@AndreRico
Collaborator

AndreRico commented Dec 2, 2024

Hi @rvenkatesh99!
Many of the genes not returned in the first execution are missing because of a genome build mismatch: they belong to GRCh37, while our current LOKI database is built on GRCh38.

The genes annotated only in the second execution, however, are explained by duplicate ENSEMBL IDs that map to the same biopolymer_id. Because the --allow-duplicate-output argument was not enabled, Biofilter considers only the first ENSEMBL gene associated with a given biopolymer_id; any later ENSEMBL gene that maps to the same biopolymer_id is ignored.

In the second execution, the genes returned by the first execution were no longer in the input, so Biofilter encountered the previously skipped ENSEMBL IDs as if for the first time. As a result, they were processed instead of discarded.
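The two-pass behavior described above can be illustrated with a minimal Python sketch. This is not Biofilter's implementation: the function `annotate_first_wins`, its `allow_duplicates` parameter (a hypothetical stand-in for `--allow-duplicate-output`), the ENSEMBL-to-biopolymer_id mapping, and the gene IDs are all made up for illustration, and the sketch assumes genes are considered in input order.

```python
# Hypothetical sketch of "first ENSEMBL ID wins per biopolymer_id" deduplication.
# ASSUMPTION: not Biofilter's actual code; names, IDs, and mapping are illustrative.

def annotate_first_wins(gene_ids, ensembl_to_biopolymer, allow_duplicates=False):
    """Return the input IDs that would be kept, in input order."""
    seen = set()
    kept = []
    for gid in gene_ids:
        bp = ensembl_to_biopolymer.get(gid)
        if bp is None:
            continue  # unrecognized ID (e.g. a GRCh37-only ID absent from a GRCh38 database)
        if bp in seen and not allow_duplicates:
            continue  # another ID already claimed this biopolymer_id, so skip it
        seen.add(bp)
        kept.append(gid)
    return kept

# Hypothetical mapping: ENSG_A and ENSG_B both point to biopolymer_id 101.
mapping = {"ENSG_A": 101, "ENSG_B": 101, "ENSG_C": 202}
genes = ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_OLD"]

first_run = annotate_first_wins(genes, mapping)
kept_set = set(first_run)
removed = [g for g in genes if g not in kept_set]  # input for the second run
second_run = annotate_first_wins(removed, mapping)

print(first_run)   # ['ENSG_A', 'ENSG_C']
print(removed)     # ['ENSG_B', 'ENSG_OLD']
print(second_run)  # ['ENSG_B']
```

In this toy case, `ENSG_B` plays the role of the 18 genes: it is dropped in the first run only because `ENSG_A` already claimed biopolymer_id 101, and it is annotated in the second run once `ENSG_A` is gone. `ENSG_OLD` stays unrecognized in both runs, like the build-mismatched IDs.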

I conclude that there is no issue with the system. However, I have identified several points (detailed in the issue test) that we should discuss as a team, and I can bring them to the next system meeting.

Please let me know if these responses address the concerns raised in this issue so we can close it, or if there is anything I may have overlooked.

Thanks,
Andre
