I ran pathway annotation on a list of 55889 genes; Biofilter ignored 22674 unrecognized identifiers and 114 ambiguous identifiers, and returned outputs for 18482 unique genes. I then re-ran the 37407 genes that had been removed and got additional pathway annotations for 18 of them. The input therefore changed between runs, but the second input was a strict subset of the first.
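For anyone reproducing this, the rerun list and the reported counts can be cross-checked with a short script. This is only a sketch: `removed_identifiers` is a hypothetical helper, not part of Biofilter, and it assumes the submitted and returned identifiers have already been loaded as lists of strings. The counts are the ones quoted in this report.

```python
def removed_identifiers(submitted, returned):
    """Return submitted IDs that are absent from `returned`, in input order, without repeats."""
    returned_set = set(returned)
    seen = set()
    removed = []
    for gene_id in submitted:
        if gene_id not in returned_set and gene_id not in seen:
            seen.add(gene_id)
            removed.append(gene_id)
    return removed


# Counts quoted in the report.
submitted_total = 55889
returned_unique = 18482
ignored_unrecognized = 22674
ignored_ambiguous = 114

# Size of the rerun list: everything that produced no output in the first run.
rerun_size = submitted_total - returned_unique

# Identifiers that were neither reported as ignored nor returned with output.
not_ignored_not_returned = rerun_size - ignored_unrecognized - ignored_ambiguous

print(rerun_size)
print(not_ignored_not_returned)
```

The first figure matches the 37407 genes in the rerun list. The second shows that 14619 identifiers were neither reported as ignored nor returned, so the 18 genes in question come from that group.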
The biofilter commands I ran for 2.4.2:
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_ENSEMBL_gene_pathways --overwrite
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_removedbiofilt.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_removedbiofilt --overwrite
Expected behavior
I would have expected those 18 genes to be annotated in the original run, since they were included in my original gene set. Additionally, we would have expected the same behavior for those 18 genes whether the --annotate or --filter flag was used.
These are the 18 genes in both lists that should have been annotated the first time around:
I did a quick retest on biofilter 2.4.3 and did not replicate the issue:
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt --gene-identifier-type ensembl_gid --annotate gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_ENSEMBL_gene_pathways_2.4.3 --overwrite
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_removedbiofilt_2.4.3.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_removedbiofilt_2.4.3 --overwrite
Based on some recent reruns, Biofilter 2.4.3 may have resolved the issue, but the files can still be used for integration testing.
Hi @rvenkatesh99!
Many of the genes not returned in the first execution are missing because of a genome build mismatch: they belong to GRCh37, while our current LOKI database is built for GRCh38.
The genes captured in the second execution, however, are explained by duplicate ENSEMBL codes pointing to the same biopolymer_id. Because the --allow-duplicate-output argument was not set, Biofilter keeps only the first ENSEMBL gene associated with a given biopolymer_id and ignores any later ENSEMBL gene that points to the same biopolymer_id.
In the second execution, after removing the genes returned in the first execution, Biofilter interprets the next set of genes as being called for the first time. As a result, they are processed instead of being discarded.
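The behavior described above can be illustrated with a small first-wins sketch. This is not Biofilter's actual code: the function name, the placeholder identifiers, and the mapping are all hypothetical, and the mapping simply assumes that two input identifiers resolve to the same biopolymer_id.

```python
def annotate_first_wins(gene_ids, id_to_biopolymer):
    """Keep only the first input ID seen for each biopolymer_id; collect the rest as skipped."""
    seen_biopolymers = set()
    kept, skipped = [], []
    for gene_id in gene_ids:
        biopolymer_id = id_to_biopolymer.get(gene_id)
        if biopolymer_id is None:
            continue  # unrecognized identifier: dropped entirely
        if biopolymer_id in seen_biopolymers:
            skipped.append(gene_id)  # duplicate of an already processed biopolymer
        else:
            seen_biopolymers.add(biopolymer_id)
            kept.append(gene_id)
    return kept, skipped


# Hypothetical mapping: ENSG_A and ENSG_A_ALT resolve to the same biopolymer_id.
mapping = {"ENSG_A": 1, "ENSG_A_ALT": 1, "ENSG_B": 2}

# First run: the full list. ENSG_A_ALT is skipped because ENSG_A came first.
first_kept, first_skipped = annotate_first_wins(["ENSG_A", "ENSG_A_ALT", "ENSG_B"], mapping)

# Second run: only the previously skipped IDs. ENSG_A_ALT is now the first
# ID seen for its biopolymer_id, so it is processed.
second_kept, second_skipped = annotate_first_wins(first_skipped, mapping)

print(first_kept, first_skipped)
print(second_kept, second_skipped)
```

Under these assumptions, an identifier skipped as a duplicate in the first run is annotated in the second, which matches the pattern seen with the 18 genes.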
I conclude that there is no issue with the system; however, I have identified several points (detailed in the Issue Test) that we need to discuss as a team, and I can bring them to the next system meeting.
Please let me know if these responses address the concerns raised in this issue so we can close it, or if there is anything I may have overlooked.
This issue was found on Biofilter 2.4.2
Input Files:
~group/personal/rasika/Biofilter_ROSMAP/RNAseq/ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt
~group/personal/rasika/Biofilter_ROSMAP/RNAseq/ROSMAP_RNAseq_removedbiofilt