I am using biofilter to annotate build38 SNP positions from genes of interest and noticed something I wanted to bring to your attention. Of the 270 genes I was getting SNPs for, 6 were reported as invalid by biofilter. The gene names I input are shown below:
MKL2
EBI2
LOC729974
PSFL
FBXL10
C2ORF7
While these are acceptable gene names, they are not the "official" gene names on NCBI Gene. I looked up the 6 using their NCBI gene IDs:
MKL2 57496
EBI2 1880
LOC729974 729974
PSFL 83464
FBXL10 84678
C2ORF7 84279
The official names for these genes are:
MRTFB
GPR183
RFPL4AL1
APH1B
KDM2B
PRADC1
When biofilter is re-run using the "official name", the appropriate information for these 6 genes is returned. My assumption is that you are using the 'Symbol' field from the NCBI gene API to reference these genes, which is why the input doesn't match. However, the NCBI gene API also has a 'Synonyms' field, which should contain the "unofficial" names of the gene. Because this information already exists and is linked to the same gene ID, I think it would be an important step to check any invalid gene symbol against the synonyms and return the information for the matching gene.
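The synonym fallback suggested above could look roughly like the sketch below. It is not biofilter's or LOKI's actual code: the function names `build_symbol_index` and `resolve` are hypothetical, and the records are hand-written from the gene IDs and names in this report rather than fetched from NCBI. It assumes synonyms arrive as a pipe-delimited string with `-` meaning "none", and it matches case-insensitively so that `C2ORF7` and `C2orf7` resolve to the same gene. LOC729974 is left out because its alias status is not confirmed.

```python
# Hypothetical records: (gene_id, official_symbol, pipe-delimited synonyms).
# Only the synonyms named in this report are listed; real records may hold more.
RECORDS = [
    (57496, "MRTFB", "MKL2"),
    (1880, "GPR183", "EBI2"),
    (83464, "APH1B", "PSFL"),
    (84678, "KDM2B", "FBXL10"),
    (84279, "PRADC1", "C2orf7"),
]


def build_symbol_index(records):
    """Map upper-cased official symbols and synonyms to gene IDs."""
    index = {}
    # Register official symbols first so a synonym can never shadow one.
    for gene_id, symbol, _synonyms in records:
        index[symbol.upper()] = gene_id
    # Then add synonyms, keeping any entry that already exists.
    for gene_id, _symbol, synonyms in records:
        for alias in synonyms.split("|"):
            if alias and alias != "-":
                index.setdefault(alias.upper(), gene_id)
    return index


def resolve(name, index):
    """Return the gene ID for an official symbol or synonym, else None."""
    return index.get(name.strip().upper())


index = build_symbol_index(RECORDS)
for name in ["MKL2", "MRTFB", "C2ORF7", "NOT_A_GENE"]:
    print(name, resolve(name, index))
```

Registering official symbols before synonyms is a deliberate choice here: if one gene's synonym collides with another gene's official symbol, the official symbol wins.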
All of these genes are present in the LOKI database, except for LOC729974. I attempted to search for this gene in NCBI, and while it does return RFPL4AL1 as a result, I couldn't find any reference to LOC729974 being an alternate gene name. I’m currently investigating this point further using NCBI APIs.
As for the remaining five genes that did not return information, is it possible that the input list also included their official names, which might have overridden the alternate names to avoid duplication in the output report?
To investigate this issue more accurately, I would need the full list of the 270 genes used as input and the exact arguments provided in the command. This will help me analyze the situation with greater precision.
Let me know if you can share this information so we can move forward.
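Once the full input list is available, the duplication hypothesis above can be checked with something like the following sketch. The helper name `find_shadowed_aliases` is hypothetical, and the alias-to-official pairs are taken from the order in which the names appear in the report (the LOC729974 pairing is the reporter's and is still unconfirmed). It lists every reported alias whose official symbol also appears in the same input.

```python
# Alias -> official symbol pairs as given in the report (assumed pairing by order).
ALIAS_TO_OFFICIAL = {
    "MKL2": "MRTFB",
    "EBI2": "GPR183",
    "LOC729974": "RFPL4AL1",  # pairing not yet confirmed via NCBI
    "PSFL": "APH1B",
    "FBXL10": "KDM2B",
    "C2ORF7": "PRADC1",
}


def find_shadowed_aliases(input_genes, alias_to_official=ALIAS_TO_OFFICIAL):
    """Return aliases whose official symbol is also present in the input."""
    present = {gene.strip().upper() for gene in input_genes}
    return sorted(
        alias
        for alias, official in alias_to_official.items()
        if alias.upper() in present and official.upper() in present
    )


# Illustrative input only, not the reporter's actual 270-gene list.
example_input = ["MKL2", "MRTFB", "EBI2", "KDM2B"]
print(find_shadowed_aliases(example_input))
```

An empty result for the real input list would suggest the aliases were not dropped because of duplication.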
The original report above was submitted by Chris Jones on 8/26/24.