Request support on data input sample and output sample just for prediction #9
Comments
Hi @Darrshan-Sankar, The results of AIONER cannot be fed directly to BioREx. BioREx requires the entities' IDs to be normalized, so you have to use our normalization components, such as GNORM2. If you just want to process PubMed abstracts, you can use https://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator3/, where we provide the processed relation results for PubMed. Please let me know if you need further help.
@ptlai Thanks for your support. I actually have to process full texts, so could you please guide me on how to normalise AIONER results as input for BioREx? A script would probably help most.
Hi @Darrshan-Sankar, The simplest way is to use the NE/ID annotations in https://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator3/ as well (BioCXML files). We have already processed the NEs/IDs for full texts, but relations only for abstracts. You can treat each paragraph as an abstract and then feed it to BioREx. If you still need help using the normalization components, you may contact Dr. Wei ([email protected]), who handles the entire backend process of our PubTator.
@ptlai Yes, I went through the FTP. As you said, relations are only available for abstracts. Thank you for providing Dr. Wei's contact details.
Hi @zy2376, A normalized example of an AIONER file can be found at [bc8_biored_task1_val.txt](https://github.com/user-attachments/files/17559816/bc8_biored_task1_val.txt). Please note that AIONER NE types must be converted to their corresponding BioRED NE types (e.g., 'Gene' to 'GeneOrGeneProduct') before running BioREx.
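A minimal sketch of that type conversion, assuming tab-separated PubTator annotation lines (ID, start, end, mention, type, normalized ID). Only the 'Gene' to 'GeneOrGeneProduct' pair comes from this thread; the other mapping entries and the function name `convert_types` are assumptions that should be checked against the actual AIONER output and the BioRED type list.

```python
# Only "Gene" -> "GeneOrGeneProduct" is confirmed in this thread.
# The remaining pairs are assumptions; verify them before use.
AIONER_TO_BIORED = {
    "Gene": "GeneOrGeneProduct",
    "Chemical": "ChemicalEntity",
    "Disease": "DiseaseOrPhenotypicFeature",
    "Species": "OrganismTaxon",
    "CellLine": "CellLine",
    "Variant": "SequenceVariant",
}


def convert_types(pubtator_text, mapping=AIONER_TO_BIORED):
    """Rewrite the NE type column of annotation lines; leave other lines as-is."""
    out = []
    for line in pubtator_text.splitlines():
        cols = line.split("\t")
        # Annotation lines have numeric start/end offsets in columns 2 and 3.
        if len(cols) >= 5 and cols[1].isdigit() and cols[2].isdigit():
            cols[4] = mapping.get(cols[4], cols[4])  # unknown types pass through
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)
```

Types missing from the mapping are left unchanged, so anything unexpected stays visible in the output instead of being silently dropped.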
@ptlai Thank you very much for providing the normalized example. I was finally able to run the AIONER-to-BioREx process successfully for PubMed abstracts. However, the process failed when applied to PMC full text. Could you please provide guidance on resolving this issue?
Hi @zy2376 , To process the full-text data with BioREx, you can treat each paragraph as a separate abstract. For instance, take the article available at https://www.ncbi.nlm.nih.gov/research/pubtator3/publication/33202951. You can format the content like this:
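As a rough sketch of the paragraph-as-abstract idea, assuming the usual PubTator layout (an `ID|t|` line, an `ID|a|` line, tab-separated annotation lines, and a blank line closing each document). The function name `format_paragraph`, the use of a section heading as the title, and the sample ID and annotation are all hypothetical; offsets are assumed to index into the title and paragraph joined by one character.

```python
def format_paragraph(doc_id, title, paragraph, annotations):
    """Build one PubTator-style document from a single full-text paragraph.

    annotations: iterable of (start, end, ne_type, normalized_id), with
    offsets counted over title + one separator character + paragraph
    (an assumption about how the offsets were produced).
    """
    full_text = title + " " + paragraph
    lines = [f"{doc_id}|t|{title}", f"{doc_id}|a|{paragraph}"]
    for start, end, ne_type, norm_id in annotations:
        mention = full_text[start:end]  # recover the mention from the offsets
        lines.append(f"{doc_id}\t{start}\t{end}\t{mention}\t{ne_type}\t{norm_id}")
    # A blank line separates consecutive documents in a PubTator file.
    return "\n".join(lines) + "\n\n"


# Hypothetical usage: one paragraph with one normalized gene annotation.
print(format_paragraph(
    "33202951001", "Results", "BRCA1 mutations cause breast cancer.",
    [(8, 13, "GeneOrGeneProduct", "672")],
))
```

Recomputing each mention from its offsets is a cheap sanity check: if the printed mention does not match the entity text, the offsets were not shifted correctly for the paragraph.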
@ptlai Thanks to your comments, I've converted my full text into the |t| and |a| format, and it works for some paragraphs (see the attached BioREx input file "PMC7611502_t-a-format_239rows.txt" and BioREx output file "PMC7611502_t-a-format_239rows_predict.txt"). PMC7611502_t-a-format.txt
Hi @zy2376 , Thank you for providing the example PubTator files. Upon review, I noticed a few formatting issues that need to be addressed:
Incorrect
Correct
Incorrect
Correct
@ptlai Thanks to your help, relations can now be extracted from the full text using BioREx. However, another issue has arisen: every document yields the same relations🤦. I've included the input and output files below. Please help me check them.
Apologies for the confusion. I noticed that the document ID serves as a unique index for the input. Therefore, you need to use a different index for each input text, as shown below:
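One way to catch this before running BioREx is to scan the input for repeated document IDs. The sketch below is illustrative only: `paragraph_doc_id` is a hypothetical numbering scheme (base ID plus a zero-padded paragraph index), and `duplicate_doc_ids` assumes each document starts with an `ID|t|` line.

```python
from collections import Counter


def paragraph_doc_id(base_id, index):
    """Hypothetical scheme: append a zero-padded paragraph index to a base ID."""
    return f"{base_id}{index:03d}"


def duplicate_doc_ids(pubtator_text):
    """Return the document IDs that appear on more than one |t| line."""
    ids = []
    for line in pubtator_text.splitlines():
        parts = line.split("|", 2)
        # Title lines look like "<id>|t|<text>"; skip tab-separated annotations.
        if len(parts) == 3 and parts[1] == "t" and "\t" not in parts[0]:
            ids.append(parts[0])
    return sorted(doc_id for doc_id, n in Counter(ids).items() if n > 1)


# Hypothetical usage: two paragraphs that wrongly share the ID "1".
sample = "1|t|A\n1|a|x\n\n1|t|B\n1|a|y\n\n2|t|C\n2|a|z\n"
print(duplicate_doc_ids(sample))
```

An empty result means every paragraph has its own index; any ID it reports should be renumbered before prediction.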
I used AIONER output to extract relations, but it didn't work. I went through the issues and found that the example is in the BioRED repo. I would like to know how to create such input data, and to see a sample output showing what predict.pubtator will look like.