preprocess #4

Open
xuqiankun1104 opened this issue Jun 3, 2023 · 12 comments

Comments

@xuqiankun1104

Hello, preprocess.py is the main script for preprocessing raw data. Could you explain in detail how the data for Meta is processed? Thank you.

@harshshredding
Collaborator

Hi there, I am so sorry for the delay; I got busy with preparations for my marriage. I am writing a short tutorial that walks through the process of preparing training data for Meta. I will post it on Monday night PDT.

Sorry for the delay again. Hope you have a good day. Let me know if you have any other quick questions in the meantime.

@harshshredding
Collaborator

@xuqiankun1104, I have updated the README, adding a section called Preprocessing data for Meta. Is this helpful?

@harshshredding
Collaborator

harshshredding commented Jun 8, 2023

@xuqiankun1104, from my perspective, Meta works as follows:

  1. In this framework, the NER systems generate predictions (in ./training_results) on the validation set.
  2. Meta is trained on these validation predictions.
  3. Once trained, Meta filters the predictions made by the NER systems on the test set, producing the final set of predictions.
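To make the three steps concrete, here is a minimal sketch of the train-on-validation, filter-on-test idea. It is not the repository's Meta model: the learned component is swapped for a simple lookup of how often each (system, label) pair was correct on the validation set. The tuple formats, `train_filter`, `apply_filter`, and the `threshold` parameter are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical formats, not the repository's: a system's prediction and a gold span.
Prediction = Tuple[str, str, int, int, str]  # (system, sample_id, start, end, label)
Span = Tuple[str, int, int, str]             # (sample_id, start, end, label)


def train_filter(val_preds: List[Prediction], val_gold: Set[Span]) -> Dict[Tuple[str, str], float]:
    """Step 2 stand-in: estimate, per (system, label), how often validation predictions are correct."""
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for system, sample_id, start, end, label in val_preds:
        total[(system, label)] += 1
        if (sample_id, start, end, label) in val_gold:
            correct[(system, label)] += 1
    return {key: correct[key] / total[key] for key in total}


def apply_filter(test_preds: List[Prediction], reliability: Dict[Tuple[str, str], float],
                 threshold: float = 0.5) -> Set[Span]:
    """Step 3 stand-in: keep test predictions from (system, label) pairs that were reliable on validation."""
    kept: Set[Span] = set()
    for system, sample_id, start, end, label in test_preds:
        # (system, label) pairs never seen on validation default to 0.0 and are dropped.
        if reliability.get((system, label), 0.0) >= threshold:
            kept.add((sample_id, start, end, label))
    return kept


if __name__ == "__main__":
    val_preds = [("SEQ", "v1", 0, 5, "protein"), ("SEQ", "v1", 10, 14, "DNA"),
                 ("SpanPred", "v1", 10, 14, "DNA")]
    val_gold = {("v1", 0, 5, "protein")}
    reliability = train_filter(val_preds, val_gold)
    test_preds = [("SEQ", "t1", 3, 8, "protein"), ("SpanPred", "t1", 20, 25, "DNA")]
    print(apply_filter(test_preds, reliability))  # {('t1', 3, 8, 'protein')}
```

The point of the design is that the filter only ever sees validation predictions during training, so the test set stays untouched until step 3.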

@xuqiankun1104
Author

Thank you very much for answering my question despite your busy schedule. Your answer is very clear. The results when processing Chinese are somewhat poor. What do you think could be the reason?

@harshshredding
Collaborator

harshshredding commented Jun 9, 2023

No worries, thank you for taking the time to check out this work. Could you show me a bad preprocessing result so that I can give more specific feedback? Is this preprocessing for Meta? In my view, the main goal of preprocessing is to create a collection of objects of the Sample type (utils.structs), so that the models can use the data for training and evaluation.
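As an illustration of "creating a collection of Sample objects", the sketch below converts raw records with character offsets into simple dataclasses. The field names of `Sample` and `Annotation`, the raw record layout, and `to_sample` are hypothetical and may not match the actual classes in utils.structs; the sketch only shows the general shape of such a preprocessing step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    # Hypothetical fields; the real classes in utils.structs may differ.
    begin_offset: int  # character offset where the entity starts
    end_offset: int    # character offset just past the entity's end
    label_type: str    # entity type, e.g. "protein"
    extraction: str    # the entity's surface text


@dataclass
class Sample:
    id: str
    text: str
    annos: List[Annotation] = field(default_factory=list)


def to_sample(record: dict) -> Sample:
    """Convert one hypothetical raw record {"id", "text", "entities": [{"start", "end", "type"}]}."""
    text = record["text"]
    annos: List[Annotation] = []
    for ent in record.get("entities", []):
        start, end = ent["start"], ent["end"]
        # Reject out-of-range offsets; they would otherwise yield empty or wrong extractions.
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid offsets {start}-{end} in sample {record['id']}")
        annos.append(Annotation(start, end, ent["type"], text[start:end]))
    return Sample(id=record["id"], text=text, annos=annos)


if __name__ == "__main__":
    raw = {"id": "s1", "text": "IL-2 gene expression",
           "entities": [{"start": 0, "end": 9, "type": "DNA"}]}
    sample = to_sample(raw)
    print(sample.annos[0].extraction)  # IL-2 gene
```

Checking that each extraction matches the expected surface text is a cheap way to spot offset problems when inspecting a bad preprocessing result.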

@xuqiankun1104
Author

@harshshredding How do I run SpanPred ∪ SEQ and SpanPred x SEQ? Could you explain in detail, including SpanPred ∪ SEQ ∪ SeqCRF and SpanPred x SEQ x SeqCRF?

@xuqiankun1104
Author

@harshshredding Could you elaborate on how majority_ensemble and union_ensemble work? Thank you very much.

@harshshredding
Collaborator

@xuqiankun1104, sorry for the late reply. I will try to illustrate with some examples below.

Below, I will try to describe the procedure for SpanPred x SEQ x SeqCRF on Genia:

  • First, we train the three models SpanPred, SEQ, and SeqCRF on Genia.
  • While training, each of the 3 models produces predictions on the test set; for each model, we keep the test predictions from the epoch that performs best on the validation set.
  • Finally, we combine the 3 prediction files selected in the previous step using the function models.majority_ensemble.get_majority_vote_predictions and then evaluate the result. I think I may have forgotten to provide the script that evaluates the output of get_majority_vote_predictions, so I am adding it right now.
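The majority-voting step can be sketched as follows. This is a minimal illustration, not the implementation of models.majority_ensemble.get_majority_vote_predictions, whose signature and input format may differ. It assumes each model's predictions are a set of hypothetical (sample_id, start, end, label) tuples and that two predictions agree only when boundaries and label match exactly.

```python
from collections import Counter
from typing import List, Set, Tuple

Span = Tuple[str, int, int, str]  # (sample_id, start, end, label); hypothetical format


def majority_vote(prediction_sets: List[Set[Span]]) -> Set[Span]:
    """Keep a span only if strictly more than half of the systems predicted it exactly."""
    # Each system contributes at most one vote per span because its predictions are a set.
    votes = Counter(span for preds in prediction_sets for span in preds)
    needed = len(prediction_sets) // 2 + 1
    return {span for span, count in votes.items() if count >= needed}


if __name__ == "__main__":
    spanpred = {("s1", 0, 4, "protein"), ("s1", 10, 15, "DNA")}
    seq = {("s1", 0, 4, "protein")}
    seqcrf = {("s1", 0, 4, "protein"), ("s1", 10, 15, "DNA"), ("s1", 20, 25, "RNA")}
    print(sorted(majority_vote([spanpred, seq, seqcrf])))
    # [('s1', 0, 4, 'protein'), ('s1', 10, 15, 'DNA')]
```

With three models, a span needs two votes: the RNA span predicted only by SeqCRF is dropped.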

Thank you so much for taking the time to use my work. I will get back to you with the evaluation script shortly.

@harshshredding
Collaborator

harshshredding commented Jun 22, 2023

@xuqiankun1104 I just added a function called union_results to union_ensemble. Likewise, I added a function called get_majority_voting_results to majority_ensemble. Hopefully, these two functions will help you evaluate the ensembled predictions for majority and union. I am writing a notebook to illustrate these functions and I will get back to you shortly.
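For the union side and for evaluating an ensemble, a minimal sketch follows. It is not the code of union_results or get_majority_voting_results; `union_predictions`, `micro_prf`, and the span tuple format are hypothetical. It uses exact-match micro precision, recall, and F1, which is a common NER convention but an assumption about how the repository scores results.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int, str]  # (sample_id, start, end, label); hypothetical format


def union_predictions(prediction_sets: List[Set[Span]]) -> Set[Span]:
    """Keep every span that at least one system predicted."""
    return set().union(*prediction_sets)


def micro_prf(predicted: Set[Span], gold: Set[Span]) -> Dict[str, float]:
    """Exact-match micro precision, recall and F1 over (sample_id, start, end, label) spans."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    spanpred = {("s1", 0, 4, "protein"), ("s1", 10, 15, "DNA")}
    seq = {("s1", 0, 4, "protein")}
    seqcrf = {("s1", 0, 4, "protein"), ("s1", 20, 25, "RNA")}
    gold = {("s1", 0, 4, "protein"), ("s1", 10, 15, "DNA"), ("s1", 30, 34, "cell_type")}
    print(micro_prf(union_predictions([spanpred, seq, seqcrf]), gold))
```

Here the union finds 2 of the 3 gold spans but also keeps one wrong span, so precision, recall, and F1 are all 2/3. In general, a union tends to trade precision for recall, while majority voting tends to do the opposite.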

@xuqiankun1104
Author

@harshshredding Hello, could you give an example of how to run majority_ensemble and union_ensemble (i.e. union_results in union_ensemble and get_majority_voting_results in majority_ensemble)? For example, how to get the prediction files "selected in the previous step", or the notebook you are "writing to illustrate these functions". Thank you very much for your work, which has been very enlightening for me. Could you share your email address? It would be convenient for me to contact you directly. Thank you.

@harshshredding
Collaborator

harshshredding commented Jul 13, 2023

@xuqiankun1104 I just added 2 scripts in the root directory called evaluate_genia_majority.py and evaluate_genia_union.py. The first script combines the predictions by SEQ, SpanPred, and SeqCRF for GENIA using majority voting, and the second script combines using union. You can run the scripts with python evaluate_genia_majority.py and python evaluate_genia_union.py respectively.

My email is [email protected]. Feel free to ask more questions :D

@harshshredding
Collaborator

So sorry for the late reply -- I got really busy with my marriage preparations.

2 participants