You can get the training data from this website.
This project uses the following datasets:
- empathetic_dialogues
- eSNL
- gigaword
- eli5
- wiki_auto
Place the data in the corresponding folders according to the diagram structure.
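Before preprocessing, it can help to confirm that every dataset is in place. The sketch below assumes, purely for illustration, that each dataset sits in a folder named after it under a single raw-data root. The real layout is the one given by the project's directory diagram, so adjust `DATASETS` and the root path to match. `missing_datasets` is a hypothetical helper, not part of this project.

```python
from pathlib import Path

# Dataset names as listed above; the one-folder-per-dataset layout is an assumption.
DATASETS = ["empathetic_dialogues", "eSNL", "gigaword", "eli5", "wiki_auto"]


def missing_datasets(raw_root, names=DATASETS):
    """Return the dataset names that have no non-empty folder under raw_root."""
    root = Path(raw_root)
    missing = []
    for name in names:
        folder = root / name
        # A dataset counts as present only if its folder exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "dataset/rawdata" is a guess based on the preprocessing path; change as needed.
    absent = missing_datasets("dataset/rawdata")
    print("Missing datasets:", absent if absent else "none")
```

An empty result means every folder was found and contains at least one file or subfolder. It does not validate the contents.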
Run the Python script for preprocessing as follows:
cd dataset/rawdata/scripts/
python prepare_trainingdata.py
Change to the root directory of the project, then run the training script. Note: all bash scripts must be run from the root directory of the project.
bash scripts/train-on-5-instruction-task.sh
Generate answers with the base model and the tuned model
bash scripts/generate-pred-result.sh
Evaluate scores
bash scripts/evaluate-on-instruction-tasks.sh
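Since each of the bash scripts above has to be launched from the project root, a small wrapper can enforce that. The following is a minimal sketch, not part of this project: `resolve_script` and `run_pipeline` are hypothetical helpers, and the only names taken from this README are the three script paths.

```python
import subprocess
from pathlib import Path

# Script paths taken from the steps above, in the order they are run.
PIPELINE = [
    "scripts/train-on-5-instruction-task.sh",
    "scripts/generate-pred-result.sh",
    "scripts/evaluate-on-instruction-tasks.sh",
]


def resolve_script(project_root, relative_path):
    """Return the absolute script path, or raise if it is not under project_root."""
    script = Path(project_root).resolve() / relative_path
    if not script.is_file():
        raise FileNotFoundError(f"{relative_path} not found under {project_root}")
    return script


def run_pipeline(project_root, scripts=PIPELINE):
    """Run each script with bash, using project_root as the working directory."""
    for relative_path in scripts:
        resolve_script(project_root, relative_path)
        # cwd=project_root mirrors running `bash scripts/...` from the project root;
        # check=True stops the pipeline at the first failing stage.
        subprocess.run(["bash", relative_path], cwd=project_root, check=True)
```

Calling `run_pipeline("/path/to/project")` would run training, generation, and evaluation in sequence, stopping if any stage exits with an error.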
Use lm-evaluation-harness to evaluate general task scores
To install the lm-eval package from the GitHub repository, run:
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
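After installation, a quick check that the package is importable can save a failed evaluation run later. This is a small sketch using only the standard library. `is_installed` is a hypothetical helper, and `lm_eval` is the import name of the harness.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # lm_eval is the import name of the lm-evaluation-harness package.
    print("lm_eval installed:", is_installed("lm_eval"))
```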
The general tasks are divided into three sub-tasks:
- Domain Knowledge
- Reasoning
- Reading Comprehension
Edit the evaluation script to select the sub-task you need, then launch it:
bash scripts/evaluate-on-general-tasks.sh
Plot the results using the Python notebook plot_results.ipynb in the root directory of this repo.