resources/ # DB files go here
src/
    neuraldb/ # Python package containing NeuralDB project
        dataset/ # Components pertaining to reading and loading the DB files
            instance_generator/ # Generates model inputs from different DB formats
        evaluation/ # Helper functions that allow in-line evaluation of the models from the training script
        modelling/ # Extra models/trainers etc
        retriever/ # TF-IDF and DPR baselines
        util/ # Other utils
tests/ # Unit tests for scorer
Run the baseline retrieval methods first to collect the data used by the downstream models:
bash scripts/baselines/retrieve.sh dpr
bash scripts/baselines/retrieve.sh tfidf
The following scripts train models and generate predictions for the v2.4 databases containing up to 25 facts.
The scripts use task spooler (tsp) to manage a queue of jobs. If you do not have it installed, remove tsp
from the scripts.
export SEED=1
bash scripts/experiments_ours.sh v2.4_25
bash scripts/experiments_baseline.sh v2.4_25
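If task spooler is not installed, the `tsp` prefix can be stripped from a script programmatically. The following is a minimal sketch, not part of the repository: it assumes each job is queued as a plain `tsp <command>` line with no tsp-specific flags, and the helper name `strip_tsp` and the example commands are hypothetical. Check the actual scripts before relying on it.

```python
import re

# Matches a leading "tsp" token (after optional indentation) followed by whitespace.
# Assumption: jobs are queued as plain `tsp <command>` without tsp-specific flags.
_TSP_PREFIX = re.compile(r"^(\s*)tsp\s+")


def strip_tsp(script_text: str) -> str:
    """Return the script text with a leading `tsp` removed from every line."""
    lines = script_text.split("\n")
    return "\n".join(_TSP_PREFIX.sub(r"\1", line) for line in lines)


if __name__ == "__main__":
    # Hypothetical script content, for illustration only.
    example = "export SEED=1\ntsp python train.py\n    tsp python predict.py"
    print(strip_tsp(example))
```

Lines that do not start with `tsp` are left unchanged, and indentation is preserved, so the rewritten script runs the same commands directly instead of queueing them.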
The final scoring script takes the predictions generated by these scripts and evaluates them against the reference predictions.
python -m neuraldb.final_scoring
Graphs that plot answer accuracy by DB size are generated with:
python -m neuraldb.final_scoring_with_db_size
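To illustrate what "answer accuracy by DB size" means, here is a minimal sketch that groups predictions by database size and computes the fraction that match the reference. It is not the implementation in `neuraldb.final_scoring_with_db_size`: the `(db_size, prediction, reference)` record layout, the function name, and the use of exact string match are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_db_size(records):
    """Exact-match accuracy grouped by database size.

    `records` is an iterable of (db_size, prediction, reference) tuples.
    This layout is hypothetical and not the NeuralDB prediction file format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for db_size, prediction, reference in records:
        total[db_size] += 1
        # Assumption: an answer counts as correct only on exact equality.
        if prediction == reference:
            correct[db_size] += 1
    # Return sizes in ascending order so the result is ready for plotting.
    return {size: correct[size] / total[size] for size in sorted(total)}


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [(25, "Paris", "Paris"), (25, "2", "3"), (50, "yes", "yes")]
    print(accuracy_by_db_size(demo))  # {25: 0.5, 50: 1.0}
```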
Variants of this scoring script evaluate larger databases (v2.4_50, v2.4_100, v2.4_250, v2.4_500 and v2.4_1000). This involves running the models trained on 25 facts against the larger databases:
bash scripts/ours/predict_spj_rand_sweep.sh
python -m neuraldb.final_scoring_with_db_size_sweep
The Fusion-in-Decoder (FiD) experiment was performed using a modified version of the FiD code adapted from https://github.com/facebookresearch/FiD. Its outputs can be converted to the NeuralDB format with:
python -m neuraldb.convert_legacy_predictions <fid_output> <output_file>