add "Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases" #208

Merged

merged 4 commits into from Feb 5, 2024
Changes from 2 commits
34 changes: 21 additions & 13 deletions freebase/GrailQA - Compositional Generalization.md
Collaborator

In the table for GrailQA compositional generalization, GPT-3.5-turbo (5-shot) was missing (60.5 EM and 66.3 F1); it has been added now.


| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
| Pangu (T5-Large) | 2023 | 75.2 | 82.2 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| DECAF (BM25 + FiD-3B) | 2022 | - | 81.8 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| Pangu (T5-3B) | 2023 | 74.6 | 81.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| Pangu (BERT-base) | 2023 | 74.9 | 81.2 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| TIARA + GAIN (T5-3B) | 2023 | 73.7 | 80.0 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| TIARA + GAIN (T5-base) | 2023 | 73.0 | 79.6 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| DECAF (BM25 + FiD-large) | 2022 | - | 79.0 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| TIARA | 2022 | 69.2 | 76.5 | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
| DeCC (Anonymous) | 2022 | - | 75.8 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| ArcaneQA | 2022 | 65.8 | 75.3 | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
| RnG-KBQA | 2022 | - | 71.2 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| RnG-KBQA (single model) | 2021 | 63.792 | 71.156 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf) |
| ReTraCk (single model) | 2021 | 61.499 | 70.911 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/) |
| GPT-3.5-turbo (5-shot) | 2023 | 60.5 | 66.3 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| Pangu (Codex) | 2023 | 58.2 | 64.9 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| S2QL (single model) | 2021 | 54.716 | 64.679 | Anonymous |
| ArcaneQA (single model) | 2021 | 56.395 | 63.533 | Anonymous |
| KB-BINDER (6)-R                   | 2023 | 51.8   | 58.3   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
| BERT+Ranking (single model) | 2021 | 45.510 | 53.890 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| GloVe+Ranking (single model) | 2021 | 39.955 | 47.753 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| BERT+Transduction (single model) | 2021 | 31.040 | 35.985 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| QGG | 2022 | - | 33.0 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| GloVe+Transduction (single model) | 2021 | 16.441 | 18.507 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
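For reference, the EM and F1 columns in these tables follow the usual KBQA conventions: EM is exact match of the predicted logical form against the gold one, and F1 is the harmonic mean of precision and recall over the predicted versus gold answer sets. A minimal sketch of how these are typically computed (this is illustrative, not the official GrailQA evaluator; function names are my own):

```python
def answer_f1(predicted, gold):
    """Answer-set F1: harmonic mean of precision and recall over answer entities."""
    pred, gold = set(predicted), set(gold)
    if not pred and not gold:
        return 1.0  # both empty: treat as a perfect match
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def exact_match(predicted_lf, gold_lf):
    """EM: the predicted logical form must match the gold form exactly."""
    return predicted_lf.strip() == gold_lf.strip()
```

Under this definition a prediction of `{a, b}` against gold `{a, c}` scores F1 = 0.5, which is why F1 is always at least as high as EM in the tables above: partially correct answer sets earn partial credit, while EM is all-or-nothing.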
45 changes: 26 additions & 19 deletions freebase/GrailQA - Overall.md
Collaborator

Added result for GPT-3.5-turbo (5-shot).


| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
| TIARA + GAIN (T5-3B) | 2023 | 76.3 | 81.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| Pangu (T5-3B) | 2023 | 75.4 | 81.7 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| TIARA + GAIN (T5-base) | 2023 | 75.1 | 80.6 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| Pangu (T5-large) | 2023 | 74.8 | 81.4 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| Pangu (BERT-base) | 2023 | 73.7 | 79.9 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| DECAF (BM25 + FiD-3B) | 2022 | - | 78.7 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| TIARA | 2022 | 73.0 | 78.5 | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
| DeCC | 2022 | - | 77.6 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| DECAF (BM25 + FiD-large) | 2022 | - | 76.0 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| UniParser | 2022 | - | 74.6 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| RnG-KBQA (single model) | 2021 | 68.778 | 74.422 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf) |
| ArcaneQA | 2022 | 63.8 | 73.7 | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
| S2QL (single model) | 2021 | 57.456 | 66.186 | Anonymous |
| ReTraCk (single model)            | 2021 | 58.136 | 65.285 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/)       |
| Pangu (Codex)                     | 2023 | 56.4   | 65.0   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
| ArcaneQA (single model)           | 2021 | 57.872 | 64.924 | Anonymous                                                       |
| KB-BINDER (6)-R                   | 2023 | 53.2   | 58.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
| BERT+Ranking                      | 2022 | 50.6   | 58.0   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
| BERT+Ranking (single model) | 2021 | 50.578 | 57.988 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| ChatGPT | 2023 | 46.77 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| GloVe+Ranking (single model) | 2021 | 39.521 | 45.136 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| QGG | 2022 | - | 36.7 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| GPT-3.5v3 | 2023 | 35.43 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| BERT+Transduction (single model) | 2021 | 33.255 | 36.803 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| GPT-3.5v2 | 2023 | 30.50 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| FLAN-T5 | 2023 | 29.02 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| GPT-3 | 2023 | 27.58 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| GloVe+Transduction (single model) | 2021 | 17.587 | 18.432 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
31 changes: 18 additions & 13 deletions freebase/GrailQA - Zero-shot Generalization.md
Collaborator

Added result for GPT-3.5-turbo (5-shot).


| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
| TIARA + GAIN (T5-3B) | 2023 | 71.8 | 77.8 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| Pangu (T5-3B) | 2023 | 71.6 | 78.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| Pangu (T5-Large) | 2023 | 71.0 | 78.4 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| TIARA + GAIN (T5-base) | 2023 | 69.9 | 76.4 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| Pangu (BERT-base) | 2023 | 69.1 | 76.1 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| TIARA | 2022 | 68.0 | 73.9 | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
| DeCC(Anonymous) | 2022 | - | 72.5 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| DECAF (BM25 + FiD-3B) | 2022 | - | 72.3 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| UniParser (Anonymous) | 2022 | - | 69.8 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| RnG-KBQA (single model) | 2021 | 62.988 | 69.182 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf) |
| DECAF (BM25 + FiD-large) | 2022 | - | 68.0 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| ArcaneQA | 2022 | 52.9 | 66.0 | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
| S2QL (single model) | 2021 | 55.122 | 63.598 | Anonymous |
| ArcaneQA (single model) | 2021 | 49.964 | 58.844 | Anonymous |
| BERT+Ranking (single model) | 2021 | 48.566 | 55.660 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| ReTraCk (single model) | 2021 | 44.561 | 52.539 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/) |
| QGG | 2022 | - | 36.6 | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf) |
| GloVe+Ranking (single model) | 2021 | 28.886 | 33.792 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| BERT+Transduction (single model) | 2021 | 25.702 | 29.300 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
| GloVe+Transduction (single model) | 2021 | 2.968 | 3.123 | [Gu et al.](https://arxiv.org/abs/2011.07743) |
23 changes: 16 additions & 7 deletions freebase/GraphQuestions.md
datasetUrl: https://github.com/ysu1989/GraphQuestions
---

| Model / System | Year | Accuracy | F1 | Reported by |
|:---------------------------------:|:----:|:--------:|:------:|:---------------------------------------------------------:|
| ChatGPT | 2023 | 53.10 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| GPT-3.5v3 | 2023 | 47.95 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| TIARA + GAIN (T5-3B) | 2023 | - | 48.7 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| TIARA + GAIN (T5-base) | 2023 | - | 45.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| GPT-3.5v2 | 2023 | 40.85 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| KB-BINDER | 2023 | - | 39.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| GPT-3 | 2023 | 38.32 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| TIARA (T5-base) | 2023 | - | 37.9 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| FLAN-T5 | 2023 | 32.27 | - | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
| ArcaneQA | 2023 | - | 31.8 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| BERT + Ranking | 2023 | - | 25.0 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| SPARQA | 2023 | - | 21.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| PARA4QA | 2023 | - | 20.4 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |
| UDepLambda | 2023 | - | 17.7 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf) |