Merge pull request #208 from DavidBlavid/issue_186
add "Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases"
xixi019 authored Feb 5, 2024
2 parents 1d32a34 + 94a86b0 commit 7524611
Showing 5 changed files with 85 additions and 53 deletions.
34 changes: 21 additions & 13 deletions freebase/GrailQA - Compositional Generalization.md
@@ -5,18 +5,26 @@

| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
-| DECAF (BM25 + FiD-3B)             | 2022 | -      | 81.8   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| DECAF (BM25 + FiD-large)          | 2022 | -      | 79.0   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| TIARA                             | 2022 | 69.2   | 76.5   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
-| DeCC (Anonymous)                  | 2022 | -      | 75.8   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| ArcaneQA                          | 2022 | 65.8   | 75.3   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
-| RnG-KBQA                          | 2022 | -      | 71.2   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| RnG-KBQA (single model)           | 2021 | 63.792 | 71.156 | [Ye et. al.](https://arxiv.org/pdf/2109.08678.pdf)              |
-| ReTraCk (single model)            | 2021 | 61.499 | 70.911 | [Chen et. al.](https://aclanthology.org/2021.acl-demo.39/)      |
+| Pangu (T5-Large)                  | 2023 | 75.2   | 82.2   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| DECAF (BM25 + FiD-3B)             | 2022 | -      | 81.8   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| Pangu (T5-3B)                     | 2023 | 74.6   | 81.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (BERT-base)                 | 2023 | 74.9   | 81.2   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA + GAIN (T5-3B)              | 2023 | 73.7   | 80.0   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA + GAIN (T5-base)            | 2023 | 73.0   | 79.6   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| DECAF (BM25 + FiD-large)          | 2022 | -      | 79.0   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| TIARA                             | 2022 | 69.2   | 76.5   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
+| DeCC (Anonymous)                  | 2022 | -      | 75.8   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| ArcaneQA                          | 2022 | 65.8   | 75.3   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
+| RnG-KBQA                          | 2022 | -      | 71.2   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| RnG-KBQA (single model)           | 2021 | 63.792 | 71.156 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf)               |
+| ReTraCk (single model)            | 2021 | 61.499 | 70.911 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/)       |
+| GPT-3.5-turbo (5-shot)            | 2023 | 60.5   | 66.3   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (Codex)                     | 2023 | 58.2   | 64.9   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
| S2QL (single model) | 2021 | 54.716 | 64.679 | Anonymous |
| ArcaneQA (single model) | 2021 | 56.395 | 63.533 | Anonymous |
-| BERT+Ranking (single model)       | 2021 | 45.510 | 53.890 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| GloVe+Ranking (single model)      | 2021 | 39.955 | 47.753 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| BERT+Transduction (single model)  | 2021 | 31.040 | 35.985 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| QGG                               | 2022 | -      | 33.0   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| GloVe+Transduction (single model) | 2021 | 16.441 | 18.507 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
+| B-BINDEX (6)-R                    | 2023 | 51.8   | 58.3   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| BERT+Ranking (single model)       | 2021 | 45.510 | 53.890 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| GloVe+Ranking (single model)      | 2021 | 39.955 | 47.753 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| BERT+Transduction (single model)  | 2021 | 31.040 | 35.985 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| QGG                               | 2022 | -      | 33.0   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| GloVe+Transduction (single model) | 2021 | 16.441 | 18.507 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
46 changes: 27 additions & 19 deletions freebase/GrailQA - Overall.md
@@ -5,24 +5,32 @@

| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
-| DECAF (BM25 + FiD-3B)             | 2022 | -      | 78.7   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| TIARA                             | 2022 | 73.0   | 78.5   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
-| DeCC                              | 2022 | -      | 77.6   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| DECAF (BM25 + FiD-large)          | 2022 | -      | 76.0   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| UniParser                         | 2022 | -      | 74.6   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| RnG-KBQA (single model)           | 2021 | 68.778 | 74.422 | [Ye et. al.](https://arxiv.org/pdf/2109.08678.pdf)              |
-| ArcaneQA                          | 2022 | 63.8   | 73.7   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
+| TIARA + GAIN (T5-3B)              | 2023 | 76.3   | 81.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (T5-3B)                     | 2023 | 75.4   | 81.7   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA + GAIN (T5-base)            | 2023 | 75.1   | 80.6   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (T5-large)                  | 2023 | 74.8   | 81.4   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (BERT-base)                 | 2023 | 73.7   | 79.9   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| DECAF (BM25 + FiD-3B)             | 2022 | -      | 78.7   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| TIARA                             | 2022 | 73.0   | 78.5   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
+| DeCC                              | 2022 | -      | 77.6   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| DECAF (BM25 + FiD-large)          | 2022 | -      | 76.0   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| UniParser                         | 2022 | -      | 74.6   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| RnG-KBQA (single model)           | 2021 | 68.778 | 74.422 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf)               |
+| GPT-3.5-turbo (5-shot)            | 2023 | 66.6   | 71.4   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| ArcaneQA                          | 2022 | 63.8   | 73.7   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
| S2QL (single model) | 2021 | 57.456 | 66.186 | Anonymous |
-| ReTraCk (single model)            | 2021 | 58.136 | 65.285 | [Chen et. al.](https://aclanthology.org/2021.acl-demo.39/)      |
-| BERT+Ranking                      | 2022 | 50.6   | 58.0   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
+| ReTraCk (single model)            | 2021 | 58.136 | 65.285 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/)       |
+| Pangu (Codex)                     | 2023 | 56.4   | 65.0   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| B-BINDER (6)-R                    | 2023 | 53.2   | 58.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| BERT+Ranking                      | 2022 | 50.6   | 58.0   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
| ArcaneQA (single model) | 2021 | 57.872 | 64.924 | Anonymous |
-| BERT+Ranking (single model)       | 2021 | 50.578 | 57.988 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| ChatGPT                           | 2023 | 46.77  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| GloVe+Ranking (single model)      | 2021 | 39.521 | 45.136 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| QGG                               | 2022 | -      | 36.7   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| GPT-3.5v3                         | 2023 | 35.43  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| BERT+Transduction (single model)  | 2021 | 33.255 | 36.803 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| GPT-3.5v2                         | 2023 | 30.50  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| FLAN-T5                           | 2023 | 29.02  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| GPT-3                             | 2023 | 27.58  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| GloVe+Transduction (single model) | 2021 | 17.587 | 18.432 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
+| BERT+Ranking (single model)       | 2021 | 50.578 | 57.988 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| ChatGPT                           | 2023 | 46.77  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| GloVe+Ranking (single model)      | 2021 | 39.521 | 45.136 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| QGG                               | 2022 | -      | 36.7   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| GPT-3.5v3                         | 2023 | 35.43  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| BERT+Transduction (single model)  | 2021 | 33.255 | 36.803 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| GPT-3.5v2                         | 2023 | 30.50  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| FLAN-T5                           | 2023 | 29.02  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| GPT-3                             | 2023 | 27.58  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| GloVe+Transduction (single model) | 2021 | 17.587 | 18.432 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
32 changes: 19 additions & 13 deletions freebase/GrailQA - Zero-shot Generalization.md
@@ -5,18 +5,24 @@

| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
-| TIARA                             | 2022 | 68.0   | 73.9   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
-| DeCC(Anonymous)                   | 2022 | -      | 72.5   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| DECAF (BM25 + FiD-3B)             | 2022 | -      | 72.3   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| UniParser (Anonymous)             | 2022 | -      | 69.8   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| RnG-KBQA (single model)           | 2021 | 62.988 | 69.182 | [Ye et. al.](https://arxiv.org/pdf/2109.08678.pdf)              |
-| DECAF (BM25 + FiD-large)          | 2022 | -      | 68.0   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| ArcaneQA                          | 2022 | 52.9   | 66.0   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
+| TIARA + GAIN (T5-3B)              | 2023 | 71.8   | 77.8   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (T5-3B)                     | 2023 | 71.6   | 78.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (T5-Large)                  | 2023 | 71.0   | 78.4   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA + GAIN (T5-base)            | 2023 | 69.9   | 76.4   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (BERT-base)                 | 2023 | 69.1   | 76.1   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA                             | 2022 | 68.0   | 73.9   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
+| DeCC(Anonymous)                   | 2022 | -      | 72.5   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| DECAF (BM25 + FiD-3B)             | 2022 | -      | 72.3   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| UniParser (Anonymous)             | 2022 | -      | 69.8   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| RnG-KBQA (single model)           | 2021 | 62.988 | 69.182 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf)               |
+| GPT-3.5-turbo (5-shot)            | 2023 | 61.9   | 67.2   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| DECAF (BM25 + FiD-large)          | 2022 | -      | 68.0   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| ArcaneQA                          | 2022 | 52.9   | 66.0   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
| S2QL (single model) | 2021 | 55.122 | 63.598 | Anonymous |
| ArcaneQA (single model) | 2021 | 49.964 | 58.844 | Anonymous |
-| BERT+Ranking (single model)       | 2021 | 48.566 | 55.660 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| ReTraCk (single model)            | 2021 | 44.561 | 52.539 | [Chen et. al.](https://aclanthology.org/2021.acl-demo.39/)      |
-| QGG                               | 2022 | -      | 36.6   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| GloVe+Ranking (single model)      | 2021 | 28.886 | 33.792 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| BERT+Transduction (single model)  | 2021 | 25.702 | 29.300 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| GloVe+Transduction (single model) | 2021 | 2.968  | 3.123  | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
+| BERT+Ranking (single model)       | 2021 | 48.566 | 55.660 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| ReTraCk (single model)            | 2021 | 44.561 | 52.539 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/)       |
+| QGG                               | 2022 | -      | 36.6   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| GloVe+Ranking (single model)      | 2021 | 28.886 | 33.792 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| BERT+Transduction (single model)  | 2021 | 25.702 | 29.300 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| GloVe+Transduction (single model) | 2021 | 2.968  | 3.123  | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
23 changes: 16 additions & 7 deletions freebase/GraphQuestions.md
@@ -3,10 +3,19 @@
datasetUrl: https://github.com/ysu1989/GraphQuestions
---

-| Model / System                    | Year | Accuracy | F1   | Reported by                                                 |
-|:---------------------------------:|:----:|:------:|:------:|:----------------------------------------------------------:|
-| ChatGPT                           | 2023 | 53.10    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
-| GPT-3.5v3                         | 2023 | 47.95    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
-| GPT-3.5v2                         | 2023 | 40.85    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
-| GPT-3                             | 2023 | 38.32    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
-| FLAN-T5                           | 2023 | 32.27    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
+| Model / System                    | Year | Accuracy | F1   | Reported by                                                 |
+|:---------------------------------:|:----:|:--------:|:----:|:-----------------------------------------------------------:|
+| ChatGPT                           | 2023 | 53.10    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| GPT-3.5v3                         | 2023 | 47.95    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| TIARA + GAIN (T5-3B)              | 2023 | -        | 48.7 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| TIARA + GAIN (T5-base)            | 2023 | -        | 45.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| GPT-3.5v2                         | 2023 | 40.85    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| KB-BINDER                         | 2023 | -        | 39.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| GPT-3                             | 2023 | 38.32    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| TIARA (T5-base)                   | 2023 | -        | 37.9 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| FLAN-T5                           | 2023 | 32.27    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| ArcaneQA                          | 2023 | -        | 31.8 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| BERT + Ranking                    | 2023 | -        | 25.0 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| SPARQA                            | 2023 | -        | 21.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| PARA4QA                           | 2023 | -        | 20.4 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| UDepLambda                        | 2023 | -        | 17.7 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
