Merge pull request #208 from DavidBlavid/issue_186
add "Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases"
xixi019 authored Feb 5, 2024
2 parents 1d32a34 + 94a86b0 commit 7524611
Showing 5 changed files with 85 additions and 53 deletions.
34 changes: 21 additions & 13 deletions freebase/GrailQA - Compositional Generalization.md
@@ -5,18 +5,26 @@

| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
-| DECAF (BM25 + FiD-3B)             | 2022 | -      | 81.8   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| DECAF (BM25 + FiD-large)          | 2022 | -      | 79.0   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| TIARA                             | 2022 | 69.2   | 76.5   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
-| DeCC (Anonymous)                  | 2022 | -      | 75.8   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| ArcaneQA                          | 2022 | 65.8   | 75.3   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
-| RnG-KBQA                          | 2022 | -      | 71.2   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| RnG-KBQA (single model)           | 2021 | 63.792 | 71.156 | [Ye et. al.](https://arxiv.org/pdf/2109.08678.pdf)              |
-| ReTraCk (single model)            | 2021 | 61.499 | 70.911 | [Chen et. al.](https://aclanthology.org/2021.acl-demo.39/)      |
+| Pangu (T5-Large)                  | 2023 | 75.2   | 82.2   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| DECAF (BM25 + FiD-3B)             | 2022 | -      | 81.8   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| Pangu (T5-3B)                     | 2023 | 74.6   | 81.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (BERT-base)                 | 2023 | 74.9   | 81.2   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA + GAIN (T5-3B)              | 2023 | 73.7   | 80.0   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA + GAIN (T5-base)            | 2023 | 73.0   | 79.6   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| DECAF (BM25 + FiD-large)          | 2022 | -      | 79.0   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| TIARA                             | 2022 | 69.2   | 76.5   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
+| DeCC (Anonymous)                  | 2022 | -      | 75.8   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| ArcaneQA                          | 2022 | 65.8   | 75.3   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
+| RnG-KBQA                          | 2022 | -      | 71.2   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| RnG-KBQA (single model)           | 2021 | 63.792 | 71.156 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf)               |
+| ReTraCk (single model)            | 2021 | 61.499 | 70.911 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/)       |
+| GPT-3.5-turbo (5-shot)            | 2023 | 60.5   | 66.3   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (Codex)                     | 2023 | 58.2   | 64.9   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
| S2QL (single model) | 2021 | 54.716 | 64.679 | Anonymous |
| ArcaneQA (single model) | 2021 | 56.395 | 63.533 | Anonymous |
-| BERT+Ranking (single model)       | 2021 | 45.510 | 53.890 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| GloVe+Ranking (single model)      | 2021 | 39.955 | 47.753 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| BERT+Transduction (single model)  | 2021 | 31.040 | 35.985 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| QGG                               | 2022 | -      | 33.0   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| GloVe+Transduction (single model) | 2021 | 16.441 | 18.507 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
+| B-BINDEX (6)-R                    | 2023 | 51.8   | 58.3   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| BERT+Ranking (single model)       | 2021 | 45.510 | 53.890 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| GloVe+Ranking (single model)      | 2021 | 39.955 | 47.753 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| BERT+Transduction (single model)  | 2021 | 31.040 | 35.985 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| QGG                               | 2022 | -      | 33.0   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| GloVe+Transduction (single model) | 2021 | 16.441 | 18.507 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
46 changes: 27 additions & 19 deletions freebase/GrailQA - Overall.md
@@ -5,24 +5,32 @@

| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
-| DECAF (BM25 + FiD-3B)             | 2022 | -      | 78.7   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| TIARA                             | 2022 | 73.0   | 78.5   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
-| DeCC                              | 2022 | -      | 77.6   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| DECAF (BM25 + FiD-large)          | 2022 | -      | 76.0   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| UniParser                         | 2022 | -      | 74.6   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| RnG-KBQA (single model)           | 2021 | 68.778 | 74.422 | [Ye et. al.](https://arxiv.org/pdf/2109.08678.pdf)              |
-| ArcaneQA                          | 2022 | 63.8   | 73.7   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
+| TIARA + GAIN (T5-3B)              | 2023 | 76.3   | 81.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (T5-3B)                     | 2023 | 75.4   | 81.7   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA + GAIN (T5-base)            | 2023 | 75.1   | 80.6   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (T5-large)                  | 2023 | 74.8   | 81.4   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (BERT-base)                 | 2023 | 73.7   | 79.9   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| DECAF (BM25 + FiD-3B)             | 2022 | -      | 78.7   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| TIARA                             | 2022 | 73.0   | 78.5   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
+| DeCC                              | 2022 | -      | 77.6   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| DECAF (BM25 + FiD-large)          | 2022 | -      | 76.0   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| UniParser                         | 2022 | -      | 74.6   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| RnG-KBQA (single model)           | 2021 | 68.778 | 74.422 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf)               |
+| GPT-3.5-turbo (5-shot)            | 2023 | 66.6   | 71.4   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| ArcaneQA                          | 2022 | 63.8   | 73.7   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
| S2QL (single model) | 2021 | 57.456 | 66.186 | Anonymous |
-| ReTraCk (single model)            | 2021 | 58.136 | 65.285 | [Chen et. al.](https://aclanthology.org/2021.acl-demo.39/)      |
-| BERT+Ranking                      | 2022 | 50.6   | 58.0   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
+| ReTraCk (single model)            | 2021 | 58.136 | 65.285 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/)       |
+| Pangu (Codex)                     | 2023 | 56.4   | 65.0   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| B-BINDER (6)-R                    | 2023 | 53.2   | 58.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| BERT+Ranking                      | 2022 | 50.6   | 58.0   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
| ArcaneQA (single model) | 2021 | 57.872 | 64.924 | Anonymous |
-| BERT+Ranking (single model)       | 2021 | 50.578 | 57.988 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| ChatGPT                           | 2023 | 46.77  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| GloVe+Ranking (single model)      | 2021 | 39.521 | 45.136 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| QGG                               | 2022 | -      | 36.7   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| GPT-3.5v3                         | 2023 | 35.43  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| BERT+Transduction (single model)  | 2021 | 33.255 | 36.803 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| GPT-3.5v2                         | 2023 | 30.50  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| FLAN-T5                           | 2023 | 29.02  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| GPT-3                             | 2023 | 27.58  | -      | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)             |
-| GloVe+Transduction (single model) | 2021 | 17.587 | 18.432 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
+| BERT+Ranking (single model)       | 2021 | 50.578 | 57.988 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| ChatGPT                           | 2023 | 46.77  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| GloVe+Ranking (single model)      | 2021 | 39.521 | 45.136 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| QGG                               | 2022 | -      | 36.7   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| GPT-3.5v3                         | 2023 | 35.43  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| BERT+Transduction (single model)  | 2021 | 33.255 | 36.803 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| GPT-3.5v2                         | 2023 | 30.50  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| FLAN-T5                           | 2023 | 29.02  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| GPT-3                             | 2023 | 27.58  | -      | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)              |
+| GloVe+Transduction (single model) | 2021 | 17.587 | 18.432 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
32 changes: 19 additions & 13 deletions freebase/GrailQA - Zero-shot Generalization.md
@@ -5,18 +5,24 @@

| Model / System | Year | EM | F1 | Reported by |
| :-------------------------------: | :--: | :----: | :----: | :-------------------------------------------------------------: |
-| TIARA                             | 2022 | 68.0   | 73.9   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
-| DeCC(Anonymous)                   | 2022 | -      | 72.5   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| DECAF (BM25 + FiD-3B)             | 2022 | -      | 72.3   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| UniParser (Anonymous)             | 2022 | -      | 69.8   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| RnG-KBQA (single model)           | 2021 | 62.988 | 69.182 | [Ye et. al.](https://arxiv.org/pdf/2109.08678.pdf)              |
-| DECAF (BM25 + FiD-large)          | 2022 | -      | 68.0   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| ArcaneQA                          | 2022 | 52.9   | 66.0   | [Shu et. al.](https://aclanthology.org/2022.emnlp-main.555.pdf) |
+| TIARA + GAIN (T5-3B)              | 2023 | 71.8   | 77.8   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (T5-3B)                     | 2023 | 71.6   | 78.5   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (T5-Large)                  | 2023 | 71.0   | 78.4   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA + GAIN (T5-base)            | 2023 | 69.9   | 76.4   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| Pangu (BERT-base)                 | 2023 | 69.1   | 76.1   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| TIARA                             | 2022 | 68.0   | 73.9   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
+| DeCC(Anonymous)                   | 2022 | -      | 72.5   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| DECAF (BM25 + FiD-3B)             | 2022 | -      | 72.3   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| UniParser (Anonymous)             | 2022 | -      | 69.8   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| RnG-KBQA (single model)           | 2021 | 62.988 | 69.182 | [Ye et al.](https://arxiv.org/pdf/2109.08678.pdf)               |
+| GPT-3.5-turbo (5-shot)            | 2023 | 61.9   | 67.2   | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)              |
+| DECAF (BM25 + FiD-large)          | 2022 | -      | 68.0   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| ArcaneQA                          | 2022 | 52.9   | 66.0   | [Shu et al.](https://aclanthology.org/2022.emnlp-main.555.pdf)  |
| S2QL (single model) | 2021 | 55.122 | 63.598 | Anonymous |
| ArcaneQA (single model) | 2021 | 49.964 | 58.844 | Anonymous |
-| BERT+Ranking (single model)       | 2021 | 48.566 | 55.660 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| ReTraCk (single model)            | 2021 | 44.561 | 52.539 | [Chen et. al.](https://aclanthology.org/2021.acl-demo.39/)      |
-| QGG                               | 2022 | -      | 36.6   | [Yu et. al.](https://arxiv.org/pdf/2210.00063.pdf)              |
-| GloVe+Ranking (single model)      | 2021 | 28.886 | 33.792 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| BERT+Transduction (single model)  | 2021 | 25.702 | 29.300 | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
-| GloVe+Transduction (single model) | 2021 | 2.968  | 3.123  | [Gu et. al.](https://arxiv.org/abs/2011.07743)                  |
+| BERT+Ranking (single model)       | 2021 | 48.566 | 55.660 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| ReTraCk (single model)            | 2021 | 44.561 | 52.539 | [Chen et al.](https://aclanthology.org/2021.acl-demo.39/)       |
+| QGG                               | 2022 | -      | 36.6   | [Yu et al.](https://arxiv.org/pdf/2210.00063.pdf)               |
+| GloVe+Ranking (single model)      | 2021 | 28.886 | 33.792 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| BERT+Transduction (single model)  | 2021 | 25.702 | 29.300 | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
+| GloVe+Transduction (single model) | 2021 | 2.968  | 3.123  | [Gu et al.](https://arxiv.org/abs/2011.07743)                   |
23 changes: 16 additions & 7 deletions freebase/GraphQuestions.md
@@ -3,10 +3,19 @@
datasetUrl: https://github.com/ysu1989/GraphQuestions
---

-| Model / System                    | Year | Accuracy | F1   | Reported by                                                 |
-|:---------------------------------:|:----:|:------:|:------:|:----------------------------------------------------------:|
-| ChatGPT                           | 2023 | 53.10    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
-| GPT-3.5v3                         | 2023 | 47.95    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
-| GPT-3.5v2                         | 2023 | 40.85    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
-| GPT-3                             | 2023 | 38.32    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
-| FLAN-T5                           | 2023 | 32.27    | -    | [Tan et. al.](https://arxiv.org/pdf/2303.07992.pdf)         |
+| Model / System                    | Year | Accuracy | F1   | Reported by                                                 |
+|:---------------------------------:|:----:|:--------:|:----:|:-----------------------------------------------------------:|
+| ChatGPT                           | 2023 | 53.10    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| GPT-3.5v3                         | 2023 | 47.95    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| TIARA + GAIN (T5-3B)              | 2023 | -        | 48.7 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| TIARA + GAIN (T5-base)            | 2023 | -        | 45.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| GPT-3.5v2                         | 2023 | 40.85    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| KB-BINDER                         | 2023 | -        | 39.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| GPT-3                             | 2023 | 38.32    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| TIARA (T5-base)                   | 2023 | -        | 37.9 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| FLAN-T5                           | 2023 | 32.27    | -    | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf)          |
+| ArcaneQA                          | 2023 | -        | 31.8 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| BERT + Ranking                    | 2023 | -        | 25.0 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| SPARQA                            | 2023 | -        | 21.5 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| PARA4QA                           | 2023 | -        | 20.4 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
+| UDepLambda                        | 2023 | -        | 17.7 | [Shu et al.](https://arxiv.org/pdf/2309.08345.pdf)          |
