TKGT

This is the official code and data repository for the paper TKGT.

Prepare

Before starting, make sure that you have:

Installed vllm in a virtual environment called vllm and downloaded the interface below for fast, parallel inference:

cd /TKGT/Hybird_RAG/predict
git clone git@github.com:jiangpw41/LLMInferenceServer.git

Installed llama-factory in a virtual environment called llama_factory for fine-tuning.
Installed the nltk toolkit in a virtual environment called tkgt for the first stage, and downloaded the nltk package under the root directory.
Installed the packages listed in env2.txt in a virtual environment called tkgt2 for the second stage.
Downloaded an LLM (such as chatglm3/4-6b) for Q&A.
Downloaded the embedding models for embedding and retrieval:
- "roberta-large" under TKGT/Hybird_RAG/evaluation for BERTScore evaluation.
- "sentence-transformers/all-MiniLM-L6-v2" and "cross-encoder/ms-marco-MiniLM-L-6-v2" under TKGT/Hybird_RAG/retriever.
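
The model locations above can be verified before running anything. The following is a minimal sketch, not part of the repository: it assumes each model is saved in a folder named after its Hugging Face id, and that the script is run from the directory that contains TKGT. The function name find_missing_models is hypothetical.

```python
from pathlib import Path

# Assumption: each model sits in a folder named after its Hugging Face id,
# inside the directory this README names for it.
EXPECTED_MODELS = {
    "TKGT/Hybird_RAG/evaluation": ["roberta-large"],
    "TKGT/Hybird_RAG/retriever": [
        "sentence-transformers/all-MiniLM-L6-v2",
        "cross-encoder/ms-marco-MiniLM-L-6-v2",
    ],
}


def find_missing_models(base_dir="."):
    """Return the expected model folders that do not exist under base_dir."""
    base = Path(base_dir)
    missing = []
    for parent, model_ids in EXPECTED_MODELS.items():
        for model_id in model_ids:
            model_path = base / parent / model_id
            if not model_path.is_dir():
                missing.append(model_path.as_posix())
    return missing


if __name__ == "__main__":
    # Print one line per model folder that still needs to be downloaded.
    for path in find_missing_models():
        print(f"missing model folder: {path}")
```

If your local folder names differ, adjust EXPECTED_MODELS rather than the function.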

Dataset

This project uses five datasets. Four come from previous text-to-table work (Rotowire, e2e, wikibio, wikitabletext), and the fifth, CPL, is proposed by us. The data is stored in two folders under the project root directory, "raw" and "data". "raw" contains the original copies of the four existing datasets (Rotowire, e2e, wikibio, wikitabletext), and "data" contains the three processed datasets (CPL, Rotowire, e2e) that can be used directly in this project.

"raw"

We use the four datasets as processed by https://github.com/shirley-wu/text_to_table , which can be downloaded at https://drive.google.com/file/d/1zTfDFCl1nf_giX7IniY5WbXi9tAuEHDn/view. Download this .zip file, unzip it under the project root, and rename the extracted folder to "raw".
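
The unzip-and-rename step can also be scripted. This is a sketch under assumptions: the function name unpack_raw is hypothetical, the .zip file has already been downloaded manually, and the archive may or may not wrap its contents in a single top-level folder (both cases are handled).

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_raw(zip_path, project_root="."):
    """Extract zip_path under project_root and expose its contents as 'raw'."""
    root = Path(project_root)
    target = root / "raw"
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # Extract into a scratch folder first so a failed run leaves no partial "raw".
    scratch = Path(tempfile.mkdtemp(dir=root))
    try:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(scratch)
        entries = list(scratch.iterdir())
        if len(entries) == 1 and entries[0].is_dir():
            # The archive wraps everything in one folder: rename that folder.
            entries[0].rename(target)
        else:
            # Several top-level entries: the scratch folder itself becomes "raw".
            scratch.rename(target)
    finally:
        # Remove the scratch folder if it is still there (a no-op after a rename).
        shutil.rmtree(scratch, ignore_errors=True)
    return target
```

For example, unpack_raw("downloaded.zip", ".") run from the project root, where "downloaded.zip" is a placeholder for whatever name the downloaded file has.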

As for the CPL dataset, the original data involves private information, so we only provide the processed dataset in "data".

"data"

Three of the five datasets are stored in table form in the "data" directory. The remaining two (wikibio, wikitabletext) have no fixed set of table fields. For details, please refer to the readme file in the "data" directory.

Code example

Mixed_IE

The method used in TKGT's first stage, which includes rule-based, statistical, and deep learning (DL) paradigms.

KGs

The knowledge graphs (KGs) of the datasets, constructed semi-automatically by human experts with the help of Mixed_IE. Based on the KG schema, the prompt list and fine-tuning (ft) data are prepared in batches.
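
As an illustration of preparing prompts in batch from a schema, here is a minimal sketch. The schema, the prompt template, and the function name build_prompts are all hypothetical and are not taken from the repository; the actual schema and prompt wording are in the code_example directory.

```python
# Hypothetical schema: entity type -> field names.
SCHEMA = {
    "Team": ["Points", "Wins"],
    "Player": ["Points", "Assists"],
}

# Hypothetical prompt template.
TEMPLATE = (
    'According to the text, what is the value of "{field}" '
    "for the {entity}? Text: {text}"
)


def build_prompts(schema, text):
    """Return one prompt record per (entity type, field) pair in the schema."""
    prompts = []
    for entity, fields in schema.items():
        for field in fields:
            prompts.append({
                "entity": entity,
                "field": field,
                "prompt": TEMPLATE.format(entity=entity, field=field, text=text),
            })
    return prompts
```

Calling build_prompts(SCHEMA, document_text) yields one record per schema field, so the same list can feed both inference and the construction of fine-tuning examples.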

Hybird-RAG

Hybird-RAG uses three kinds of methods:

rule-based (prior structural knowledge and regular-expression matching),
statistics-based (word frequency), and
semantic-based (similarity).
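
One way the three signals could be combined is sketched below. This is not the repository's implementation: the weighted sum, the function names, and the weights are assumptions, and a bag-of-words cosine stands in for the embedding models listed under Prepare so that the sketch runs without any downloads.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rule_score(pattern, passage):
    """Rule-based: 1.0 if the regular expression matches, else 0.0."""
    return 1.0 if re.search(pattern, passage, flags=re.IGNORECASE) else 0.0


def frequency_score(query, passage):
    """Statistics-based: share of passage tokens that are query words."""
    tokens = tokenize(passage)
    if not tokens:
        return 0.0
    query_words = set(tokenize(query))
    return sum(1 for t in tokens if t in query_words) / len(tokens)


def similarity_score(query, passage):
    """Semantic stand-in: cosine similarity of bag-of-words vectors."""
    a, b = Counter(tokenize(query)), Counter(tokenize(passage))
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank(query, pattern, passages, weights=(1.0, 1.0, 1.0)):
    """Sort passages by the weighted sum of the three scores, best first."""
    def total(p):
        return (weights[0] * rule_score(pattern, p)
                + weights[1] * frequency_score(query, p)
                + weights[2] * similarity_score(query, p))
    return sorted(passages, key=total, reverse=True)
```

To move closer to a real semantic score, similarity_score could be replaced by a call to an embedding model while the other two functions stay unchanged.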
