Add coypu-KnowledgeExtraction

semantic-systems · Apr 24, 2024 · 8433fb5
1 parent d55e7ba
commit 8433fb5

Showing 3 changed files with 94 additions and 0 deletions.
diff --git a/docs/coypu-KnowledgeExtraction-1.md b/docs/coypu-KnowledgeExtraction-1.md
@@ -0,0 +1,50 @@
+# Knowledge Extraction module for Coypu project
+## Requirements
+- python>=3.7.0
+- Java
+- stanford_openie (https://github.com/philipperemy/stanford-openie-python)
+
+## Overview
+This repository is a simple implementation of relation extraction and entity linking on Twitter text.
+The workflow is as follows:
+Tweet ----> extracted triples ----> entities linked to Wikidata through the Wikidata API.
+
+
+### How to run an example?
+```
+pip install stanford_openie
+python ie.py
+```
+Replace the text with the tweet you want to work on:
+extractor.AnnoText('Fire breaks out in Hawaii', save=True) ---> extractor.AnnoText('Your own text', True)
+
+The code returns the triples extracted from that text in JSON form.
+
+```json
+{
+    "subject": "Fire",
+    "relation": "breaks out in",
+    "object": "Hawaii",
+    "sublink":
+        {
+            "pid": "P910",
+            "property": "topic's main category",
+            "eid": "Q4992738",
+            "entity": "Category:Fires"
+        },
+    "oblink":
+        {
+            "pid": "P6",
+            "property": "head of government",
+            "eid": "Q469689",
+            "entity": "Neil Abercrombie"
+        }
+}
+```
+
+The result is saved to the triple.json file in the same directory because the save argument is set to True.
+
+## Pipeline
+The pipeline of this module consists of the following parts:
+1. A txt2graph class, which extracts the triples from a given text (based on OpenIE).
+2. An entLink class, which links the extracted entities using the Wikidata entity search API.
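
The entity search step described above can be sketched as follows. This is a minimal illustration, not the entLink implementation: it assumes the lookup goes through Wikidata's `wbsearchentities` action, and the helper names `build_search_url`, `top_candidate`, and `link_entity` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=1):
    """Build a wbsearchentities request URL for a surface form (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def top_candidate(response):
    """Return (id, label) of the first search hit, or None when nothing matched."""
    hits = response.get("search", [])
    if not hits:
        return None
    first = hits[0]
    return first["id"], first.get("label")


def link_entity(term):
    """Query Wikidata over the network and return the top candidate for `term`."""
    request = urllib.request.Request(
        build_search_url(term),
        headers={"User-Agent": "coypu-docs-example/0.1"},  # assumed, illustrative value
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return top_candidate(json.load(reply))
```

Calling `link_entity("Hawaii")` would return the identifier and label of the first hit for that string. Taking the first hit is the simplest possible disambiguation strategy and may differ from what the module does.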
diff --git a/docs/coypu-KnowledgeExtraction-2.md b/docs/coypu-KnowledgeExtraction-2.md
@@ -0,0 +1,42 @@
+# Knowledge Extraction module for Coypu project MVP2
+## Requirements
+- python>=3.7.0
+- pip install -r requirements.txt
+
+
+Place the model files in the corresponding directory:
+- download the model you want to use from [drive](https://drive.google.com/drive/folders/1CIkN6opztlhgrkiIOOifocaWtY-PQquL?usp=sharing) and place it under /mvp2/src/
+
+## Overview
+This repository is a PyTorch implementation of relation extraction on Twitter text.
+The workflow is as follows:
+Tweet ----> extracted triples
+
+### How to predict an example?
+In predict.py, replace the text with the tweet you want to work on and set the model checkpoint you want to use:
+```
+result = token_classifier("Covid breaks out in Hamburg city of Germany in 2022. ^^")
+model_checkpoint = 'bert-base-uncased-lsoie/checkpoint-64110'
+```
+```
+python predict.py
+```
+
+The code returns the triples as a list.
+```python
+[['Covid', 'breaks', 'in Hamburg city of Germany'], ['Covid', 'breaks', 'in 2022']]
+```
+
+### How to train?
+Change the variables in config.py, then run the training script:
+```
+dataset = "lsoie"  # or "wnut17"
+model_checkpoint = "<model name on the Hugging Face Hub or path to a local directory>"
+
+python train.py
+```
+## Pipeline
+The pipeline of this module consists of the following parts:
+1. Create the dataset from a local file or the cloud
+2. Train the model using the Trainer API
+3. Predict using a pipeline
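
The final prediction step turns token-level labels into triples. The sketch below shows one way that grouping could work; it is an assumption, not the logic in predict.py. The span dictionaries use the `entity_group` and `word` keys that a Hugging Face token-classification pipeline emits when an aggregation strategy is set, while the label names `SUBJ`, `PRED`, and `OBJ` and the function `spans_to_triples` are hypothetical.

```python
def spans_to_triples(spans):
    """Pair the first subject and predicate span with every object span.

    `spans` is a list of dicts with "entity_group" and "word" keys.
    The label names SUBJ / PRED / OBJ are placeholders for illustration.
    """
    subject = next((s["word"] for s in spans if s["entity_group"] == "SUBJ"), None)
    predicate = next((s["word"] for s in spans if s["entity_group"] == "PRED"), None)
    if subject is None or predicate is None:
        return []  # no complete triple can be formed
    return [
        [subject, predicate, s["word"]]
        for s in spans
        if s["entity_group"] == "OBJ"
    ]


# Hand-written spans standing in for model output on the example sentence.
example_spans = [
    {"entity_group": "SUBJ", "word": "Covid"},
    {"entity_group": "PRED", "word": "breaks"},
    {"entity_group": "OBJ", "word": "in Hamburg city of Germany"},
    {"entity_group": "OBJ", "word": "in 2022"},
]
triples = spans_to_triples(example_spans)
```

With the hand-written spans above, `triples` holds one triple per object span, which matches the list shape shown in the prediction example.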
diff --git a/docs/index.md b/docs/index.md
@@ -20,6 +20,8 @@ HITeC/SEMS developed several AI solutions, in particular for Knowledge Extractio
 - [coypu-LlamaKGQA](coypu-LlamaKGQA)
 - [coypu-EventArgumentExtractor](coypu-EventArgumentExtractor)
 - [coypu-crisis-lm](coypu-crisis-lm)
+- [coypu-KnowledgeExtraction: custom (initial) Event Detector and OpenIE Event Argument Extractor](coypu-KnowledgeExtraction-1)
+- [coypu-KnowledgeExtraction: a BERT-based baseline event extractor](coypu-KnowledgeExtraction-2)
 
 ## Datasets