
Commit

Add coypu-KnowledgeExtraction
Patrick Westphal committed Apr 24, 2024
1 parent d55e7ba commit 8433fb5
Showing 3 changed files with 94 additions and 0 deletions.
50 changes: 50 additions & 0 deletions docs/coypu-KnowledgeExtraction-1.md
@@ -0,0 +1,50 @@
# Knowledge Extraction module for Coypu project
## Requirement
- python>=3.7.0
- Java
- [stanford_openie](https://github.com/philipperemy/stanford-openie-python)

## Overview
This repository is a simple implementation of relation extraction and entity linking on Twitter text.
The workflow is as follows:
Tweet ----> extracted triples ----> entities linked to Wikidata through the Wikidata API.


### How to run an example?
```
pip install stanford_openie
python ie.py
```
Replace the text with the tweet you want to work on:
`extractor.AnnoText('Fire breaks out in Hawaii', save=True)` ---> `extractor.AnnoText('Your own text', True)`

The code returns the triples extracted from that text in JSON form:

```json
{
    "subject": "Fire",
    "relation": "breaks out in",
    "object": "Hawaii",
    "sublink": {
        "pid": "P910",
        "property": "topic's main category",
        "eid": "Q4992738",
        "entity": "Category:Fires"
    },
    "oblink": {
        "pid": "P6",
        "property": "head of government",
        "eid": "Q469689",
        "entity": "Neil Abercrombie"
    }
}
```

This result is saved to the triple.json file in the same directory, since we passed True as the save argument.
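Once saved, triple.json can be consumed like any JSON document. A minimal, hypothetical sketch of reading the result back; the schema is assumed from the example record above:

```python
import json

# Assumed schema, taken from the example record above.
triple = {
    "subject": "Fire",
    "relation": "breaks out in",
    "object": "Hawaii",
    "sublink": {"pid": "P910", "property": "topic's main category",
                "eid": "Q4992738", "entity": "Category:Fires"},
    "oblink": {"pid": "P6", "property": "head of government",
               "eid": "Q469689", "entity": "Neil Abercrombie"},
}

# Write the record the way the extractor would, then read it back.
with open("triple.json", "w", encoding="utf-8") as f:
    json.dump(triple, f, indent=2)

with open("triple.json", encoding="utf-8") as f:
    loaded = json.load(f)

# Reassemble the plain triple and the Wikidata IDs of the linked entities.
print((loaded["subject"], loaded["relation"], loaded["object"]))
print(loaded["sublink"]["eid"], loaded["oblink"]["eid"])
```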

## Pipeline
The pipeline of this module is based on the following parts:
1. A txt2graph class, which extracts the triples from a given text (based on OpenIE).
2. An entLink class, which links the found entities using the Wikidata entity search API.
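The entity-linking step presumably queries the public `wbsearchentities` endpoint of the Wikidata API. A hedged sketch of how such a request could be built; the helper name and parameter choices are illustrative assumptions, not the module's actual code:

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_entity_search_url(term: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for a surface form found by OpenIE.

    Illustrative helper -- the actual entLink class may construct its
    requests differently.
    """
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"

url = build_entity_search_url("Hawaii")
print(url)
# The JSON response (fetched e.g. with urllib.request or requests) contains a
# "search" list whose entries carry the Wikidata "id" (Q-number) and "label".
```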
42 changes: 42 additions & 0 deletions docs/coypu-KnowledgeExtraction-2.md
@@ -0,0 +1,42 @@
# Knowledge Extraction module for Coypu project MVP2
## Requirement
- python>=3.7.0
- pip install -r requirements.txt


Place the model files in the corresponding directory:
- download the model you want to use from [drive](https://drive.google.com/drive/folders/1CIkN6opztlhgrkiIOOifocaWtY-PQquL?usp=sharing) and place it under /mvp2/src/

## Overview
This repository is a PyTorch implementation for relation extraction on Twitter text.
The workflow is as follows:
Tweet ----> extracted triples

### How to predict an example?
In predict.py, replace the text with the tweet you want to work on and set the model checkpoint you want to use:
```python
result = token_classifier("Covid breaks out in Hamburg city of Germany in 2022. ^^")
model_checkpoint = 'bert-base-uncased-lsoie/checkpoint-64110'
```
```
python predict.py
```

The code returns the triples as a list:
```python
[['Covid', 'breaks', 'in Hamburg city of Germany'], ['Covid', 'breaks', 'in 2022']]
```
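The list output suggests the token classifier tags each token with an argument or predicate role, and the triples are assembled from the tagged spans. A simplified sketch under that assumption; the tag names `A0`/`P`/`A1` are illustrative, not necessarily the lsoie label set:

```python
def spans_to_triple(tokens, tags):
    """Group tokens by role tag and assemble an (arg0, predicate, arg1) triple.

    Assumed scheme: A0 marks the subject span, P the predicate span,
    A1 the object span, and O tokens outside any span.
    """
    parts = {"A0": [], "P": [], "A1": []}
    for token, tag in zip(tokens, tags):
        if tag in parts:
            parts[tag].append(token)
    return [" ".join(parts["A0"]), " ".join(parts["P"]), " ".join(parts["A1"])]

tokens = ["Covid", "breaks", "out", "in", "Hamburg", "city", "of", "Germany"]
tags   = ["A0",    "P",      "O",   "A1", "A1",      "A1",   "A1", "A1"]
print(spans_to_triple(tokens, tags))
# → ['Covid', 'breaks', 'in Hamburg city of Germany']
```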

### How to train?
Change the variables in config.py, then start the training:
```
dataset = "lsoie"  # or "wnut17"
model_checkpoint = <model from the Hugging Face Hub or a local directory>
python train.py
```
## Pipeline
The pipeline of this module is based on the following parts:
1. Create the dataset from a local file or the cloud.
2. Train the model using the Trainer API.
3. Predict using the pipeline API.
2 changes: 2 additions & 0 deletions docs/index.md
@@ -20,6 +20,8 @@ HITeC/SEMS developed several AI solutions, in particular for Knowledge Extractio
- [coypu-LlamaKGQA](coypu-LlamaKGQA)
- [coypu-EventArgumentExtractor](coypu-EventArgumentExtractor)
- [coypu-crisis-lm](coypu-crisis-lm)
- [coypu-KnowledgeExtraction: custom (initial) Event Detector and OpenIE Event Argument Extractor](coypu-KnowledgeExtraction-1)
- [coypu-KnowledgeExtraction: A bert-based baseline event extractor](coypu-KnowledgeExtraction-2)

## Datasets

