Commit
Patrick Westphal
committed
Apr 24, 2024
1 parent
d55e7ba
commit 8433fb5
Showing
3 changed files
with
94 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
# Knowledge Extraction module for Coypu project | ||
## Requirement | ||
- python>=3.7.0 | ||
- Java | ||
- stanford_openie (https://github.com/philipperemy/stanford-openie-python) | ||
|
||
## Overview | ||
This repository is a simple implementation of relation extraction and entity linking for Twitter text. | ||
The workflow is as follows: | ||
Tweet ----> extracted triples -----> entities linked to Wikidata through the Wikidata API. | ||
|
||
|
||
### How to run an example? | ||
``` | ||
pip install stanford_openie | ||
python ie.py | ||
``` | ||
Replace the example text with the tweet you want to process: | ||
extractor.AnnoText('Fire breaks out in Hawaii', save=True) ---> extractor.AnnoText('Your own text', save=True) | ||
|
||
The code returns the triples extracted from that text in JSON form. | ||
|
||
```json | ||
{ | ||
"subject": "Fire", | ||
"relation": "breaks out in", | ||
"object": "Hawaii", | ||
"sublink": | ||
{ | ||
"pid": "P910", | ||
"property": "topic's main category", | ||
"eid": "Q4992738", | ||
"entity": "Category:Fires" | ||
}, | ||
"oblink": | ||
{ | ||
"pid": "P6", | ||
"property": "head of government", | ||
"eid": "Q469689", | ||
"entity": "Neil Abercrombie" | ||
} | ||
} | ||
``` | ||
|
||
Because the save argument is set to True, the result is also written to a triple.json file in the same directory. | ||
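
As a rough illustration of how the saved output might be consumed, the sketch below parses the sample record shown above and prints a one-line summary. It assumes triple.json holds a single object with exactly the keys from the sample; `summarize` is a hypothetical helper and is not part of this repository.

```python
import json

# Sample record copied from the README output above. To read the saved file
# instead (assuming it holds one such object), use:
#     with open("triple.json") as f: triple = json.load(f)
sample = """
{
  "subject": "Fire",
  "relation": "breaks out in",
  "object": "Hawaii",
  "sublink": {"pid": "P910", "property": "topic's main category",
              "eid": "Q4992738", "entity": "Category:Fires"},
  "oblink": {"pid": "P6", "property": "head of government",
             "eid": "Q469689", "entity": "Neil Abercrombie"}
}
"""


def summarize(triple):
    """Return a one-line summary of a linked triple (hypothetical helper)."""
    sub = triple.get("sublink", {})
    obj = triple.get("oblink", {})
    return (
        f'{triple["subject"]} ({sub.get("eid", "?")}) '
        f'--[{triple["relation"]}]--> '
        f'{triple["object"]} ({obj.get("eid", "?")})'
    )


triple = json.loads(sample)
summary = summarize(triple)
print(summary)
```

Missing `sublink` or `oblink` entries fall back to `?`, so a triple whose entities could not be linked still produces a readable line.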
|
||
## Pipeline | ||
The pipeline of this module consists of the following parts: | ||
1. A txt2graph class, which extracts the triples from a given text (based on OpenIE). | ||
2. An entLink class, which links the extracted entities using the Wikidata entity search API. |
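
The entity-linking step can be sketched as follows. This is a minimal illustration of querying the public Wikidata `wbsearchentities` endpoint, not the repository's entLink implementation: the function names and the User-Agent string are assumptions, and only the entity id and label are extracted (the property lookup that yields `pid` and `property` in the sample output is not covered).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=1):
    """Build a wbsearchentities query URL for the given search term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def first_hit(response):
    """Pick the top search result from a decoded API response, or None."""
    hits = response.get("search", [])
    if not hits:
        return None
    return {"eid": hits[0]["id"], "entity": hits[0].get("label", "")}


def link_entity(term):
    """Query Wikidata over the network (hypothetical helper, needs internet)."""
    # A descriptive User-Agent is set as a courtesy to the API; the value is a placeholder.
    request = Request(build_search_url(term), headers={"User-Agent": "coypu-ke-example/0.1"})
    with urlopen(request, timeout=10) as resp:
        return first_hit(json.load(resp))


if __name__ == "__main__":
    print(build_search_url("Hawaii"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check offline; `link_entity("Hawaii")` would perform the actual request.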
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Knowledge Extraction module for Coypu project MVP2 | ||
## Requirement | ||
- python>=3.7.0 | ||
- pip install -r requirements.txt | ||
|
||
|
||
Place the model files in the corresponding directory: | ||
- Download the model you want to use from [drive](https://drive.google.com/drive/folders/1CIkN6opztlhgrkiIOOifocaWtY-PQquL?usp=sharing) and place it under /mvp2/src/ | ||
|
||
## Overview | ||
This repository is a PyTorch implementation of relation extraction for Twitter text. | ||
The workflow is as follows: | ||
Tweet ----> extracted triples | ||
|
||
### How to predict an example? | ||
In predict.py, replace the example text with the tweet you want to process and set the model checkpoint you want to use: | ||
``` | ||
result = token_classifier("Covid breaks out in Hamburg city of Germany in 2022. ^^") | ||
model_checkpoint = 'bert-base-uncased-lsoie/checkpoint-64110' | ||
``` | ||
``` | ||
python predict.py | ||
``` | ||
|
||
The code returns the triples as a list of [subject, relation, object] lists. | ||
```python | ||
[['Covid', 'breaks', 'in Hamburg city of Germany'], ['Covid', 'breaks', 'in 2022']] | ||
``` | ||
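
For downstream use it may be convenient to turn these positional lists into keyed records. The sketch below assumes each inner list is ordered as subject, relation, object, as in the example output; `to_records` is a hypothetical helper and is not part of predict.py.

```python
# Example output copied from the README above.
triples = [
    ["Covid", "breaks", "in Hamburg city of Germany"],
    ["Covid", "breaks", "in 2022"],
]


def to_records(triples):
    """Convert [subject, relation, object] lists into dictionaries."""
    keys = ("subject", "relation", "object")
    return [dict(zip(keys, triple)) for triple in triples]


records = to_records(triples)
for record in records:
    print(record)
```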
|
||
### How to train? | ||
Change the variables in config.py, then run the training script: | ||
``` | ||
dataset = "lsoie"  # or "wnut17" | ||
model_checkpoint = <model from the Hugging Face Hub or a local directory> | ||
python train.py | ||
``` | ||
## Pipeline | ||
The pipeline of this module consists of the following parts: | ||
1. Create the dataset from a local file or from the cloud | ||
2. Train the model using the Trainer API | ||
3. Predict using a pipeline |