Generalizing Knowledge Acquisition with unsupervised automatic labeling for Special Cargo Domain.

yoseflaw/KArgen

KArgen - Knowledge Acquisition Generalization with Multi-task Learning

KArgen implements the generalization part of my Master's thesis:

Automatic Knowledge Acquisition for the Special Cargo Services Domain with Unsupervised Entity and Relation Extraction

Code structure adopted from: anago

The generalization part provides a model that can be used for entity/relation extraction from special cargo text. The training set was created automatically via KArgo. The model architecture can be seen here:

Simplified HMTL Architecture

This repository contains the following folders:

  • data/kargo: all datasets for NER/EE/RE in CoNLL format, following the multi-task modeling proposed by Bekoulis et al. (2018).
    • train: training sets as produced by KArgo
      • not_terms_only: dataset contains all sentences, including sentences without entities (for EE)
      • terms_only: dataset contains only sentences with at least one entity (for EE)
    • dev_rel, test_rel: development and test set 1
    • online_rel: test set 2 (online documents, based on HTML/PDF excerpts)
  • kargen: source code folder for KArgen
    • crf.py: CRF layer implementation for Keras, based on keras-contrib
    • models.py: model structure and wrapper for simplified Hierarchical Multi-task Learning from hmtl
    • preprocessing.py: preprocessing pipeline for sequential deep learning model
    • trainer.py: training routine for KArgen model, including callbacks.
  • main.py: example of KArgen training and evaluation routine, including saving/loading models.
  • infer.ipynb: example of extraction with the trained models, visualization with displaCy
  • results.ipynb: notebook for visualizing model training/evaluation results, can be seen here

Result Training Visualization
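
As a rough illustration of the CoNLL-style layout used in data/kargo, the sketch below reads sentences from text with one token per line and a blank line between sentences. It assumes the simplest layout (token in the first column, tag in the last column); the actual files may carry additional columns for relations. The tag name `CARGO`, the sample text, and the helper names `parse_conll` and `has_entity` are hypothetical and not part of KArgen. The `has_entity` filter mirrors the difference between the not_terms_only and terms_only training sets.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def parse_conll(text: str) -> List[Sentence]:
    """Split CoNLL-style text into sentences of (token, tag) pairs.

    Assumption: whitespace-separated columns, token first, tag last,
    and a blank line between sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def has_entity(sentence: Sentence) -> bool:
    """True if at least one token carries a tag other than 'O'."""
    return any(tag != "O" for _, tag in sentence)


# Hypothetical sample; the real tag set is defined by the KArgo output.
SAMPLE = """\
Live B-CARGO
animals I-CARGO
need O
ventilation O

Thank O
you O
"""

sentences = parse_conll(SAMPLE)
terms_only = [s for s in sentences if has_entity(s)]
```

Here `sentences` corresponds to the not_terms_only view (every sentence is kept) and `terms_only` keeps only sentences with at least one entity.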

A comparison of Precision/Recall/F-score for models trained with the automatic training set (Auto) and with the development set (Manual), evaluated on test set 1 (holdout news articles):

Result Test Set 1 News

and for test set 2 (online documents):

Result Test Set 2 Online Documents
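
For readers unfamiliar with the metrics in these comparisons, the following is a minimal sketch of how precision, recall, and F-score can be computed from exact matches between gold and predicted entity spans. It is an illustration under stated assumptions, not the evaluation code of this repository: the span representation `(start, end, label)`, the label names, and the function name `precision_recall_f1` are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token, label); hypothetical layout


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over two sets of spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: one exact match, one boundary error, one missed entity.
gold = {(0, 2, "CARGO"), (5, 6, "CARGO"), (8, 9, "SERVICE")}
predicted = {(0, 2, "CARGO"), (5, 7, "CARGO")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
```

With these sample sets, one of two predictions is correct (precision 0.5) and one of three gold spans is found (recall 1/3), giving an F1 of 0.4. Under exact matching, the span with the wrong boundary counts as both a false positive and a false negative.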
