release: v1.0.0
severinsimmler authored Jan 6, 2021
2 parents 5d5ef54 + bce8633 commit 22039a7
Showing 28 changed files with 777 additions and 618 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

Linear-chain conditional random fields for natural language processing.

- Chaine is a modern Python library without third-party dependencies and a backend written in C. You can train conditional random fields for natural language processing tasks like [named entity recognition](https://en.wikipedia.org/wiki/Named-entity_recognition) or [part-of-speech tagging](https://en.wikipedia.org/wiki/Part-of-speech_tagging).
+ Chaine is a modern Python library without third-party dependencies and a backend written in C. You can train conditional random fields for natural language processing tasks like [named entity recognition](https://en.wikipedia.org/wiki/Named-entity_recognition).

- **Lightweight**: No use of bloated third-party libraries.
- **Fast**: Performance critical parts are written in C and thus [blazingly fast](http://www.chokkan.org/software/crfsuite/benchmark.html).
@@ -14,26 +14,26 @@ You can install the latest stable version from [PyPI](https://pypi.org/project/c
$ pip install chaine
```

- If you are interested in the theoretical concepts behind conditional random fields, please refer to the introducing paper by [Lafferty et al](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers).
+ Please refer to the introducing paper by [Lafferty et al.](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers) for the theoretical concepts behind conditional random fields.


- ## Example
+ ## Minimal working example

```python
>>> import chaine
- >>> tokens = [["John", "Lennon", "was", "born", "in" "Liverpool"]]
+ >>> tokens = [["John", "Lennon", "was", "born", "in", "Liverpool"]]
>>> labels = [["B-PER", "I-PER", "O", "O", "O", "B-LOC"]]
>>> model = chaine.train(tokens, labels, max_iterations=5)
>>> model.predict(tokens)
[['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC']]
```

- Check out the introducing [Jupyter notebook](https://github.com/severinsimmler/chaine/blob/master/notebooks/tutorial.ipynb).
+ Check out the [examples](https://github.com/severinsimmler/chaine/blob/master/examples) for a more real-world use case.
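
Not part of this diff, but as a minimal sketch grounded in the new `chaine/api.py` further down: `train` serializes the model to `model.crf`, so a trained model can presumably be reloaded later via `chaine.crf.Model`:

```python
>>> from chaine.crf import Model
>>> model = Model("model.crf")  # file written by chaine.train(), per chaine/api.py
>>> model.predict(tokens)       # same tokens as in the README example above
[['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC']]
```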


## Credits

- This library makes use of and is partially based on:
+ This project makes use of and is partially based on:

- [CRFsuite](https://github.com/chokkan/crfsuite)
- [libLBFGS](https://github.com/chokkan/liblbfgs)
4 changes: 2 additions & 2 deletions chaine/__init__.py
@@ -1,2 +1,2 @@
- from chaine.training import train
- from chaine.crf import Model, Trainer
+ from chaine.api import train
+ from chaine import crf
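
For context, a minimal sketch of the two entry points this reorganization exposes, assuming `crf.Trainer` and `crf.Model` keep the interface used by `chaine/api.py` below:

```python
import chaine           # high-level API: chaine.train(...)
from chaine import crf  # low-level API: crf.Trainer and crf.Model

tokens = [["John", "Lennon", "was", "born", "in", "Liverpool"]]
labels = [["B-PER", "I-PER", "O", "O", "O", "B-LOC"]]

# the low-level route, mirroring what chaine.api.train does internally
trainer = crf.Trainer(max_iterations=5)
trainer.train(tokens, labels, "model.crf")
model = crf.Model("model.crf")
```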
150 changes: 150 additions & 0 deletions chaine/api.py
@@ -0,0 +1,150 @@
"""
chaine.api
~~~~~~~~~~
This module implements the high-level API to train a conditional random field
"""

from chaine.crf import Model, Trainer
from chaine.typing import Dataset, Labels


def train(dataset: Dataset, labels: Labels, **kwargs) -> Model:
"""Train a conditional random field
Parameters
----------
dataset : Dataset
Dataset consisting of sequences of feature sets
labels : Labels
Labels corresponding to each instance in the dataset
algorithm : str
Following algorithms are available:
* lbfgs: Limited-memory BFGS with L1/L2 regularization
* l2sgd: Stochastic gradient descent with L2 regularization
* ap: Averaged perceptron
* pa: Passive aggressive
* arow: Adaptive regularization of weights
Limited-memory BFGS Parameters
------------------------------
min_freq : float, optional (default=0)
Threshold value for minimum frequency of a feature occurring in training data
all_possible_states : bool, optional (default=False)
Generate state features that do not even occur in the training data
all_possible_transitions : bool, optional (default=False)
Generate transition features that do not even occur in the training data
max_iterations : int, optional (default=None)
Maximum number of iterations (unlimited by default)
num_memories : int, optional (default=6)
Number of limited memories for approximating the inverse hessian matrix
c1 : float, optional (default=0)
Coefficient for L1 regularization
c2 : float, optional (default=1.0)
Coefficient for L2 regularization
epsilon : float, optional (default=1e-5)
Parameter that determines the condition of convergence
period : int, optional (default=10)
Threshold value for iterations to test the stopping criterion
delta : float, optional (default=1e-5)
Top iteration when log likelihood is not greater than this
linesearch : str, optional (default="MoreThuente")
Line search algorithm used in updates:
* MoreThuente: More and Thuente's method
* Backtracking: Backtracking method with regular Wolfe condition
* StrongBacktracking: Backtracking method with strong Wolfe condition
max_linesearch : int, optional (default=20)
Maximum number of trials for the line search algorithm
SGD with L2 Parameters
----------------------
min_freq : float, optional (default=0)
Threshold value for minimum frequency of a feature occurring in training data
all_possible_states : bool, optional (default=False)
Generate state features that do not even occur in the training data
all_possible_transitions : bool, optional (default=False)
Generate transition features that do not even occur in the training data
max_iterations : int, optional (default=None)
Maximum number of iterations (1000 by default)
c2 : float, optional (default=1.0)
Coefficient for L2 regularization
period : int, optional (default=10)
Threshold value for iterations to test the stopping criterion
delta : float, optional (default=1e-5)
Top iteration when log likelihood is not greater than this
calibration_eta : float, optional (default=0.1)
Initial value of learning rate (eta) used for calibration
calibration_rate : float, optional (default=2.0)
Rate of increase/decrease of learning rate for calibration
calibration_samples : int, optional (default=1000)
Number of instances used for calibration
calibration_candidates : int, optional (default=10)
Number of candidates of learning rate
calibration_max_trials : int, optional (default=20)
Maximum number of trials of learning rates for calibration
Averaged Perceptron Parameters
------------------------------
min_freq : float, optional (default=0)
Threshold value for minimum frequency of a feature occurring in training data
all_possible_states : bool, optional (default=False)
Generate state features that do not even occur in the training data
all_possible_transitions : bool, optional (default=False)
Generate transition features that do not even occur in the training data
max_iterations : int, optional (default=None)
Maximum number of iterations (100 by default)
epsilon : float, optional (default=1e-5)
Parameter that determines the condition of convergence
Passive Aggressive Parameters
-----------------------------
min_freq : float, optional (default=0)
Threshold value for minimum frequency of a feature occurring in training data
all_possible_states : bool, optional (default=False)
Generate state features that do not even occur in the training data
all_possible_transitions : bool, optional (default=False)
Generate transition features that do not even occur in the training data
max_iterations : int, optional (default=None)
Maximum number of iterations (100 by default)
epsilon : float, optional (default=1e-5)
Parameter that determines the condition of convergence
pa_type : int, optional (default=1)
Strategy for updating feature weights:
* 0: PA without slack variables
* 1: PA type I
* 2: PA type II
c : float, optional (default=1)
Aggressiveness parameter (used only for PA-I and PA-II)
error_sensitive : bool, optional (default=True)
Include square root of predicted incorrect labels into optimization routine
averaging : bool, optional (default=True)
Compute average of feature weights at all updates
Adaptive Regularization of Weights (AROW) Parameters
----------------------------------------------------
min_freq : float, optional (default=0)
Threshold value for minimum frequency of a feature occurring in training data
all_possible_states : bool, optional (default=False)
Generate state features that do not even occur in the training data
all_possible_transitions : bool, optional (default=False)
Generate transition features that do not even occur in the training data
max_iterations : int, optional (default=None)
Maximum number of iterations (100 by default)
epsilon : float, optional (default=1e-5)
Parameter that determines the condition of convergence
variance : float, optional (default=1)
Initial variance of every feature weight
gamma : float, optional (default=1)
Trade-off between loss function and changes of feature weights
Returns
-------
Model
A conditional random field trained on the dataset
"""
# initialize trainer and start training
trainer = Trainer(**kwargs)
trainer.train(dataset, labels, "model.crf")

# load and return the trained model
return Model("model.crf")
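
To tie the new module together, a hedged usage sketch of this `train` function; the keyword names come from the docstring above (assuming `Trainer` accepts them, including `algorithm`), the values are illustrative picks, and `model.crf` is written to the working directory as a side effect:

```python
import chaine

dataset = [["John", "Lennon", "was", "born", "in", "Liverpool"]]
labels = [["B-PER", "I-PER", "O", "O", "O", "B-LOC"]]

# keyword arguments are forwarded verbatim to Trainer
model = chaine.train(
    dataset,
    labels,
    algorithm="lbfgs",  # one of: lbfgs, l2sgd, ap, pa, arow
    max_iterations=100,
    c1=0.1,             # L1 regularization coefficient
    c2=1.0,             # L2 regularization coefficient
)

print(model.predict(dataset))  # e.g. [['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC']]
```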
