Train floret vectors on OSCAR and compare the performance of standard and floret vectors on UD Finnish TDT and turku-ner-corpus.
The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.
The following commands are defined by the project. They can be executed using `spacy project run [name]`. Commands are only re-run if their inputs have changed.
Command | Description |
---|---|
`tokenize-oscar` | Download, tokenize, and sentencize the OSCAR data |
`train-fasttext-standard-vectors` | Train standard fastText vectors |
`train-floret-vectors` | Train floret vectors |
`init-standard-unpruned-vectors` | Create a standard unpruned vectors model |
`init-standard-vectors` | Create a standard vectors model |
`init-floret-vectors` | Create a floret vectors model |
`convert` | Convert the UD Finnish TDT data to spaCy's format |
`train-no-vectors` | Train the model without vectors |
`train-standard-unpruned` | Train the model with standard, unpruned vectors |
`train-standard` | Train the model with standard, pruned vectors |
`train-floret` | Train the model with floret vectors |
`evaluate` | Evaluate the models and export metrics |
`convert-ner` | Convert the turku-ner-corpus data to spaCy's format |
`train-no-vectors-ner` | Train the NER model without vectors |
`train-standard-unpruned-ner` | Train the NER model with standard, unpruned vectors |
`train-standard-ner` | Train the NER model with standard, pruned vectors |
`train-floret-ner` | Train the NER model with floret vectors |
`evaluate-ner` | Evaluate the NER models and export metrics |
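The "only re-run if their inputs have changed" behavior can be pictured as a checksum comparison: record a fingerprint of each command's inputs after it runs, and skip it next time if the fingerprints still match. spaCy keeps this kind of state in a `project.lock` file; the following is only a simplified sketch of the idea, not spaCy's implementation. It assumes every input is a plain file and stores MD5 checksums in a JSON file, and the helpers `file_checksum`, `needs_rerun`, and `record_run` are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents (hypothetical helper)."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_rerun(command: str, deps: list[Path], lock_path: Path) -> bool:
    """True if the command has never run or any input changed since its last run."""
    if not lock_path.exists():
        return True  # nothing recorded yet, so everything must run
    lock = json.loads(lock_path.read_text())
    current = {str(p): file_checksum(p) for p in deps}
    # A command missing from the lock file compares as None, which forces a run.
    return lock.get(command) != current


def record_run(command: str, deps: list[Path], lock_path: Path) -> None:
    """Store the current input checksums for a command after it succeeds."""
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    lock[command] = {str(p): file_checksum(p) for p in deps}
    lock_path.write_text(json.dumps(lock, indent=2, sort_keys=True))
```

Only inputs are tracked here; a fuller version would also consider the command's outputs and its script, since changing either should trigger a re-run as well.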
The following workflows are defined by the project. They can be executed using `spacy project run [name]` and will run the specified commands in order. Commands are only re-run if their inputs have changed.
Workflow | Steps |
---|---|
`all` | `tokenize-oscar` → `train-fasttext-standard-vectors` → `train-floret-vectors` → `init-standard-unpruned-vectors` → `init-standard-vectors` → `init-floret-vectors` → `convert` → `train-no-vectors` → `train-standard-unpruned` → `train-standard` → `train-floret` → `evaluate` → `convert-ner` → `train-no-vectors-ner` → `train-standard-unpruned-ner` → `train-standard-ner` → `train-floret-ner` → `evaluate-ner` |
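Running a workflow amounts to running its commands one after another and stopping at the first failure. The sketch below shows that control flow for the `all` workflow. It assumes a caller-supplied `is_up_to_date` check (for example, one built on input checksums) and invokes each step through `spacy project run`; `run_workflow` and `spacy_run` are hypothetical helpers, and in practice `spacy project run all` already does all of this for you.

```python
from __future__ import annotations

import subprocess
from typing import Callable

# Step order copied from the `all` workflow.
ALL_WORKFLOW = [
    "tokenize-oscar", "train-fasttext-standard-vectors", "train-floret-vectors",
    "init-standard-unpruned-vectors", "init-standard-vectors", "init-floret-vectors",
    "convert", "train-no-vectors", "train-standard-unpruned", "train-standard",
    "train-floret", "evaluate", "convert-ner", "train-no-vectors-ner",
    "train-standard-unpruned-ner", "train-standard-ner", "train-floret-ner",
    "evaluate-ner",
]


def spacy_run(name: str) -> None:
    """Run one project command via the spaCy CLI; raises CalledProcessError on failure."""
    subprocess.run(["spacy", "project", "run", name], check=True)


def run_workflow(
    steps: list[str],
    is_up_to_date: Callable[[str], bool],
    run_command: Callable[[str], None] = spacy_run,
) -> list[str]:
    """Run steps in order, skipping up-to-date ones; return the steps actually run."""
    executed = []
    for name in steps:
        if is_up_to_date(name):
            continue  # inputs unchanged since the last run
        run_command(name)  # an exception here stops the remaining steps
        executed.append(name)
    return executed
```

Passing `run_command` as a parameter keeps the ordering logic testable without spaCy installed; a stub such as `list.append` can record which steps would have run.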
The following assets are defined by the project. They can be fetched by running `spacy project assets` in the project directory.
File | Source | Description |
---|---|---|
`assets/UD_Finnish-TDT` | Git | |
`assets/turku-ner-corpus` | Git | |
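Because the `convert` and `convert-ner` steps depend on these Git assets, it can help to check for them before starting a long workflow. The following is a minimal sketch, assuming the two asset paths listed in the table are the only ones required; `ASSET_PATHS` and `missing_assets` are hypothetical names, not part of the project.

```python
from __future__ import annotations

from pathlib import Path

# Asset paths as listed in the assets table (assumed to be the full set).
ASSET_PATHS = ["assets/UD_Finnish-TDT", "assets/turku-ner-corpus"]


def missing_assets(project_dir: Path) -> list[str]:
    """Return the asset paths that do not exist yet under project_dir."""
    return [p for p in ASSET_PATHS if not (project_dir / p).exists()]


if __name__ == "__main__":
    missing = missing_assets(Path("."))
    if missing:
        # The fix is to fetch them with the spaCy CLI before running any workflow.
        print("Missing assets; run `spacy project assets` first:", ", ".join(missing))
```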