From 81b2342f14ad2d3f2006aea2725390c9a8eabc84 Mon Sep 17 00:00:00 2001 From: Luca Foppiano Date: Sun, 4 Feb 2024 11:04:21 +0900 Subject: [PATCH] update documentation --- README.md | 34 ++------------------------ doc/evaluation-scores.rst | 49 ++++++++++++++++++++++++++++---------- doc/gettingStarted.rst | 50 +++++++++++++++++++++++++++++++++++---- 3 files changed, 83 insertions(+), 50 deletions(-) diff --git a/README.md b/README.md index dc1ac6bf..2885262c 100644 --- a/README.md +++ b/README.md @@ -37,39 +37,9 @@ Spaces: https://lfoppiano-grobid-quantities.hf.space/ ## Latest version -The latest released version of grobid-quantities -is [0.7.3](https://github.com/kermitt2/grobid-quantities/releases/tag/v0.7.3). The current development version is -0.7.4-SNAPSHOT. +The latest released version of grobid-quantities is [0.7.3](https://github.com/kermitt2/grobid-quantities/releases/tag/v0.7.3). The current development version is 0.7.4-SNAPSHOT. +**Important**: to upgrade please check [here](https://grobid-quantities.readthedocs.io/gettingStarted.html#upgrade). -### Update from 0.7.2 to 0.7.3 - -#### Grobid models -In version 0.7.3 we have updated the DeLFT models. The DL models must be updated by running `./gradlew copyModels`. - -#### JDK Update -The version 0.7.3 enable the support for running with JDK > 11. We recommend to run it with JDK 17. -Running grobid-quantities with gradle (`./gradlew clean run`) is already supported in the `build.gradle`. 
-Running grobid-quantities via the JAR file requires an additional parameter to set the java.path: -- Linux: `-Djava.library.path=../grobid-home/lib/lin-64:../grobid-home/lib/lin-64/jep` -- Mac (arm): `-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac_arm-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED` -- Mac (intel): `-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED` - With `MY_VIRTUAL_ENV` I use `/Users/lfoppiano/anaconda3/envs/jep` - - -### Update from 0.7.1 to 0.7.2 - -In version 0.7.2 we have updated the DeLFT models. -The DL models must be updated by running `./gradlew copyModels`. - -### Update from 0.7.0 to 0.7.1 - -In version 0.7.1 a new version of DeLFT using Tensorflow 2.x is used. -The DL models must be updated by running `./gradlew copyModels`. - -### Update from 0.6.0 to 0.7.0 - -In version 0.7.0 the models have been updated, therefore is required to run a `./gradlew copyModels` to have properly -results especially for what concern the unit normalisation. ## Documentation diff --git a/doc/evaluation-scores.rst b/doc/evaluation-scores.rst index 9998f590..165215d4 100644 --- a/doc/evaluation-scores.rst +++ b/doc/evaluation-scores.rst @@ -1,8 +1,34 @@ .. topic:: Evaluation scores -***************** -Evaluation scores -***************** +********** +Evaluation +********** + +--------------------- +End-to-end evaluation +--------------------- + +The end-to-end evaluation was performed with the `MeasEval dataset `_ (SemEval-2021 Task 8). +The scores in the following table are micro averages. +MeasEval annotations allow approximate entities, which grobid-quantities does not support.
+ ++---------------------------+----------------+-----------+--------+---------+---------+ +| Type (Ref) | Matching method| Precision | Recall | F1-score| Support | ++===========================+================+===========+========+=========+=========+ +| Quantities (QUANT) | strict | 53.05 | 54.74 | 53.88 | 1165 | ++---------------------------+----------------+-----------+--------+---------+---------+ +| Quantities (QUANT) | soft | 64.64 | 66.70 | 65.65 | 1165 | ++---------------------------+----------------+-----------+--------+---------+---------+ +| Quantified substance (ME) | strict | 14.03 | 9.78 | 11.53 | 613 | ++---------------------------+----------------+-----------+--------+---------+---------+ +| Quantified substance (ME) | soft | 21.53 | 15.02 | 17.69 | 613 | ++---------------------------+----------------+-----------+--------+---------+---------+ + +Note: the ME (Measured Entity) is still experimental in Grobid-quantities + +------------------------------------------------------- +Machine Learning Named Entities Recognition Evaluation +------------------------------------------------------- The scores (P: Precision, R: Recall, F1: F1-score) for all the models are performed either as 10-fold cross-validation or using an holdout dataset. The holdout dataset of Grobid-quantities is composed by the following examples: @@ -18,14 +44,14 @@ The models are organised as follow: - BERT_CRF is a BERT-based model obtained by fine-tuning a SciBERT encoder. Like others, the activation function is composed by a CRF layer. -======================= + Results from 27/10/2022 -======================= +~~~~~~~~~~~~~~~~~~~~~~~ The evaluation was performed on the holdout dataset from the grobid-quantities dataset. Average values are computed as Micro average. 
----------- + Quantities ---------- @@ -79,7 +105,6 @@ Quantities +------------------+--------------+--------+---------+-------------------------+--------+---------+ ------ Units ----- @@ -113,7 +138,7 @@ Units were evaluated using UNISCOR dataset. For more information check the secti | All (micro avg) | 70.19 | 60.88 | 65.20 | 73.03 | 65.31 | 68.94 | +------------------+--------------+--------+---------+-------------------------+--------+---------+ ------- + Values ------ @@ -150,9 +175,9 @@ Values | All (micro avg) | 98.90 | 99.17 | 99.03 | 98.86 | 99.25 | 99.05 | +-----------------+------------+--------+----------+-------------------------+---------+----------+ -================ + Previous results -================ +~~~~~~~~~~~~~~~~ The scores of this evaluation were obtained using n-fold cross-validation. The metrics are the micro average of n=10 folds. @@ -163,7 +188,7 @@ Evaluation notes: - The `CRF` model was evaluated on the 30/04/2020. - The `BidLSTM_CRF_FEATURES` model was evaluated on the 28/11/2021 ----------- + Quantities ---------- @@ -191,7 +216,6 @@ Quantities | All (micro avg) | 88.96 | 85.40 | 87.14 | 87.23 | 89.00 | 88.10 | +---------------------+------------+--------+----------+----------------------+--------+----------+ ------ Units ----- @@ -212,7 +236,6 @@ CRF was updated on the 10/02/2021 +------------------+------------+--------+----------+-----------+-------+-----------+ ------- Values ------ diff --git a/doc/gettingStarted.rst b/doc/gettingStarted.rst index 31d8f927..db68648a 100644 --- a/doc/gettingStarted.rst +++ b/doc/gettingStarted.rst @@ -7,17 +7,57 @@ .. _latest discussion: https://github.com/kermitt2/grobid/issues/1014 - +############### Getting started -=============== +############### -Before you start -~~~~~~~~~~~~~~~~ .. warning:: Grobid and grobid-quantities are `not compatible with Windows`_ and limited on Apple M1. 
While Windows users can easily use Grobid and grobid-quantities through docker containers, the support for grobid on ARM is under development, see the `latest discussion`_. .. warning:: Since grobid-quantities 0.7.3 (using grobid 0.7.3), we extended the support to JDK after version 11. This requires specifying the `java.library.path` explicitly. Obviously, *all these issues are solved by using Docker containers*. +Upgrade +~~~~~~~ + +0.7.2 to 0.7.3 +============== + +Grobid models +------------- + +In version 0.7.3, we have updated the DeLFT models. The DL models must be updated by running ``./gradlew copyModels``. + +JDK Update +----------- + +Version 0.7.3 adds support for running with JDK > 11. We recommend running it with JDK 17. +Running grobid-quantities with gradle (``./gradlew clean run``) is already supported by ``build.gradle``. +Running grobid-quantities via the JAR file requires an additional parameter to set ``java.library.path``: + +- Linux: ``-Djava.library.path=../grobid-home/lib/lin-64:../grobid-home/lib/lin-64/jep`` +- Mac (arm): ``-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac_arm-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED`` +- Mac (intel): ``-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED`` + Here ``MY_VIRTUAL_ENV`` is the path to the Python virtual environment, for example ``/Users/lfoppiano/anaconda3/envs/jep`` + +0.7.1 to 0.7.2 +============== + +In version 0.7.2, we have updated the DeLFT models. +The DL models must be updated by running ``./gradlew copyModels``. + +0.7.0 to 0.7.1 +============== + +In version 0.7.1, a new version of DeLFT using TensorFlow 2.x is used. +The DL models must be updated by running ``./gradlew copyModels``.
+ +0.6.0 to 0.7.0 +============== + +In version 0.7.0, the models have been updated; running ``./gradlew copyModels`` is therefore required to obtain correct +results, especially for unit normalization. + + Install and build ~~~~~~~~~~~~~~~~~ @@ -25,7 +65,7 @@ Docker containers ----------------- The simplest way to run grobid-quantities is via docker containers. -The Grobid-quantities repository provides a configuration file for docker: `resources/config/config-docker.yml`, which should work out of the box, although we recommend to **check the configuration** (e.g., to enable modules using deep learning). +The Grobid-quantities repository provides a configuration file for docker: ``resources/config/config-docker.yml``, which should work out of the box, although we recommend **checking the configuration** (e.g., to enable modules using deep learning). To run the container use: ::