
Scripts for reproducing experiment one documented in Rainsford and Regnault (2023)


rainsfordtm/chr2023-exp1


Parser evaluation for experiment 1 in CHR2023 poster (Rainsford and Regnault 2023)

1. Introduction

The parser evaluation in experiment 1 of Rainsford and Regnault (2023) took place in several stages and made use of the Concordance Manager (Rainsford 2023) to combine and query annotations. This README file summarizes the procedure adopted and the principal files in this repository but does not provide comprehensive documentation of an experimental process which will be further developed in the framework of an ongoing project.

2. Objective

The objective was to produce concordances of two French verbs (jeter 'to throw' and entrer 'to enter') containing correct annotation of the following features of argument structure:

  • reflexive: (y or n) presence or absence of a reflexive pronoun
  • en_clitic: (y or n) presence or absence of an en clitic pronoun
  • obj_acc: (string) the direct object, if present
  • loc: (string) locative arguments and adjuncts, if present
  • loc_lemma: (string) head of locative arguments and adjuncts
  • loc_type: (string) whether the head of the locative is transitive (i.e. takes an NP complement) or intransitive
  • predicate_type: (string) nature of the verb form
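As an illustration only, one concordance hit carrying these features could be modelled as a small Python record. The field names come from the list above; the class name, the default values, the example hit and the string "transitive" used for loc_type are assumptions made for this sketch and are not taken from the repository.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ArgumentStructure:
    """Hypothetical container for one hit; field names follow the feature list."""
    reflexive: str = "n"                   # "y" or "n"
    en_clitic: str = "n"                   # "y" or "n"
    obj_acc: Optional[str] = None          # direct object, if present
    loc: Optional[str] = None              # locative argument or adjunct
    loc_lemma: Optional[str] = None        # head of the locative
    loc_type: Optional[str] = None         # assumed values: "transitive" / "intransitive"
    predicate_type: Optional[str] = None   # nature of the verb form

    def __post_init__(self):
        # The two binary features only accept "y" or "n".
        for name in ("reflexive", "en_clitic"):
            if getattr(self, name) not in ("y", "n"):
                raise ValueError(f"{name} must be 'y' or 'n'")


# Invented example: "se jeter dans la chambre"
hit = ArgumentStructure(
    reflexive="y",
    loc="dans la chambre",
    loc_lemma="dans",
    loc_type="transitive",
)
row = asdict(hit)  # plain dict, e.g. for writing one CSV row
```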

The data was extracted from a corpus containing only automatic lemmatization and part-of-speech tags. The objective was to automatically generate the features of argument structure listed above using a combination of parsers trained on pre-existing models and "expert queries", whose purpose was to enrich and correct the annotation produced by the parser.
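To make the idea of a query over parser output concrete, here is a minimal sketch of a rule that derives reflexive and en_clitic from a CoNLL-U parse. It is not the query script shipped in this repository: the function names, the sample sentence and its annotation are invented for illustration, and the rule assumes that the parser marks reflexive pronouns with Reflex=Yes in the FEATS column.

```python
def parse_conllu(block):
    """Read one CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges and empty nodes
            continue
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "upos": cols[3], "feats": cols[5],
            "head": int(cols[6]), "deprel": cols[7],
        })
    return tokens


def annotate(tokens, verb_lemma):
    """Toy rule: inspect the direct dependents of the target verb."""
    verb = next(t for t in tokens
                if t["lemma"] == verb_lemma and t["upos"] == "VERB")
    deps = [t for t in tokens if t["head"] == verb["id"]]
    reflexive = any("Reflex=Yes" in t["feats"] for t in deps)
    en_clitic = any(t["lemma"] == "en" and t["upos"] == "PRON" for t in deps)
    return {"reflexive": "y" if reflexive else "n",
            "en_clitic": "y" if en_clitic else "n"}


# Invented sentence "Il se jette dans l'eau" with hand-written annotation.
ROWS = [
    ("1", "Il", "il", "PRON", "_", "_", "3", "nsubj", "_", "_"),
    ("2", "se", "se", "PRON", "_", "Reflex=Yes", "3", "expl:pv", "_", "_"),
    ("3", "jette", "jeter", "VERB", "_", "_", "0", "root", "_", "_"),
    ("4", "dans", "dans", "ADP", "_", "_", "6", "case", "_", "_"),
    ("5", "l'", "le", "DET", "_", "_", "6", "det", "_", "_"),
    ("6", "eau", "eau", "NOUN", "_", "_", "3", "obl", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

features = annotate(parse_conllu(SAMPLE), "jeter")
```

A rule of this kind only sees what the parser attached to the verb, which is why the procedure includes a later step that corrects common parser errors.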

3. Procedure

  1. Extraction of all the occurrences of the verbs jeter and entrer found in the public domain texts of FRANTEXT in the form of a concordance.
  2. ConMan (first pass)
  3. Manual correction of initial annotation to produce "gold" annotation
  4. Parsing of the CONLL-U file with three different parser-model combinations:
  5. ConMan (second pass)
  6. ConMan (third pass)
  7. Optimization of the query script to correct common errors from the best parser / model combination (HOPS/Sequoia) and improve annotation of the targeted phenomena ("expert queries")
  8. ConMan (fourth pass)
  9. ConMan (fifth pass)

4. Running the annotation scripts

The annotation scripts are integrated into the ConMan software, which can be installed from the git repository:

git clone https://github.com/rainsfordtm/conman.git

Here's how to use the scripts on the data provided in this repository (examples for running queries on entrer using HOPS Sequoia):

Pass 2. First, update the path in parser-eval/cfg/wf_pass2_entrer.cfg so that it points to parser-eval/py/frantext-parser-annotate-basic.py (ConMan requires absolute paths to annotation scripts). Then run

<CONMAN_PATH>/conman.py -s -z -w parser-eval/cfg/wf_pass2_entrer.cfg -m parser-eval/conllu/01parsed-hops-sequoia-flaubert/entrer.conllu parser-eval/cnc/00extracted/entrer.cnc.gz <OUTPUT_FILE>
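For scripted runs, the Pass 2 invocation can be assembled in Python. The sketch below uses only the flags and repository paths shown in the command above. The function name pass2_command, the verb parameter (which assumes that the jeter files follow the same naming pattern) and the example values /opt/conman and out/entrer.cnc.gz are hypothetical.

```python
from pathlib import Path


def pass2_command(conman_path, output_file, verb="entrer"):
    """Build the Pass 2 argument list; paths are relative to the repository root."""
    root = Path("parser-eval")
    return [
        (Path(conman_path) / "conman.py").as_posix(),
        "-s", "-z",
        "-w", (root / "cfg" / f"wf_pass2_{verb}.cfg").as_posix(),
        "-m", (root / "conllu" / "01parsed-hops-sequoia-flaubert"
               / f"{verb}.conllu").as_posix(),
        (root / "cnc" / "00extracted" / f"{verb}.cnc.gz").as_posix(),
        Path(output_file).as_posix(),
    ]


# Absolute path of the annotation script, i.e. the value that has to be
# written into the workflow configuration file by hand.
script_path = Path("parser-eval/py/frantext-parser-annotate-basic.py").resolve()

# Hypothetical install location and output file.
cmd = pass2_command("/opt/conman", "out/entrer.cnc.gz")
```

Passing cmd to subprocess.run(cmd, check=True) from the repository root would then be equivalent to typing the command by hand.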

Pass 3. First, update the path in parser-eval/cfg/wf_pass3_pass5_eval_entrer.cfg so that it points to parser-eval/py/frantext-checked-eval.py (ConMan requires absolute paths to annotation scripts). Then run

<CONMAN_PATH>/conman.py -w parser-eval/cfg/wf_pass3_pass5_entrer.cfg -m parser-eval/csv/eval-gold/entrer.csv parser-eval/cnc/01parsed-hops-sequoia-flaubert-basic/entrer.cnc.gz <OUTPUT_FILE>

Pass 4. First, update the path in parser-eval/cfg/wf_pass4_entrer.cfg so that it points to parser-eval/py/frantext-parser-annotate.py (ConMan requires absolute paths to annotation scripts). Then run

<CONMAN_PATH>/conman.py -s -z -w parser-eval/cfg/wf_pass4_entrer.cfg -m parser-eval/conllu/01parsed-hops-sequoia-flaubert/entrer.conllu parser-eval/cnc/00extracted/entrer.cnc.gz <OUTPUT_FILE>

Pass 5. First, update the path in parser-eval/cfg/wf_pass3_pass5_eval_entrer.cfg so that it points to parser-eval/py/frantext-checked-eval.py (ConMan requires absolute paths to annotation scripts). Then run

<CONMAN_PATH>/conman.py -w parser-eval/cfg/wf_pass3_pass5_entrer.cfg -m parser-eval/csv/eval-gold/entrer.csv parser-eval/cnc/01parsed-hops-sequoia-flaubert-expert/entrer.cnc.gz <OUTPUT_FILE>
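Passes 3 and 5 both compare automatic annotation with the gold annotation. The sketch below shows one simple way to summarize such a comparison as per-feature accuracy. It is not the logic of frantext-checked-eval.py, whose behaviour is not described here. The function name, the row format (one dict per hit, aligned by position) and the two example rows are assumptions made for illustration.

```python
FEATURES = ["reflexive", "en_clitic", "obj_acc", "loc",
            "loc_lemma", "loc_type", "predicate_type"]


def feature_accuracy(gold_rows, pred_rows, features=FEATURES):
    """Share of hits on which the automatic value equals the gold value, per feature."""
    if len(gold_rows) != len(pred_rows):
        raise ValueError("gold and predicted rows must be aligned one to one")
    if not gold_rows:
        return {}
    scores = {}
    for feat in features:
        # A missing feature is treated as the empty string on both sides.
        correct = sum(
            1 for g, p in zip(gold_rows, pred_rows)
            if g.get(feat, "") == p.get(feat, "")
        )
        scores[feat] = correct / len(gold_rows)
    return scores


# Invented rows: the second prediction misses the locative.
gold = [
    {"reflexive": "y", "loc": "dans la chambre"},
    {"reflexive": "n", "loc": "dans la ville"},
]
pred = [
    {"reflexive": "y", "loc": "dans la chambre"},
    {"reflexive": "n", "loc": ""},
]
scores = feature_accuracy(gold, pred)
```

Computing the same scores on the Pass 3 and Pass 5 outputs would give one way of seeing what the expert queries add over the basic annotation.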

References
