Home
This page is a short guide to the Arborator, a collaborative dependency annotation tool.
It contains information for
- annotators
- validators (the referees who decide which of several differing annotations is included in the corpus as the final annotation)
- Arborator administrators, who assign the annotation and validation tasks
- and site administrators (for installing the software on an Apache web server)
Most functions are accessible by keyboard and mouse:
- TAB - shows the next sentence, loading it if it is not yet loaded; loops through the sentences
- SPACE - edits the next word; if no word has been handled yet, opens the first word; loops through the words
- BACKSPACE - edits the previous word; if no word has been handled yet, opens the last word; loops through the words
- RETURN - accepts the changed head, function, or category; if SHIFT is held down and a new head is proposed, the existing link is preserved
- ESC - stops editing and closes open menus
- CURSOR UP or DOWN - while editing: opens the function menu; with SHIFT or CTRL also pressed: opens the category menu
- "c" - while editing: opens the category menu
- "f" - while editing: opens the function menu
- in open menus: the first letter of a function or category loops through the functions/categories starting with that letter
- CTRL-S - saves
- dragging a governor onto a dependent token creates a link and opens the function menu; if SHIFT is held down, the existing link is preserved
- dragging a word to the top until the line changes color and then releasing it creates a root link
- clicking on a function name or category opens the corresponding menu
- double-clicking on a token opens the table of all its features
Each annotator can assign a status to each tree:
- no tree, if the user has not saved a tree yet. Clicking on the "no tree" label saves the currently visible tree
- todo
- ok
- problem
the graph of a tree can be exported in
- ps
- odg
- jpg
- png
- tiff
- svg
- validators and administrators have access to the compare mode (if there are two different annotations of the sentence):
- the fruit icon opens a list of the annotations for the sentence.
- the different annotations of the given sentence can be checked
- clicking on the fruit icon again gives a graphical representation of the different annotations of the sentence.
- wrong links in the unified tree can be erased, and the corrected tree can be saved.
Administrators see a special button, a chain symbol, at the far right of the buttons for each sentence. This allows for
- connecting the sentence with the following sentence into one line. A simple click on the chain button opens an approval dialog.
- splitting the sentence at any position into two separate lines. To do that, the administrator selects the word after which the sentence should be split. The word is then highlighted in color. A Ctrl-click on the chain button then opens an approval dialog.
- shows
- the assigned texts
- the annotator can change the text status by clicking on the default status (todo)
- the default status tags are todo, ok, and problem
- The status can be seen by administrators on the project page.
- all the texts of the database and their annotators and validators
- administrators additionally see a list of all the assignments per user
- allows searching for words, functions, and other features in the database. See section "query"
- allows for different types of CoNLL export:
- by assignment: the text is exported with all its sentences, one file per user, i.e. all sentences of a text are exported even if the annotator did not save them as his or her own. In that case, the parser's trees are used
- by existing trees: all saved trees for a given text are exported, one file per user. Non-annotated sentences of a text do not appear in the file.
- administrators can
- assign annotators and validators: to assign a person,
- click on "assign", choose the person,
- click on "+" to assign a simple annotator (can't see the other annotators' annotations)
- click on "✓" to assign a validator (can see the other annotators' annotations)
- erase texts
- add texts, see below
- Click on "Add files to the database".
- Above: upload a new file to the site. Below: the list of already uploaded files containing syntactic analyses (CoNLL)
- To upload: Click on "Browse", choose the file, "OK", then "Upload"
- accepted file formats: Malt, CoNLL 10 (including orféo, i.e. CoNLL 10+3), CoNLL 14
- To include the file in the project: Click on the "^" button. If all goes well, the file is added to the database. Currently you have to refresh the Project page (F5) to see it.
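Before uploading, it can help to check how many columns a file really has. The following is a minimal sketch, not part of the Arborator: the helper name column_counts is hypothetical, and it only assumes that columns are tab-separated and that blank lines separate sentences. How the Arborator itself tells the formats apart is not described here.

```python
from collections import Counter


def column_counts(path):
    """Count how many non-empty lines have each number of tab-separated columns."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.strip():  # blank lines separate sentences, so skip them
                counts[len(line.split("\t"))] += 1
    return counts
```

A file in which every token line has 10, 13, or 14 columns would match the CoNLL 10, CoNLL 10+3, and CoNLL 14 formats listed above; mixed counts usually point to a damaged file.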
- teacher visible mode: a simple exercise in which students copy the teacher's tree, which is visible but not directly modifiable. A good start, with 3 sentences or so.
- no feedback: the student can't see the teacher's tree and gets no feedback, but the admin can export the results of the students' annotations compared to the teacher's trees
- percentage: when students save, they see what percentage of the dependencies and POS tags they got wrong, but not where
- graphical feedback: when students save, they see where their tree differs from the teacher's tree and have to find the right annotation themselves.
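The percentage mode can be illustrated with a small sketch. This is not the Arborator's own code: the tree representation (a dict mapping token index to a (head, function, category) triple) and the function name error_percentages are assumptions made for the example.

```python
def error_percentages(student, teacher):
    """Return (wrong dependency %, wrong category %) over the teacher's tokens."""
    if not teacher:
        return 0.0, 0.0
    wrong_dep = wrong_cat = 0
    for index, (head, function, category) in teacher.items():
        # A token missing from the student's tree counts as wrong on both scores.
        s_head, s_function, s_category = student.get(index, (None, None, None))
        if (s_head, s_function) != (head, function):
            wrong_dep += 1
        if s_category != category:
            wrong_cat += 1
    total = len(teacher)
    return 100.0 * wrong_dep / total, 100.0 * wrong_cat / total


# Hypothetical four-token sentence: one wrong function, one wrong category.
teacher = {1: (2, "subj", "N"), 2: (0, "root", "V"), 3: (2, "obj", "N"), 4: (3, "dep", "Adj")}
student = {1: (2, "subj", "N"), 2: (0, "root", "V"), 3: (2, "dep", "N"), 4: (3, "dep", "N")}
print(error_percentages(student, teacher))
```

In this sketch a dependency counts as wrong if either the head or the function differs; whether the Arborator scores heads and functions together or separately is not stated in this guide.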
The Arborator allows for queries in the database. In the field in the upper right corner of the project page, Google-like queries can be carried out: space-separated query terms, quotes around multiple words to search for the whole string including spaces, AND (default), OR, *, ... Hit "Enter" and the system returns a list of results with links to the corresponding sentences (with snippets of the sentences containing all the query terms if no features were used). Note that feature searches are much slower, as they are not precompiled.
- Feature queries: colon-separated attribute-value pairs can be included, for example cat:N. "func" or "function" can be used to access function names. These are valid queries:
- 'agréable cat:I func:para_disfl'
- 'lemma:pouvoir func:fixed tag:NOUN journée'
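To make the query syntax concrete, here is a minimal sketch of how such a query string could be split into plain search terms and feature pairs. It is an illustration only: the function parse_query and the alias table are hypothetical and do not reproduce the Arborator's actual query code, and operators such as OR and * are not handled.

```python
import re

# Either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

# The guide says "func" and "function" both address function names.
FEATURE_ALIASES = {"function": "func"}


def parse_query(query):
    """Split a query into (plain terms, {attribute: value}) under the assumptions above."""
    terms, features = [], {}
    for quoted, bare in TOKEN.findall(query):
        if quoted:
            terms.append(quoted)  # quoted phrases are always plain terms
            continue
        attribute, separator, value = bare.partition(":")
        if separator and attribute and value:
            features[FEATURE_ALIASES.get(attribute, attribute)] = value
        elif bare:
            terms.append(bare)
    return terms, features


print(parse_query("lemma:pouvoir func:fixed tag:NOUN journée"))
```

Keeping plain terms and feature pairs apart mirrors the remark above that feature searches are slower: a caller could run the fast term search first and only then filter by features.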
Put 'mate = 1' in the config file. Admins then see a mate box on the project page; clicking it takes the validated trees and adds, for every sentence, a tree attributed to the "mate" user.
Simple steps: Say you have an annotated example treebank and a simple text file you want to parse. On the server:
- Create a new project (a new folder named XXX in projects)
- 'cd lib' and then 'python createDB.py XXX'
- Click on "Add files to the database" to upload the treebank
- Click on "Add files to the database", then check "A sentence per line" to upload the text file (one sentence per line, no empty lines!). Ignore the odd JSON output, go back one step, and refresh
- Assign the treebank sample to a user as validator; the user has to change the status from todo to ok
- Check "all validated trees" in the mate box
- Wait a while before admiring the parse results.
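Since the text file for the "A sentence per line" upload must not contain empty lines, a quick check before uploading can save a round trip. The sketch below is a hypothetical helper, not part of the Arborator, and assumes the file is UTF-8 encoded.

```python
def find_empty_lines(path):
    """Return the 1-based numbers of empty or whitespace-only lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return [number for number, line in enumerate(handle, start=1) if not line.strip()]
```

An empty result means the file has one non-empty line per sentence slot; any numbers returned are the lines to delete or fill before uploading.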
- On the bottom of each page are links to logout and to edit the user account. The page provides
- information on past site access
- change of password and real name
- Administrators additionally have a link for user administration which allows
- To create or invite new users
- To edit or delete users
- To edit the main config file and default user
- Users can be simple annotators or validators
- Annotators can only see their own trees and the trees by users specified in the project.cfg (generally the trees by parser)
- Validators can see all the trees on the texts for which they are assigned as validators and can use the compare tools.
- Admins are declared in the user admin pages. They should be given the "Admin Level" of 1 (only the site admin should have 3). Admins can
- assign annotation and validation tasks
- see all the trees
- upload conll files
- erase texts
- export texts as conll
The software is written in Python (server side) and Javascript (client side). It is released under the GNU Affero GPL v3 licence. The latest version can be obtained on the Arborator's launchpad page.
To install the Arborator, the whole source dump has to be unzipped to a folder on an Apache server.
Arborator uses SQLite, which must be installed along with the corresponding Python sqlite module (standard in recent versions of Python). The tree transformations for the import options also use two non-standard modules: nltk.featstruct (you'll have to install nltk; on Ubuntu, that's done with sudo apt-get install python-nltk) and jellyfish (for fuzzy matching when importing; install it by typing sudo pip install jellyfish). All other Python modules used are standard: difflib, generators, glob, hashlib, json, optparse, os, random, re, shutil, sqlite3, subprocess, sys, time, traceback, urllib, xml.dom
The Arborator includes various open source scripts and software:
- Javascript tools
- JQuery
- JQueryUI
- Raphael
- JQuery.fileupload
- Python tools
- logintools
- Java tools
- Batik
- svg2office
either the whole LAMP stack (if you want to use PHP or MySQL):
sudo apt-get install tasksel
sudo tasksel install lamp-server
or simply apache alone:
sudo apt-get install apache2
try to visit http://localhost in your browser
if it doesn't connect try
sudo /etc/init.d/apache2 restart
suppose you downloaded arborator into /home/me/arborator (i.e. the file index.cgi for example is in this folder)
sudo ln -s /home/me/arborator /var/www/arborator
make the folder accessible and writable for everyone (not a good idea for a public server!):
sudo chmod -R a+rw /home/me/arborator
check whether all .cgi files are executable. if not:
sudo find /home/me/arborator -name "*.cgi" -exec chmod +x {} \;
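To see which .cgi files still lack the executable bit before or after running the command above, a short Python check can be used. This is a sketch with a hypothetical function name; it treats a file as executable only if owner, group, and others may all execute it, which is stricter than strictly necessary on some setups.

```python
import os
import stat

ALL_EXECUTE = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def non_executable_cgi(root):
    """List .cgi files under root that are not executable by owner, group, and others."""
    missing = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if not name.endswith(".cgi"):
                continue
            path = os.path.join(folder, name)
            if (os.stat(path).st_mode & ALL_EXECUTE) != ALL_EXECUTE:
                missing.append(path)
    return sorted(missing)
```

Calling non_executable_cgi("/home/me/arborator") should return an empty list once all scripts are executable.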
now look into /etc/apache2/sites-enabled/ there should be the default server: 000-default
open this file with your favorite editor, for example with dolphin
the information about the /var/www directory should look like this:
<Directory /var/www/>
    Options Indexes FollowSymLinks MultiViews
    Options +FollowSymLinks
    Options +ExecCGI
    AddHandler cgi-script .cgi
    AllowOverride None
    Order allow,deny
    allow from all
</Directory>
change it, save it, restart apache:
sudo /etc/init.d/apache2 restart
now visiting http://localhost/arborator in your browser should show the Arborator start page
- create a folder with the name of the project in the projects directory
- copy project.cfg from an example project and edit it:
- this includes giving the list of categories and functions in separate files. In the editor, functions and categories not in the list will be shown in grey and cannot be assigned.
- each function and each category can be followed by a tab and a json/css description of how the arrows and categories should look like respectively
- make all new folders world readable and writable: the projects folder, the export/cache folder, the user folder, and all their subfolders should be world read-writable. If they are not, you can change the mode with:
chmod 777 files_or_folder
If you edit anything in the users folder, don't leave temporary files behind (...ini~).
- to create a new database, go to the project directory (in arborator/projects) and run:
python ../lib/createDB.py name_of_project
- an image file name_of-the_project.png can be placed in the project folder. this will be shown on the start page.
- you should not have to change anything outside of the projects folder.
- only exception: you may want to add (rhapsodie xml or conll) files into the corpus folder, instead of manually uploading each file individually.
- Please make sure your whole project folder is writable.
In a usual setting, the data is automatically preparsed and only corrected by the annotators. CoNLL (Malt, 10 or 14) files as well as Rhapsodie XML files can be uploaded into the database from the project page. For the moment, all existing annotations on texts of the same name are erased when a new file is uploaded!
- you can also enter the CoNLL or XML files manually into the database (using a python script) instead of clicking on each link on the site:
- If users get their own login by signing up, they have to log on to confirm. Only then, the user is added to the database (and thus only then the user can be assigned an annotation task). In case of automatic creation of users (or manual creation by adding ini files), click once on User Administration / edit users so that all users are also included in the database.
- Another way to enter a data file, such as a .conllu file, is to follow this example in the function main:
from database import SQL
trees=conll.conllFile2trees("../projects/yourproject/export/yourexemple.conllu")
simpleEnterSentences(SQL("yourproject"),trees,"yourexemple", "parser", eraseAllAnnos=True)
- Attention:
- Make sure that you have imported SQL
- If you set the option eraseAllAnnos=True, the earlier annotations named "yourexemple" will be erased.
The database is a file called arborator.db.sqlite. It's located in the project folder.
- before doing any work on the database, warn potential annotators by renaming the file xsitemessage.html to sitemessage.html
- make a backup copy of your database (simply copy the file somewhere else)
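The backup step can be scripted. The following is a minimal sketch under stated assumptions: the file name arborator.db.sqlite comes from this guide, while the function name, the backup folder, and the timestamp suffix are choices made for the example only.

```python
import os
import shutil
import time


def backup_database(project_dir, backup_dir):
    """Copy the project's arborator.db.sqlite into backup_dir with a timestamp suffix."""
    source = os.path.join(project_dir, "arborator.db.sqlite")
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_dir, "arborator.db.sqlite." + stamp)
    shutil.copy2(source, target)  # copy2 also preserves the modification time
    return target
```

A plain file copy is only safe while nobody is writing to the database, which is one more reason to put up the site message first.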
The distribution contains a script called bulkCorrectDatabase.py.
It runs over the whole database and corrects features consistently. However, it is very slow; direct access to the database is usually faster.
- the function invoked in it is bulkcorrectDB.
- it can be called with a list of treeids: bulkcorrectDB("Rhapsodie", [9795])
- or without it: bulkcorrectDB("Rhapsodie")
- non-destructive upload: integrate users' annotations without deleting existing annotations, even if tokens are slightly different (diff)
- comparison mode: referee, Cohen's kappa
- importation of a selection of features from the rhapsodie xml format.
- special save button for validators in order to keep their original annotation if they are also annotators.
- graphical editor:
- minor problems with the undo manager, e.g. if a menu opens and the same func/cat is chosen, don't register the tree as dirty; something is wrong after saving. Undo/redo accessibility is different from dirty/clean: after save, undo/redo should remain accessible (ok)
- check the simple graphical viewer:
- put ad hoc colors back in again (currently: project's colors)
- upload page: make trash file work, make rhapsodie xml work
- make tiny and fast js compilation
- redesign: make project choice cookie based (and not form based)