ARIA Framework Tutorial
Go to the folder of the dialogue manager, which is located at ..\ARIA\Agent-Core-Final\. In the resources folder you can find the configuration files:
- ARIAFlipper.xml: This is for logging purposes.
- aria.properties: Here you can define the templates you want to use in the dialogue and other properties for Flipper.
- ariaQAM.properties: Settings file for question-answering and the default answers to give if the agent cannot retrieve an answer.
- nlg.properties: A configuration file that was used for verbal alignment/entrainment in the AVP, which has been disabled now, since the service is offline. If you're interested in this, contact Jelte.
By default, the ARIA, NLU, InteractionManagement, a moves XML, FMLManagement, SocialTemplates and InterruptionHandling templates are included in Flipper:
- ARIA includes templates for capturing SSI input
- NLU is for basic NLU processing
- InteractionManagement takes care of turn management based on voice activity of the user and agent
- The moves XML contains the custom-written dialogue
- FMLManagement is for translating the moves to FML for GRETA
- SocialTemplates are for measuring a simplified version of the social relationship between user and agent.
- InterruptionHandling is for dealing with interruptions of each interlocutor, based on voice activity
The ARIA-Framework can save all input audio and video, as well as all SSI output. You can use the NOVA visualisation and annotation tool to view all these information streams to help analyse the system, debug it, and potentially create new annotations to retrain particular modules.
To use this, you must first set-up the ARIA-Framework to record all system information. You do this by [ALEX: explain].
Next, you must download and install NOVA, which is developed within ARIA-VALUSPA but distributed as a separate tool. To load an ARIA interaction, go to the ARIA\System\Agent-Input\ssi\pipes\all-in-one\log folder and select a session (each session is stored under a unique timestamp, e.g. 2016-12-05_12-45-06). Within that folder you will find a file 'project.nova'. Simply drag and drop this file onto NOVA.
A dialogue manager is responsible for structuring dialogues between spoken dialogue systems and people. It's often considered the brain of the system. The Dialogue Management (DM) component of the ARIA project is an information-state based approach to dialogue management. Simply put, it listens to user input, updates internal states and outputs behaviours. It runs on the information-state engine Flipper. Flipper uses behavioural templates; examples can be seen in the example folder or in the /templates folder.
Whenever we get messages over the middleware, we want the DM to be aware of this and update its information state with the information sent over ActiveMQ. This can be user input (e.g. speech), or feedback from the agent (e.g. whether it is talking). All information we get (in either JSON or XML) is put into the information state via the class InputManager.java and its template, ARIA.xml. To find out what information is available, look at the ARIA.xml file, where the entire initial information state is defined.
Currently, we directly copy the input to the information state, so that it has exactly the same name as displayed by the input senders (in this case, SSI). To access information in dialogue templates, you can for example use a notation such as is.states.amq.user.arousal.long to retrieve the long-term arousal level of the user. A full definition of all fields can be found in ARIA.xml.
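For example, a minimal template precondition that reads this field could look like the sketch below (the 0.4 threshold is purely illustrative):

```xml
<preconditions>
  <!-- Illustrative threshold: fire only when the user's long-term arousal is high -->
  <condition><![CDATA[ is.states.amq.user.arousal.long > 0.4 ]]></condition>
</preconditions>
```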
In NLU.xml we define templates that create agent moves and their relevance score. The NLUManager.java class contains code for updating the scores of agent moves, based on the user utterance and other parameters (such as the level of interest or trust).
In Moves_Eval_Quest1.xml are the templates that represent the agent moves. These are selected based on the rules in their preconditions and, optionally, on the relevance value returned by the QAMatcher via the NLUManager. Once the score is high enough, the move is added as an FML template for the agent.
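As a rough, hypothetical sketch (the field names below are illustrative and not taken from Moves_Eval_Quest1.xml), such a move could be guarded by its relevance score and, as an effect, store the FML to be executed:

```xml
<preconditions>
  <!-- Hypothetical field: a relevance score written by the NLUManager for this move -->
  <condition><![CDATA[ is.moves.answerQuestion.relevance > 0.7 ]]></condition>
</preconditions>
<effects>
  <!-- Hypothetical: register the FML so FMLManagement.xml can send it to the agent -->
  <assign is="is.states.agent.fml">"answer_question_fml"</assign>
</effects>
```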
In FMLManagement.xml a threshold is checked for the moves (every move has a score), together with the turn, to make the agent execute behaviour. This is then sent over the middleware as defined in FMLManager.java and FMLGenerator.java.
In SocialTemplates.xml, there is a simple dynamic updating of the trust level of the agent towards the user.
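A minimal sketch of what such an update could look like (the trust field and the updateTrust JavaScript helper are hypothetical, not the actual contents of SocialTemplates.xml):

```xml
<effects>
  <!-- Hypothetical: recompute trust from the user's current happiness level -->
  <assign is="is.states.agent.trust">updateTrust(is.states.amq.user.emotion.happiness)</assign>
</effects>
```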
In InteractionManager.xml, the interaction state (IDLE, ENGAGED, ENGAGING and DISENGAGING) is computed based on voice and face activity. When the state changes, a message is sent over ActiveMQ to GRETA to start displaying the appropriate behaviour. The file also contains templates for floor management, but these should not be used, as they still contain some bugs.
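As an illustration only (the interaction-state field name below is assumed; the actual rules live in InteractionManager.xml), a transition from ENGAGING to ENGAGED based on voice activity could be expressed as:

```xml
<preconditions>
  <!-- Assumed field name for the current interaction state -->
  <condition><![CDATA[ is.states.interaction == "ENGAGING" ]]></condition>
  <condition><![CDATA[ is.states.amq.user.active == 1 ]]></condition>
</preconditions>
<effects>
  <assign is="is.states.interaction">"ENGAGED"</assign>
</effects>
```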
For now, development is still ongoing and deployment is not straightforward. Currently we run the system via an IDE (tested in NetBeans and IntelliJ). You can pull the entire repository and open the folder that contains 'ARIA-Core-Final'; this is a Maven project. You have to manually add flipper2.0 to the libraries of the project (you can find it in /lib). Once done, you can run Main.java.
The information-state engine we use has three major components:
- An Information State
- Javascript functions
- (Behavioural) templates
First, you have to sketch what type of information you want to use in your dialogue management. You can take a look at the ARIA.xml file for an example of a defined information state. This information state is written in JSON notation. It can contain information you gather from user input, certain settings of the agent, or fields to keep track of the dialogue.
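For instance, a purely illustrative fragment of such an information state could look like the JSON below (the real fields are defined in ARIA.xml; in templates these paths are accessed with the is. prefix, e.g. is.states.agent.fml):

```json
{
  "states": {
    "amq": {
      "user": {
        "speech": "",
        "active": 0,
        "arousal": { "long": 0.0 }
      }
    },
    "agent": { "fml": "" },
    "user": { "history": { "speech": [] } }
  }
}
```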
Second, there is the possibility to use JavaScript functions. You can even use existing JavaScript libraries, if you define them in your .properties file. In this project it is located in the resources folder and is named aria.properties; it defines the name of your dialogue engine, the JavaScript libraries you want to use, the templates, and the refresh rate. There are some other options too, but these are still in development. JavaScript functions should be written between the <javascript></javascript> tags.
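For example, a hypothetical helper function (used again in the precondition example below) might look like this; it assumes the information state is reachable from JavaScript as the is object, as it is in conditions, and that the utterance history is kept in is.states.user.history.speech:

```xml
<javascript>
  // Hypothetical helper: true when the user's last utterance equals the previous one.
  function lastSentenceRepeat() {
    var history = is.states.user.history.speech;
    if (history == null || history.length < 2) return false;
    return history[history.length - 1] == history[history.length - 2];
  }
</javascript>
```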
The third component, the templates, is the most crucial one. Here you write the behavioural rules. You can view these templates as 'if preconditions hold, perform these effects'. At the top level these are if-then rules, but they can contain more complex comparisons. They are usually written like this:
```xml
<preconditions>
  <condition><![CDATA[is.states.amq.user.emotion.happiness >= 0.8 ]]></condition>
  <condition>lastSentenceRepeat</condition>
</preconditions>
```
Here we check whether the user's happiness exceeds the value of 0.8. The other condition calls a JavaScript function (like the hypothetical lastSentenceRepeat sketched above) that checks whether the user's last utterance is equal to the previous one and returns true if so. The CDATA is used to escape XML tags for JavaScript evaluation. It is also possible to call Java classes in preconditions; these calls could be written as:
```xml
<preconditions>
  <method name="isNewSpeech" is="is.states.amq.user.newSpeech">
    <object persistent="inputManager" class="eu.aria.dm.managers.InputManager">
    </object>
    <constructor/>
    <arguments>
      <value is_type="JSONString" is="is.states.amq.user.speech"/>
      <value is_type="JSONString" is="is.states.user.history.speech"/>
    </arguments>
  </method>
</preconditions>
```
Here we call the method isNewSpeech in the Java class InputManager, which is a persistent object, meaning that we create one instance and reuse it every time, instead of creating a new one. The result of the method is put into the information state in the field is.states.amq.user.newSpeech. We could also pass arguments to the constructor of the class. We pass two arguments to this method, namely the current speech and the history of captured utterances. In the method isNewSpeech we then check whether the speech is actually new.
Of course, once you write preconditions, you also have to write the effects that belong to them. These can be written, for example, as:
```xml
<effects>
  <assign is="is.states.user.voiceActivity">addToVoiceActivity(is.states.amq.user.active)</assign>
</effects>
```
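The value returned by the JavaScript call is written into the field named in the is attribute. A hypothetical sketch of such a helper (the way the history is stored is assumed, not taken from the actual templates):

```xml
<javascript>
  // Hypothetical helper for the assign above: appends the current voice-activity
  // value to a list and returns the updated list, which is then stored in
  // is.states.user.voiceActivity.
  function addToVoiceActivity(active) {
    var history = is.states.user.voiceActivity;
    if (history == null) history = [];
    history.push(active);
    return history;
  }
</javascript>
```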
You can also use a method as an effect. In addition, to execute behaviours, a special case of methods is used, namely function. The main difference is that functions do not have return types; you write them as shown below. Functions can only call static Java methods.
```xml
<effects>
  <function class="eu.aria.dm.util.Feedback" name="printIS">
    <arguments>
      <value is_type="JSONString" is="is.states.user.history.voice"/>
    </arguments>
  </function>
</effects>
```
This particular function prints the part of the information state passed as an argument. The Dialogue Manager also generates behaviour. Behaviours are specific instances of methods, such as this one:
```xml
<effects>
  <behaviour name="executeTemplate">
    <object class="eu.aria.dm.behaviours.FMLGenerator" persistent="FMLGenerator">
    </object>
    <arguments>
      <value class="String" is="is.states.agent.fml" is_type="JSONString"/>
    </arguments>
  </behaviour>
</effects>
```
This particular behaviour calls a method that sends FML to the behaviour realizer.
TODO:
- Describe moves
For more information on Flipper and its templates, visit the wiki here.
If you want to write your own templates, it is good practice to use dialogue acts for structuring the dialogue within Flipper, in the preconditions of the agent behaviour. For the ARIA framework, we take the taxonomy from DIT++, which can be found at https://dit.uvt.nl. The framework helps you determine which dialogue acts to recognize and how to respond to them.
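As a hypothetical illustration (the field name and the way the act label is stored are assumed, not prescribed by the framework), a template that reacts to a DIT++ set question could be guarded like this:

```xml
<preconditions>
  <!-- Assumed field: the dialogue act recognised for the user's last utterance -->
  <condition><![CDATA[ is.states.user.dialogueAct == "setQuestion" ]]></condition>
</preconditions>
```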
The following files are created during a session:
- audio.wav: A PCM audio stream with a sample rate of 16 kHz
- video.avi: An AVI video file with a frame rate of 25 fps
- voiceactivity.stream: An SSI stream file with the calculated voice activity (1 dimension, 33.3 Hz)
- arousal.stream: An SSI stream file with the estimated arousal level (1 dimension, 25 Hz)
- valence.stream: An SSI stream file with the estimated valence level (1 dimension, 25 Hz)
- interest.stream: An SSI stream file with the estimated interest level (1 dimension, 25 Hz)
- emax.stream: An SSI stream file with the output of emax (13 dimensions, 25 Hz)
- transcription.annotation: A transcription with the results of the ASR
- agent-fml.annotation: A log file with the FML messages sent to the agents
- agent-bml.annotation: A log file with the BML messages sent to the agents
- dialog.annotation: A log file storing the dialogue acts generated by the DM
An SSI stream is stored in two files; the one ending in ~ contains the stream data in ASCII format. Each line stores a sample, and the sample values are separated by blanks. Likewise, in the ~ file of an annotation, each line stores a segment defined by a start and end point (in seconds) and a label name.