Virtual Rapport with Furhat

Research has demonstrated that creating rapport during social interactions has positive effects on engagement and connection. When a human interacts with a social robot, one way to increase rapport is for the robot to produce nonverbal backchannels at appropriate times. In this project, we build a prototype system that equips social robots with rapport capabilities. We use Furhat, a social robot with human-like expressions and conversational capabilities. We connect Furhat with the multimodal learning platform OpenSense through two new components:

  1. Virtual Rapport System is a component that takes the inputs from OpenSense (Head Gesture, Acoustic Features and Voice Activity Detection) and maps them to Furhat gestures following a set of rules by Gratch et al. This component is a proof of concept that establishes the connection between Furhat and OpenSense; its behavior can be expanded and built upon in the future.

  2. Furhat Controller is a component that serves as the interface to Furhat. It takes in the gesture commands produced by the Virtual Rapport System and issues calls to the Furhat Web API for the gestures to be synthesized by the robot. This controller is decoupled as an independent component so it can be used with other OpenSense components that interact with Furhat.
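As a rough illustration of what the Furhat Controller does, the sketch below builds and sends a gesture request over HTTP. It is written in Python for brevity and is not the project's C# code. The port (54321), the endpoint path (/furhat/gesture with a name query parameter), and the helper names build_gesture_request and send_gesture are all assumptions made for this example; check them against the Furhat documentation for your robot's software version.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the robot's API server listens on this port.
FURHAT_PORT = 54321


def build_gesture_request(furhat_host: str, gesture: str) -> Request:
    """Build (but do not send) a POST request asking the robot to play a gesture.

    The path and query parameter are assumptions, not taken from this README.
    """
    query = urlencode({"name": gesture})
    url = f"http://{furhat_host}:{FURHAT_PORT}/furhat/gesture?{query}"
    return Request(url, data=b"", method="POST")


def send_gesture(furhat_host: str, gesture: str, timeout: float = 2.0) -> int:
    """Send the gesture request and return the HTTP status code."""
    with urlopen(build_gesture_request(furhat_host, gesture), timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending mirrors the decoupling described above: any component that can produce a gesture name can reuse the same controller.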

These are the rules implemented by the Virtual Rapport System:

  1. Lowering of pitch -> head Nod

We use the OpenSMILE component with the config file emobase_live4.config to extract the feature F0_sma_linregc1 (feature 462), which is the slope of F0. If a pitch within the range of human speech is detected and the slope is below a threshold, the speaker has lowered their pitch, and the robot nods.

  2. Raised loudness -> head Nod

Similarly, we use the OpenSMILE component to extract the feature pcm_loudness_sma_linregc1 (feature 25), which is the slope of loudness. If the slope is above a threshold, the robot nods.

  3. Speech disfluency -> Gaze shift

We use the Voice Activity Detector component with a pause length of 2 seconds, meaning that if the speaker is quiet for longer than this pause length, the robot shifts its gaze to signal to the speaker that they can take their time.

  4. Head nod/head shake -> mimic

We use the Head Gesture Detector with OpenFace to detect and mimic Head Nods and Shakes. 
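The four rules can be sketched as a single mapping function. This is a minimal illustration in Python, not the project's implementation: the threshold values, the Observation type, the output labels ("Nod", "Shake", "GazeShift"), and the order in which the rules are checked are all assumptions, since the README does not state them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the real values are not given in this README.
PITCH_SLOPE_THRESHOLD = -0.5
LOUDNESS_SLOPE_THRESHOLD = 0.5


@dataclass
class Observation:
    voiced: bool                  # a pitch within human speech was detected
    f0_slope: float               # F0_sma_linregc1 from OpenSMILE
    loudness_slope: float         # pcm_loudness_sma_linregc1 from OpenSMILE
    long_pause: bool              # VAD reports silence beyond the pause length
    head_gesture: Optional[str]   # "nod", "shake", or None


def choose_gesture(obs: Observation) -> Optional[str]:
    """Return the robot gesture for one observation, or None if no rule fires."""
    # Rule 4: mimic the speaker's head nod or shake.
    if obs.head_gesture == "nod":
        return "Nod"
    if obs.head_gesture == "shake":
        return "Shake"
    # Rule 3: a long pause triggers a gaze shift.
    if obs.long_pause:
        return "GazeShift"
    # Rule 1: lowered pitch (negative F0 slope while voiced) triggers a nod.
    if obs.voiced and obs.f0_slope < PITCH_SLOPE_THRESHOLD:
        return "Nod"
    # Rule 2: raised loudness triggers a nod.
    if obs.loudness_slope > LOUDNESS_SLOPE_THRESHOLD:
        return "Nod"
    return None
```

For example, choose_gesture(Observation(True, -1.0, 0.0, False, None)) returns "Nod" because the pitch rule fires, while the same slope with voiced set to False returns None.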

There is a minimum spacing of 5 seconds between gestures so that the backchannels from Furhat do not become overwhelming or repetitive.
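One way to enforce this spacing is a small gate that remembers when the last gesture was sent. The sketch below is an assumption about how such a gate could look, written in Python; the class name GestureCooldown and its try_acquire method are hypothetical and do not come from the project.

```python
import time
from typing import Callable, Optional


class GestureCooldown:
    """Allow a gesture only if enough time has passed since the previous one."""

    def __init__(self, min_spacing_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_spacing_s = min_spacing_s
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last_sent: Optional[float] = None

    def try_acquire(self) -> bool:
        """Return True and record the time if a gesture may be sent now."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_spacing_s:
            return False  # still inside the cooldown window; drop this gesture
        self._last_sent = now
        return True
```

A caller would check try_acquire() before forwarding a gesture to the robot and silently skip the gesture when it returns False.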

Steps to run

  • Power on the Furhat robot and connect it to the network. Get its IP address following the instructions.

  • Go to the Furhat Studio and activate the Web API button to run the API server.

  • Make sure you have a camera available to OpenSense (computer camera or external webcam).

  • Open the OpenSense environment and check that the variable furhatUri in FurhatController.cs matches the IP address of your Furhat robot.

  • Run OpenSense as you would normally. The custom pipeline sample is recommended. Make sure your OpenFace parameters are correctly set according to your camera parameters.

Demo

Access a demonstration video here: https://drive.google.com/drive/folders/1eATIo7hI24wXU18HQSh_lZaBpcScT31d?usp=sharing

To improve / Future work

  • Allow Furhat’s URI to be configured as a parameter in the visual UI instead of in the source code.
  • Define a hierarchy that determines which backchannel is displayed when multiple gestures are detected from the speaker.
  • Implement the detection of speaker gaze shift and mimic.