This project demonstrates how to use the (preview) Speakup voice APIs in combination with Google text-to-speech, Google speech-to-text and the Rasa conversational AI. The result is a small voice-driven application that can perform actions based on a simple (spoken) dialogue with the caller.
One possible application is as an alternative to the classic IVR menu (press 1 for ...). Instead, callers can explain their intent for the bot to understand, and the bot can even answer certain queries without human intervention.
This is by no means intended as a completed product. Instead, it is a demonstration of what can be accomplished by combining a few simple API endpoints into useful functionality. Feel free to expand on this, propose changes, use it for inspiration or simply try it out!
To build and run the service locally you will need the following:
- JDK 14 or higher. For example OpenJDK.
- A Linux machine running a recent version of Docker. The Docker API should be configured so that it is accessible locally on /var/run/docker.sock (the default on most systems).
- You will need to have Docker Compose installed and available in your PATH.
- A Client ID and Client Secret for access to the Speakup programmable voice preview API. If you don't have these and feel that you should: please contact your friendly neighbourhood developer at Speakup.
- An external phone number to dial to access your bot. This number belongs to your account and has been supplied by Speakup as well.
- A credentials file for the Google text-to-speech and speech-to-text API.
- A way to make your service accessible from the Speakup infrastructure over HTTP. We use ngrok to accomplish this.
- A basic understanding of Spring Boot and Project Reactor will be of great help!
Use the following steps to build the voicebot demo from source and run it locally.
Build the project by running the following command from the root:
./gradlew build
This will build the voicebot service from source, create a Docker image for it and build the NLU component (Rasa) using Docker.
Now that you have built the NLU component you can start it on your local Docker daemon. To make this convenient a docker-compose file is included in the project. Start the component by running the following from the root of the project:
docker-compose up -d
You will now have a Rasa service running, with its API exposed on http://localhost:5005/.
Set up an ngrok tunnel to access your service running locally. We're assuming
that you keep the default configuration for now and will run it on port 8080
later:
ngrok http 8080
Now you will see output along these lines:
ngrok by @inconshreveable (Ctrl+C to quit)
Session Status online
Session Expires 7 hours, 59 minutes
Version 2.3.35
Region United States (us)
Web Interface http://127.0.0.1:4040
Forwarding http://0c5faf894640.ngrok.io -> http://localhost:8080
Forwarding https://0c5faf894640.ngrok.io -> http://localhost:8080
Keep the HTTPS URL from the last line handy; we will be using it later. Make sure you have your OAuth2 Client ID and Client Secret as well.
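If you want to script this step, the HTTPS forwarding address can be picked out of console output like the sample above. The following is a minimal sketch in Java. The class and method names are hypothetical and not part of this project, and it assumes the ngrok output format matches the sample shown here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the voicebot project.
public class NgrokOutput {

    // Matches lines such as:
    //   Forwarding https://0c5faf894640.ngrok.io -> http://localhost:8080
    // Plain http:// forwarding lines are deliberately not matched.
    private static final Pattern HTTPS_FORWARDING =
            Pattern.compile("Forwarding\\s+(https://\\S+)\\s+->");

    /** Returns the first HTTPS forwarding URL found in the given ngrok console output. */
    public static Optional<String> httpsForwardingUrl(String consoleOutput) {
        Matcher matcher = HTTPS_FORWARDING.matcher(consoleOutput);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "Forwarding http://0c5faf894640.ngrok.io -> http://localhost:8080",
                "Forwarding https://0c5faf894640.ngrok.io -> http://localhost:8080");
        System.out.println(httpsForwardingUrl(sample).orElse("no HTTPS tunnel found"));
    }
}
```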
Before we move on, you need to set up your Google credentials if you haven't already. Credentials for the Google APIs can be created here. You will need a service account that has access to the text-to-speech and speech-to-text APIs. Use the JSON key type, which will result in a file that you store locally.
Now let's start the service by running:
GOOGLE_APPLICATION_CREDENTIALS=<path to your Google credentials file> \
java -jar service/build/libs/service-0.1.0-SNAPSHOT.jar \
--speakup.voice.endpoint.external-uri=<your ngrok HTTPS URI goes here ...> \
--speakup.voice.api.application-id=<your OAuth2 Client ID goes here ...> \
--speakup.voice.api.application-secret=<your OAuth2 Client Secret goes here ...>
At this point, if everything went well, you should be able to access your bot through the phone number that corresponds with your API account.
Now that we have the nitty-gritty details out of the way you can experiment away by modifying the NLU training data or tinkering with the logic of the voice bot. See the following sections on how to get started!
The Voicebot project is structured into two subprojects:
- service: contains the voice bot itself, a Spring Boot service.
- nlu: contains a Docker build to set up a very basic Rasa NLU model.
The voice bot service is a small and rather straightforward Spring Boot service. It has a few notable components:
- nl.speakup.voice.voicebotdemo.Bot: the bot itself, contains the logic for handling a call, listening for utterances, you name it. This is where you apply your creativity!
- nl.speakup.voice.voicebotdemo.NluService: a small service client to access the Rasa NLU service using a Spring WebClient.
- nl.speakup.voice.voicebotdemo.SpeechService: a small service that provides convenient accessors for the Google Speech APIs.
- nl.speakup.voice.voicebotdemo.VoiceApiClient: a client component to access the Speakup programmable voice API. It listens for voice events over a websocket connection and allows you to issue commands. Note that the client is incomplete: it doesn't handle all events, nor does it expose the full extent of commands that can be issued. However, the client is easy to extend should you need more functionality.
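To give an idea of the kind of logic that could live in the bot, here is a minimal, self-contained sketch of routing a recognised intent to a spoken reply. It is not taken from the project's Bot class: the class name, the intent names, the replies and the 0.6 confidence threshold are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch, not the project's actual Bot implementation.
public class IntentRouter {

    // Assumed threshold below which the bot asks the caller to rephrase.
    private static final double MIN_CONFIDENCE = 0.6;

    // Assumed intent names and replies; replace with the intents of your own NLU model.
    private static final Map<String, String> REPLIES = Map.of(
            "opening_hours", "We are open on weekdays from nine to five.",
            "speak_to_agent", "One moment, I will connect you to a colleague.");

    private static final String FALLBACK =
            "Sorry, I did not understand that. Could you rephrase?";

    /** Picks the text to speak for an intent name and confidence reported by the NLU. */
    public static String replyFor(String intent, double confidence) {
        // Unknown, missing or low-confidence intents all fall back to a clarifying question.
        if (intent == null || confidence < MIN_CONFIDENCE || !REPLIES.containsKey(intent)) {
            return FALLBACK;
        }
        return REPLIES.get(intent);
    }

    public static void main(String[] args) {
        System.out.println(replyFor("opening_hours", 0.92));
        System.out.println(replyFor("opening_hours", 0.31));
    }
}
```

In a real bot the returned text would be handed to the speech service for synthesis rather than printed.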
As with any Spring Boot service there are several mechanisms to present configuration settings. See src/main/resources/application.yaml for the defaults.
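One of those mechanisms is environment variables. Spring Boot's relaxed binding derives the variable name from a property name by replacing dots with underscores, removing dashes and converting to upper case. The sketch below applies that rule to property names used earlier in this README. The class itself is a hypothetical helper, not part of the project.

```java
import java.util.Locale;

// Hypothetical helper illustrating Spring Boot's documented naming rule
// for binding environment variables to properties.
public class EnvNames {

    /** Converts a canonical property name to its environment variable form. */
    public static String toEnvVar(String property) {
        return property
                .replace(".", "_")   // dots become underscores
                .replace("-", "")    // dashes are removed
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("speakup.voice.endpoint.external-uri"));
        System.out.println(toEnvVar("speakup.voice.api.application-id"));
    }
}
```

Setting SPEAKUP_VOICE_ENDPOINT_EXTERNALURI in the environment should therefore have the same effect as passing --speakup.voice.endpoint.external-uri on the command line, assuming the service does not customise property binding.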
This project comes with a very basic configuration for Rasa that gets you started with a Dutch NLU model that has a few intents with some training data.
The nlu subproject contains a Docker build procedure (nlu/src/main/docker/Dockerfile) that sets up a basic Rasa instance and trains the model. The Rasa configuration files as well as the training data can be found under nlu/src/main/rasa.
Explaining how to configure a Rasa model is well beyond the scope of this README. Furthermore, the Rasa project has excellent documentation to help you. This project only uses a fraction of the potential of Rasa.
The Docker build for Rasa is performed from the main Gradle build. Therefore building and training Rasa is as easy as running ./gradlew build or ./gradlew :nlu:build if you want to be more specific.
The included Docker Compose file starts it locally:
docker-compose up -d
For example, you can now access the NLU component as follows:
curl -s --header "Content-Type: application/json" --request POST --data '{"text":"Hello!"}' http://localhost:5005/model/parse | jq
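The same request can be made from Java. The sketch below uses the JDK's built-in java.net.http client so that it runs without extra dependencies; the project's own NluService uses a Spring WebClient instead. The class and method names are hypothetical, and running main assumes the Rasa container is up on port 5005.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical stand-alone client, not the project's NluService.
public class RasaParseExample {

    /** Escapes the characters that may not appear raw inside a JSON string. */
    public static String jsonEscape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the same POST request as the curl command above. */
    public static HttpRequest parseRequest(String baseUri, String text) {
        String body = "{\"text\":\"" + jsonEscape(text) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUri + "/model/parse"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = parseRequest("http://localhost:5005", "Hello!");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Print the HTTP status and the raw JSON returned by Rasa.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```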