<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=1024" />
    <meta name="apple-mobile-web-app-capable" content="yes" />
    <title>Beach Wreck Ignition: Challenges in open source voice</title>

    <meta name="description" content="Presentation to linux.conf.au 2019, Christchurch, 24 January 2019" />
    <meta name="author" content="Kathy Reid <kathy@kathyreid.id.au>" />

    <link href="css/beachwreckignition.css" rel="stylesheet" />

    <link rel="shortcut icon" href="favicon.png" />
</head>

<body class="impress-not-supported">

<div class="fallback-message">
    <p>Your browser <b>doesn't support the features required</b> by impress.js, so you are presented with a simplified version of this presentation.</p>
    <p>For the best experience please use the latest <b>Chrome</b>, <b>Safari</b> or <b>Firefox</b> browser.</p>
</div>

<div id="logo">
  <!-- statically positioned logo -->
</div>

<div id="backgroundlogo">
  <!-- statically positioned logo that the overview centres on -->
</div>

<div id="impress">

  <div id="title" class="step" data-x="0" data-y="0">
  </div>

  <div id="intro" class="step" data-x="1000" data-y="0">
    <div class="step-container container-width-50">
      <h1>Beach Wreck Ignition:</h1>
      <h2>Challenges in open source voice</h2>
      <h3>Kathy Reid <span class="highlight mixedCase">@KathyReid</span></h3>
      <h3><em>(formerly) Director of Developer Relations, <span class="highlight mixedCase">@Mycroft_AI</span></em></h3>
      <p class="attribution"><strong>Attribution</strong>: <a href="https://flic.kr/p/8ykkny"><em>Vanity 365 Day 55</em> via Rocky Sun on Flickr.</a></p>
    </div>
  </div>

  <div id="kitt" class="step" data-x="3000" data-y="0">
    <p class="attribution"><strong>Attribution</strong>: <a href="https://flic.kr/p/24kaPXy"><em>Kitt aus Knight Rider</em> via Marco Verch on Flickr.</a></p>
  </div>

  <div id="lcars" class="step" data-x="4000" data-y="0">
    <p class="attribution"><strong>Attribution</strong>: <a href="https://commons.wikimedia.org/wiki/File:Lcars_wallpaper.gif"><em>LCARS desktop</em> via Morn on Wikimedia Commons.</a></p>
  </div>

  <div id="timetrax" class="step" data-x="5000" data-y="0">
  </div>


  <div id="voicestack" class="step" data-x="6000" data-y="0">
    <div class="step-container">
      <h1>Introduction to the general voice stack</h1>
    </div>
  </div>

  <div id="anatomy" class="step" data-x="7000" data-y="0">
  </div>

  <div id="outline" class="step" data-x="2000" data-y="0">
    <div class="step-container">
      <h1>Overview</h1>
      <ul class="bulletedList">
        <li><strong>Voice Stack</strong> -  components that make up a voice stack</li>
        <li><strong>Wake Word</strong> - detection that the user wants to issue a command</li>
        <li><strong>Speech to Text</strong> - transcribing voice sounds into written form</li>
        <li><strong>Intent matching</strong> - matching utterances to a command</li>
        <li><strong>Skills</strong>  - executing commands</li>
        <li><strong>Text to Speech</strong>  - turning written text into voice sounds</li>
        <li><strong>Multilingual considerations</strong>  - how do you handle this for multiple languages?</li>
      </ul>
    </div>
  </div>

  <div id="wakeword" class="step" data-x="8000" data-y="0">
    <div class="step-container">
      <h1>Wake Word</h1>
      <ul class="bulletedList">
        <li><strong>PocketSphinx</strong> -  https://github.com/cmusphinx/pocketsphinx</li>
        <li><strong>Snowboy</strong> - https://github.com/Kitt-AI/snowboy</li>
        <li><strong>Mycroft AI Precise</strong> - https://github.com/MycroftAI/mycroft-precise</li>
      </ul>
    </div>
  </div>

  <div id="phonemes" class="step" data-x="9000" data-y="0">
    <div class="step-container">
      <h1>Phonemes</h1>
      <p class="quote">"The smallest unit of sound that distinguishes one word from another in a particular language.
        <br>Different languages have different phonemes."
      </p>
    </div>
  </div>

  <div id="phoneme-chart" class="step" data-x="10000" data-y="0">
    <p class="attribution"><strong>Attribution</strong>: <em>EnglishClub.com</em></p>
  </div>

  <div id="similar-phonemes" class="step" data-x="11000" data-y="0">
    <div class="step-container">
      <h1>Similar-sounding phonemes</h1>
      <ul class="bulletedList">
        <li><strong>"p* / b*" sounds</strong> -  try saying <span class="highlight mixedCase">bizza</span> instead of <span class="highlight mixedCase">pizza</span></li>
        <li><strong>"s* / z*" sounds</strong> -  try saying <span class="highlight mixedCase">soo</span> instead of <span class="highlight mixedCase">zoo</span></li>
        <li><strong>"k* / g*" sounds</strong> -  try saying <span class="highlight mixedCase">gate</span> instead of <span class="highlight mixedCase">Kate</span></li>
      </ul>
    </div>
  </div>

  <div id="wakeword-challenges" class="step" data-x="12000" data-y="0">
    <div class="step-container">
      <h1>Wake Word - Challenges</h1>
      <ul class="bulletedList">
        <li><strong>Always listening</strong> - Wake Word listeners are "always on" </li>
        <li><strong>Accuracy</strong> - False negatives and false positives</li>
      </ul>
    </div>
  </div>

  <div id="haber" class="step" data-x="14000" data-y="0">
    <div class="step-container">
      <h1>Haber's <br>Classification <br>of Contexts</h1>
      <p class="attribution"><strong>Attribution</strong>: Haber, J., Greening, M., Castellano, L., &amp; Wheaton, P. (n.d.). Proxemic Conversational UI: Moving beyond simple conversation.</p>
    </div>
  </div>

  <div id="wakeword-hat" class="step" data-x="13000" data-y="0">
    <p class="attribution"><strong>Attribution</strong>: <em>Project Alias</em> via Project Alias</p>
  </div>


  <div id="wakeword-accuracy" class="step" data-x="15000" data-y="0">
    <div class="step-container">
      <h1>Wake Word - Accuracy</h1>
      <p class="attribution"><strong>Attribution</strong>: <a href="https://flic.kr/p/aJaboR"><em>bullseye</em> via Emilio Kuffer on Flickr.</a></p>
    </div>
  </div>

  <div id="wakeword-false" class="step" data-x="16000" data-y="0">
    <div class="step-container">
      <h1>Wake Word - measuring accuracy</h1>
      <ul class="bulletedList">
        <li><strong>False positive</strong> - <span class="highlight mixedCase">failure</span> - Wake Word detected when it wasn't spoken</li>
        <li><strong>True positive</strong> - <span class="highlight mixedCase">success</span> - Wake Word correctly detected when it was spoken</li>
        <li><strong>True negative</strong> - <span class="highlight mixedCase">success</span> - Wake Word not detected when it wasn't spoken</li>
        <li><strong>False negative</strong> - <span class="highlight mixedCase">failure</span> - Wake Word spoken but not detected</li>
      </ul>
    </div>
  </div>
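
<!--
    The four outcomes on the slide above can be turned into two error rates. This is a minimal sketch,
    not taken from any wake word engine: the function name `wakeWordRates` and the counts are made up
    purely for illustration.

```javascript
// Hypothetical helper: summarise wake word accuracy from the four outcome counts.
function wakeWordRates({ truePositive, falsePositive, trueNegative, falseNegative }) {
  const spoken = truePositive + falseNegative;     // times the wake word was actually said
  const notSpoken = falsePositive + trueNegative;  // times the wake word was not said
  return {
    // Share of non-wake-word audio that wrongly triggered the listener.
    falsePositiveRate: notSpoken === 0 ? 0 : falsePositive / notSpoken,
    // Share of genuine wake words that were missed.
    falseNegativeRate: spoken === 0 ? 0 : falseNegative / spoken,
  };
}

// Made-up counts, for illustration only.
const rates = wakeWordRates({
  truePositive: 90,
  falsePositive: 5,
  trueNegative: 95,
  falseNegative: 10,
});
console.log(rates);
```

    A lower false positive rate means fewer accidental activations; a lower false negative rate means
    the user has to repeat the Wake Word less often.
-->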

  <div id="wakeword-privacy" class="step" data-x="17000" data-y="0">
  </div>

  <div id="stt" class="step" data-x="17500" data-y="0">
    <div class="step-container">
      <h1>Speech to Text</h1>
      <ul class="bulletedList">
        <li><strong>Kaldi</strong> -  https://github.com/kaldi-asr/kaldi</li>
        <li><strong>Mozilla DeepSpeech</strong> - https://github.com/mozilla/DeepSpeech</li>
        <li><strong>Mozilla Common Voice</strong> - https://voice.mozilla.org/en</li>
      </ul>
    </div>
  </div>

  <div id="stt-commonvoice" class="step" data-x="18000" data-y="0">
  </div>

  <div id="stt-challenges" class="step" data-x="19000" data-y="0">
    <div class="step-container">
      <h1>STT - Challenges</h1>
      <ul class="bulletedList">
        <li><strong>Training a model</strong> - Amount of data and training required</li>
        <li><strong>Accuracy</strong> - Accuracy has an impact on voice user experience</li>
      </ul>
    </div>
  </div>

  <div id="stt-accents" class="step" data-x="20000" data-y="0">
  </div>

  <div id="bingle" class="step" data-x="21000" data-y="0">
    <div class="step-container">
      <h1>Consider the phrase</h1>
      <p class="quote">"Yeah nah mate, there's been a bingle in Broady, and the Western's chokkas back to the servo, I'm gonna be late for bevvies at Tommo's."
      </p>
    </div>
  </div>

  <div id="bingle-translated" class="step" data-x="22000" data-y="0">
    <div class="step-container">
      <h1>Translation for non-Australians ;-) </h1>
      <pre><code>
Greetings, friend
There's been a car accident in Broadmeadows
and the Western Freeway is congested
back to the service station
and as a result I will be late
to the social function at Mr Thompson's.
      </code></pre>
    </div>
  </div>

  <div id="languages" class="step" data-x="23000" data-y="0">
  </div>

  <div id="language-challenges" class="step" data-x="24000" data-y="0">
    <div class="step-container">
      <h1>Mycroft Translate - Challenges</h1>
      <ul class="bulletedList">
        <li><strong>Line by line translation</strong> - Does not allow for context</li>
        <li><strong>Gender</strong> - Different languages handle gender differently</li>
        <li><strong>Hierarchy</strong> - Different language for different formality</li>
      </ul>
    </div>
  </div>

  <div id="kia-ora-mate" class="step" data-x="25000" data-y="0">
    <div class="step-container">
      <p class="attribution"><strong>Attribution</strong>: <a href="https://twitter.com/waikatoreo/status/1051264259089264640"><em>kia ora mate</em> via @waikatoreo on Twitter.</a></p>
    </div>
  </div>

  <div id="intent-parsers" class="step" data-x="26000" data-y="0">
    <div class="step-container">
      <h1>Intent Parsers</h1>
      <ul class="bulletedList">
        <li><strong>Rasa</strong> -  https://rasa.com/docs/nlu/</li>
        <li><strong>Mycroft Adapt</strong> - https://github.com/MycroftAI/adapt</li>
        <li><strong>Mycroft Padatious</strong> - https://github.com/MycroftAI/padatious</li>
      </ul>
    </div>
  </div>

  <div id="intent-challenges" class="step" data-x="27000" data-y="0">
    <div class="step-container">
      <h1>Intent Parser challenges</h1>
      <ul class="bulletedList">
        <li><strong>Intent collisions</strong> - Disambiguating intents so that the "most likely" command is invoked for the user</li>
      </ul>
    </div>
  </div>

  <div id="common-play-framework" class="step" data-x="28000" data-y="0">
    <div class="step-container">
      <h1>Common Play Framework</h1>

        <p><code><span class="highlight">CPSMatchLevel.EXACT</span></code> (The input is an exact match)</p>
        <p><code><span class="highlight">CPSMatchLevel.MULTI_KEY</span></code> (The input contains multiple matches such as Artist and Album title)</p>
        <p><code><span class="highlight">CPSMatchLevel.TITLE</span></code> (The phrase contains a matching title)</p>
        <p><code><span class="highlight">CPSMatchLevel.ARTIST</span></code> (The phrase contains a matching artist)</p>
        <p><code><span class="highlight">CPSMatchLevel.CATEGORY</span></code> (The phrase contains a category supported by the skill, such as Rock, bitpop or Podcast)</p>
        <p><code><span class="highlight">CPSMatchLevel.GENERIC</span></code> (Generic match, maybe contains the skill title but no media match)</p>
        <br>
        <p>where <code><span class="highlight">CPSMatchLevel.EXACT</span></code> is the highest confidence and <code><span class="highlight">CPSMatchLevel.GENERIC</span></code> is the lowest.</p>

    </div>
  </div>
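
<!--
    One way to picture how match levels resolve a collision between skills is a simple ranking. This is
    a sketch in JavaScript, not Mycroft's implementation: `CPS_MATCH_ORDER`, `pickBestMatch` and the skill
    names are hypothetical, and the ordering of the levels between EXACT and GENERIC is assumed to follow
    the order in which the slide lists them.

```javascript
// Assumed ranking, highest confidence first, following the order on the slide.
const CPS_MATCH_ORDER = ["EXACT", "MULTI_KEY", "TITLE", "ARTIST", "CATEGORY", "GENERIC"];

// Pick the reply whose match level ranks highest; earlier replies win ties.
function pickBestMatch(replies) {
  let best = null;
  for (const reply of replies) {
    const rank = CPS_MATCH_ORDER.indexOf(reply.level);
    if (rank === -1) continue; // ignore levels that are not in the list
    if (best === null || rank < best.rank) {
      best = { skill: reply.skill, rank };
    }
  }
  return best === null ? null : best.skill;
}

// Hypothetical skills all claiming the same utterance.
const winner = pickBestMatch([
  { skill: "podcast-skill", level: "GENERIC" },
  { skill: "radio-skill", level: "CATEGORY" },
  { skill: "music-skill", level: "TITLE" },
]);
console.log(winner);
```

    Here the skill reporting a TITLE match is chosen over the CATEGORY and GENERIC matches.
-->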

  <div id="text-to-speech" class="step" data-x="29000" data-y="0">
    <div class="step-container">
      <h1>Text to Speech</h1>
      <ul class="bulletedList">
        <li><strong>Mary TTS</strong> <br> -  http://mary.dfki.de/</li>
        <li><strong>Espeak</strong> <br> - http://espeak.sourceforge.net/</li>
        <li><strong>Mycroft Mimic</strong> <br> - https://mycroft.ai/documentation/mimic/</li>
        <li><strong>Mycroft Mimic 2</strong> <br>- https://github.com/MycroftAI/mimic2</li>
      </ul>
    </div>
  </div>

  <div id="text-to-speech-challenges" class="step" data-x="30000" data-y="0">
    <div class="step-container">
      <h1>Text to Speech Challenges</h1>
      <ul class="bulletedList">
        <li><strong>Natural sounding voice</strong> - making the voice sound not robotic</li>
        <li><strong>Pronunciation</strong> - often requires correction</li>
      </ul>
    </div>
  </div>

  <div id="malala" class="step" data-x="33000" data-y="0">
    <div class="step-container">
      <h1>A parting quote</h1>
      <p class="quote">"When the whole world is silent, even one voice becomes powerful."
        <br> <br> - MALALA YOUSAFZAI
      </p>
    </div>
  </div>

  <div id="thankyou" class="step" data-x="31000" data-y="0">
    <div class="step-container">
      <h1>Thank you :-)</h1>
      <p>Questions warmly welcomed</p>
    </div>
  </div>


</div>

<!--
    This is a UI plugin. You can read more about plugins in src/plugins/README.md.
    For now, I'll just tell you that this adds some graphical controls to navigate the
    presentation. In the CSS file you can style them as you want. We've put them bottom right.
-->
<div id="impress-toolbar"></div>

<!--

    Hint is not related to impress.js in any way.

    But it can show you how to use impress.js features in a creative way.

    When a presentation step is shown (selected), its element gets the class "active" and the body element
    gets a class based on the active step's id, `impress-on-ID` (where ID is the step's id). For example, when
    the first step (the one with the id of `title`) is active, the body element gets the class `impress-on-title`.

    This class is used by this hint below. Check CSS file to see how it's shown with delayed CSS animation when
    the first step of presentation is visible for a couple of seconds.

    ...

    And when it comes to this piece of JavaScript below ... kids, don't do this at home ;)
    It's just a quick and dirty workaround to get different hint text for touch devices.
    In a real world it should be at least placed in separate JS file ... and the touch content should be
    probably just hidden somewhere in HTML - not hard-coded in the script.

    Just sayin' ;)

-->
<div class="hint">
    <p>Use the spacebar or arrow keys to navigate. <br/>
       Press 'P' to launch speaker console.</p>
</div>
<script>
if ("ontouchstart" in document.documentElement) {
    document.querySelector(".hint").innerHTML = "<p>Swipe left or right to navigate</p>";
}
</script>

<!--

    Last, but not least.

    To make everything described above actually work, you need to include impress.js in the page.
    I strongly encourage you to minify it first.

    Here I just include the full source of the script to make it more readable.

    You also need to call the `impress().init()` function to initialize the impress.js presentation.
    You should do this at the end of your document, not only because it's good practice, but also
    because it should be done when the whole document is ready.
    Of course you can wrap it in any kind of "DOM ready" event, but I was too lazy to do so ;)

-->
<script src="js/impress.js"></script>
<script>impress().init();</script>

<!--

    The `impress()` function also gives you access to the API that controls the presentation.

    Just store the result of the call:

        var api = impress();

    and you will get four functions you can call:

        `api.init()` - initializes the presentation,
        `api.next()` - moves to the next step of the presentation,
        `api.prev()` - moves to the previous step of the presentation,
        `api.goto( stepIndex | stepElementId | stepElement, [duration] )` - moves the presentation to the step given by its index number,
                id or DOM element; the second parameter can be used to define the duration of the transition in ms,
                but it's optional - if not provided, the default transition duration for the presentation will be used.

    You can also simply call `impress()` again to get the API, so `impress().next()` is also allowed.
    Don't worry, it won't initialize the presentation again.

    For some example uses of this API check the last part of the source of impress.js where the API
    is used in event handlers.

-->
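
<!--
    A minimal sketch of driving the deck through the API described above. The helper `advance` is
    hypothetical and only uses `api.next()`; the commented calls assume the browser has already loaded
    js/impress.js and reuse the step id `wakeword` from this deck.

```javascript
// Hypothetical helper: move the presentation forward a fixed number of steps.
function advance(api, steps) {
  for (let i = 0; i < steps; i += 1) {
    api.next();
  }
}

// Only runs where impress.js is available (i.e. in the browser, after the script has loaded).
if (typeof impress === "function") {
  advance(impress(), 2);          // two steps forward
  // impress().goto("wakeword");  // jump to the step with id "wakeword"
  // impress().goto(0, 500);      // back to the first step with a 500 ms transition
}
```

    Passing the API object in as a parameter keeps the helper easy to try out with a stand-in object
    outside the browser.
-->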

</body>
</html>

<!--

    Now you know more or less everything you need to build your first impress.js presentation, but before
    you start...

    Oh, you've already cloned the code from GitHub?

    You have it open in text editor?

    Stop right there!

    That's not how you create awesome presentations. This is only code: an implementation of an idea that
    first needs to grow in your mind.

    So if you want to build a great presentation, take a pencil and a piece of paper. And turn off the computer.

    Sketch, draw and write. Brainstorm your ideas on paper. Try to build a mind-map of what you'd like
    to present. It will get you closer and closer to the layout you'll build later with impress.js.

    Get back to the code only when you have your presentation ready on paper. It doesn't make sense to do
    it earlier, because you'll only waste your time fighting with the positioning of useless points.

    If you think I'm crazy, please put your hands on a book called "Presentation Zen". It's all about
    creating awesome and engaging presentations.

    Think about it. 'Cause impress.js may not help you, if you have nothing interesting to say.

-->

<!--

    Are you still reading this?

    For real?

    I'm impressed! Feel free to let me know that you got that far (I'm @bartaz on Twitter), 'cause I'd like
    to congratulate you personally :)

    But you don't have to do it now. Take my advice and take some time off. Make yourself a cup of coffee, tea,
    or anything you like to drink. And raise a glass for me ;)

    Cheers!

-->