<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=1024" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<title>Beach Wreck Ignition: Challenges in open source voice</title>
<meta name="description" content="Presentation to linux.conf.au 2019, Christchurch, 24 January 2019" />
<meta name="author" content="Kathy Reid <[email protected]>" />
<link href="css/beachwreckignition.css" rel="stylesheet" />
<link rel="shortcut icon" href="favicon.png" />
</head>
<body class="impress-not-supported">
<div class="fallback-message">
<p>Your browser <b>doesn't support the features required</b> by impress.js, so you are presented with a simplified version of this presentation.</p>
<p>For the best experience please use the latest <b>Chrome</b>, <b>Safari</b> or <b>Firefox</b> browser.</p>
</div>
<div id="logo">
<!-- statically positioned logo -->
</div>
<div id="backgroundlogo">
<!-- statically positioned logo that the overview centres on -->
</div>
<div id="impress">
<div id="title" class="step" data-x="0" data-y="0">
</div>
<div id="intro" class="step" data-x="1000" data-y="0">
<div class="step-container container-width-50">
<h1>Beach Wreck Ignition:</h1>
<h2>Challenges in open source voice</h2>
<h3>Kathy Reid <span class="highlight mixedCase">@KathyReid</span></h3>
<h3><em>(formerly) Director of Developer Relations, <span class="highlight mixedCase">@Mycroft_AI</span></em></h3>
<p class="attribution"><strong>Attribution</strong>: <a href="https://flic.kr/p/8ykkny"><em>Vanity 365 Day 55</em> via Rocky Sun on Flickr.</a></p>
</div>
</div>
<div id="kitt" class="step" data-x="3000" data-y="0">
<p class="attribution"><strong>Attribution</strong>: <a href="https://flic.kr/p/24kaPXy"><em>Kitt aus Knight Rider</em> via Marco Verch on Flickr.</a></p>
</div>
<div id="lcars" class="step" data-x="4000" data-y="0">
<p class="attribution"><strong>Attribution</strong>: <a href="https://commons.wikimedia.org/wiki/File:Lcars_wallpaper.gif"><em>LCARS desktop</em> via Morn on Wikimedia Commons.</a></p>
</div>
<div id="timetrax" class="step" data-x="5000" data-y="0">
</div>
<div id="voicestack" class="step" data-x="6000" data-y="0">
<div class="step-container">
<h1>Introduction to the general voice stack</h1>
</div>
</div>
<div id="anatomy" class="step" data-x="7000" data-y="0">
</div>
<div id="outline" class="step" data-x="2000" data-y="0">
<div class="step-container">
<h1>Overview</h1>
<ul class="bulletedList">
<li><strong>Voice Stack</strong> - components that make up a voice stack</li>
<li><strong>Wake Word</strong> - detection that the user wants to issue a command</li>
<li><strong>Speech to Text</strong> - transcribing voice sounds into written form</li>
<li><strong>Intent matching</strong> - matching utterances to a command</li>
<li><strong>Skills</strong> - executing commands</li>
<li><strong>Text to Speech</strong> - turning written text into voice sounds</li>
<li><strong>Multilingual considerations</strong> - how do you handle this for multiple languages?</li>
</ul>
</div>
</div>
<div id="wakeword" class="step" data-x="8000" data-y="0">
<div class="step-container">
<h1>Wake Word</h1>
<ul class="bulletedList">
<li><strong>PocketSphinx</strong> - https://github.com/cmusphinx/pocketsphinx</li>
<li><strong>Snowboy</strong> - https://github.com/Kitt-AI/snowboy</li>
<li><strong>Mycroft AI Precise</strong> - https://github.com/MycroftAI/mycroft-precise</li>
</ul>
</div>
</div>
<div id="phonemes" class="step" data-x="9000" data-y="0">
<div class="step-container">
<h1>Phonemes</h1>
<p class="quote">"The smallest unit of sound that distinguishes one word from another in a particular language.
<br>Different languages have different phonemes."
</p>
</div>
</div>
<div id="phoneme-chart" class="step" data-x="10000" data-y="0">
<p class="attribution"><strong>Attribution</strong>: <em>EnglishClub.com</em></p>
</div>
<div id="similar-phonemes" class="step" data-x="11000" data-y="0">
<div class="step-container">
<h1>Similar-sounding phonemes</h1>
<ul class="bulletedList">
<li><strong>"p* / b*" sounds</strong> - try saying <span class="highlight mixedCase">bizza</span> instead of <span class="highlight mixedCase">pizza</span></li>
<li><strong>"s* / z*" sounds</strong> - try saying <span class="highlight mixedCase">soo</span> instead of <span class="highlight mixedCase">zoo</span></li>
<li><strong>"k* / g*" sounds</strong> - try saying <span class="highlight mixedCase">gate</span> instead of <span class="highlight mixedCase">Kate</span></li>
</ul>
</div>
</div>
<div id="wakeword-challenges" class="step" data-x="12000" data-y="0">
<div class="step-container">
<h1>Wake Word - Challenges</h1>
<ul class="bulletedList">
<li><strong>Always listening</strong> - Wake Word listeners are "always on" </li>
<li><strong>Accuracy</strong> - False negatives and false positives</li>
</ul>
</div>
</div>
<div id="haber" class="step" data-x="14000" data-y="0">
<div class="step-container">
<h1>Haber's <br>Classification <br>of Contexts</h1>
<p class="attribution"><strong>Attribution</strong>: Haber, J., Greening, M., Castellano, L., &amp; Wheaton, P. (n.d.). Proxemic Conversational UI: Moving beyond simple conversation.</p>
</div>
</div>
<div id="wakeword-hat" class="step" data-x="13000" data-y="0">
<p class="attribution"><strong>Attribution</strong>: <em>Project Alias</em> via Project Alias</p>
</div>
<div id="wakeword-accuracy" class="step" data-x="15000" data-y="0">
<div class="step-container">
<h1>Wake Word - Accuracy</h1>
<p class="attribution"><strong>Attribution</strong>: <a href="https://flic.kr/p/aJaboR"><em>bullseye</em> via Emilio Kuffer on Flickr.</a></p>
</div>
</div>
<div id="wakeword-false" class="step" data-x="16000" data-y="0">
<div class="step-container">
<h1>Wake Word - measuring accuracy</h1>
<ul class="bulletedList">
<li><strong>False positive</strong> - <span class="highlight mixedCase">failure</span> - Wake Word detected when it wasn't spoken</li>
<li><strong>True positive</strong> - <span class="highlight mixedCase">success</span> - Wake Word correctly detected when it was spoken</li>
<li><strong>True negative</strong> - <span class="highlight mixedCase">success</span> - Wake Word not detected when it wasn't spoken</li>
<li><strong>False negative</strong> - <span class="highlight mixedCase">failure</span> - Wake Word spoken but not detected</li>
</ul>
</div>
</div>
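<!--
The four outcomes on the slide above can be tallied and turned into error rates. This is a minimal
sketch, not taken from any of the wake word projects listed earlier: the function names, the trial
format ({ spoken, detected }) and the sample data are all assumptions made for illustration.

```javascript
// Hypothetical helper: count the four outcomes named on the slide.
// Each trial records whether the Wake Word was spoken and whether it was detected.
function tallyOutcomes(trials) {
  const counts = { truePositive: 0, falsePositive: 0, trueNegative: 0, falseNegative: 0 };
  for (const { spoken, detected } of trials) {
    if (spoken && detected) counts.truePositive += 1;         // success
    else if (!spoken && detected) counts.falsePositive += 1;  // failure
    else if (!spoken && !detected) counts.trueNegative += 1;  // success
    else counts.falseNegative += 1;                           // failure
  }
  return counts;
}

// Hypothetical helper: express the two failure types as rates.
function errorRates(counts) {
  const spoken = counts.truePositive + counts.falseNegative;
  const notSpoken = counts.falsePositive + counts.trueNegative;
  return {
    // Share of spoken Wake Words that were missed.
    falseNegativeRate: spoken === 0 ? 0 : counts.falseNegative / spoken,
    // Share of non-Wake-Word audio that triggered a detection.
    falsePositiveRate: notSpoken === 0 ? 0 : counts.falsePositive / notSpoken,
  };
}

// Made-up sample data, for illustration only.
const sampleTrials = [
  { spoken: true, detected: true },
  { spoken: true, detected: false },
  { spoken: false, detected: true },
  { spoken: false, detected: false },
  { spoken: false, detected: false },
];

const counts = tallyOutcomes(sampleTrials);
const summary = errorRates(counts);
console.log(counts, summary);
```

Keeping the two rates separate is deliberate: tuning a listener to miss fewer Wake Words tends to
raise false activations, so a single accuracy figure can hide the trade-off.
-->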
<div id="wakeword-privacy" class="step" data-x="17000" data-y="0">
</div>
<div id="stt" class="step" data-x="17500" data-y="0">
<div class="step-container">
<h1>Speech to Text</h1>
<ul class="bulletedList">
<li><strong>Kaldi</strong> - https://github.com/kaldi-asr/kaldi</li>
<li><strong>Mozilla DeepSpeech</strong> - https://github.com/mozilla/DeepSpeech</li>
<li><strong>Mozilla Common Voice</strong> - https://voice.mozilla.org/en</li>
</ul>
</div>
</div>
<div id="stt-commonvoice" class="step" data-x="18000" data-y="0">
</div>
<div id="stt-challenges" class="step" data-x="19000" data-y="0">
<div class="step-container">
<h1>STT - Challenges</h1>
<ul class="bulletedList">
<li><strong>Training a model</strong> - Amount of data and training required</li>
<li><strong>Accuracy</strong> - Accuracy has an impact on voice user experience</li>
</ul>
</div>
</div>
<div id="stt-accents" class="step" data-x="20000" data-y="0">
</div>
<div id="bingle" class="step" data-x="21000" data-y="0">
<div class="step-container">
<h1>Consider the phrase</h1>
<p class="quote">"Yeah nah mate, there's been a bingle in Broady, and the Western's chokkas back to the servo, I'm gonna be late for bevvies at Tommo's."
</p>
</div>
</div>
<div id="bingle-translated" class="step" data-x="22000" data-y="0">
<div class="step-container">
<h1>Translation for non-Australians ;-) </h1>
<pre><code>
Greetings, friend
There's been a car accident in Broadmeadows
and the Western Freeway is congested
back to the service station
and as a result I will be late
to the social function at Mr Thompson's.
</code></pre>
</div>
</div>
<div id="languages" class="step" data-x="23000" data-y="0">
</div>
<div id="language-challenges" class="step" data-x="24000" data-y="0">
<div class="step-container">
<h1>Mycroft Translate - Challenges</h1>
<ul class="bulletedList">
<li><strong>Line by line translation</strong> - Does not allow for context</li>
<li><strong>Gender</strong> - Different languages handle gender differently</li>
<li><strong>Hierarchy</strong> - Different language for different formality</li>
</ul>
</div>
</div>
<div id="kia-ora-mate" class="step" data-x="25000" data-y="0">
<div class="step-container">
<p class="attribution"><strong>Attribution</strong>: <a href="https://twitter.com/waikatoreo/status/1051264259089264640">kia ora mate</em> via @waikatoreo on Twitter.</a></p>
</div>
</div>
<div id="intent-parsers" class="step" data-x="26000" data-y="0">
<div class="step-container">
<h1>Intent Parsers</h1>
<ul class="bulletedList">
<li><strong>Rasa</strong> - https://rasa.com/docs/nlu/</li>
<li><strong>Mycroft Adapt</strong> - https://github.com/MycroftAI/adapt</li>
<li><strong>Mycroft Padatious</strong> - https://github.com/MycroftAI/padatious</li>
</ul>
</div>
</div>
<div id="intent-challenges" class="step" data-x="27000" data-y="0">
<div class="step-container">
<h1>Intent Parser challenges</h1>
<ul class="bulletedList">
<li><strong>Intent collisions</strong> - Disambiguating intents so that the "most likely" command is invoked for the user</li>
</ul>
</div>
</div>
<div id="common-play-framework" class="step" data-x="28000" data-y="0">
<div class="step-container">
<h1>Common Play Framework</h1>
<p><code><span class="highlight">CPSMatchLevel.EXACT</span></code> (The input is an exact match)</p>
<p><code><span class="highlight">CPSMatchLevel.MULTI_KEY</span></code> (The input contains multiple matches such as Artist and Album title)</p>
<p><code><span class="highlight">CPSMatchLevel.TITLE</span></code> (The phrase contains a matching title)</p>
<p><code><span class="highlight">CPSMatchLevel.ARTIST</span></code> (The phrase contains a matching artist)</p>
<p><code><span class="highlight">CPSMatchLevel.CATEGORY</span></code> (The phrase contains a category supported by the skill, Rock, bitpop, Podcast etc.)</p>
<p><code><span class="highlight">CPSMatchLevel.GENERIC</span></code> (Generic match, maybe contains the skill title but no media match)</p>
<br>
<p>where <code><span class="highlight">CPSMatchLevel.EXACT</span></code> is the highest confidence and <code><span class="highlight">CPSMatchLevel.GENERIC</span></code> is the lowest.</p>
</div>
</div>
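<!--
One way to picture how ordered match levels help with intent collisions: when several skills
respond to the same utterance, the response with the highest-confidence level wins. The sketch below
is not Mycroft's implementation. The MatchLevel object only mirrors the names on the slide, and its
numeric values, the pickBestMatch helper and the skill names are assumptions made for illustration.

```javascript
// Hypothetical ranking that mirrors the level names on the slide.
// The numbers are an assumption: higher means more confident.
const MatchLevel = Object.freeze({
  GENERIC: 1,
  CATEGORY: 2,
  ARTIST: 3,
  TITLE: 4,
  MULTI_KEY: 5,
  EXACT: 6,
});

// Return the response with the highest match level, or null if there are none.
// On a tie, the earliest response is kept.
function pickBestMatch(responses) {
  let best = null;
  for (const response of responses) {
    if (best === null || MatchLevel[response.level] > MatchLevel[best.level]) {
      best = response;
    }
  }
  return best;
}

// Made-up responses from three hypothetical skills to the same utterance.
const responses = [
  { skill: "podcast-skill", level: "CATEGORY" },
  { skill: "music-skill", level: "MULTI_KEY" },
  { skill: "radio-skill", level: "GENERIC" },
];

const winner = pickBestMatch(responses);
console.log(winner.skill);
```
-->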
<div id="text-to-speech" class="step" data-x="29000" data-y="0">
<div class="step-container">
<h1>Text to Speech</h1>
<ul class="bulletedList">
<li><strong>Mary TTS</strong> <br> - http://mary.dfki.de/</li>
<li><strong>Espeak</strong> <br> - http://espeak.sourceforge.net/</li>
<li><strong>Mycroft Mimic</strong> <br> - https://mycroft.ai/documentation/mimic/</li>
<li><strong>Mycroft Mimic 2</strong> <br>- https://github.com/MycroftAI/mimic2</li>
</ul>
</div>
</div>
<div id="text-to-speech-challenges" class="step" data-x="30000" data-y="0">
<div class="step-container">
<h1>Text to Speech Challenges</h1>
<ul class="bulletedList">
<li><strong>Natural sounding voice</strong> - making the voice sound not robotic</li>
<li><strong>Pronunciation</strong> - often requires correction</li>
</ul>
</div>
</div>
<div id="malala" class="step" data-x="33000" data-y="0">
<div class="step-container">
<h1>A parting quote</h1>
<p class="quote">"When the whole world is silent, even one voice becomes powerful."
<br> <br> - MALALA YOUSAFZAI
</p>
</div>
</div>
<div id="thankyou" class="step" data-x="31000" data-y="0">
<div class="step-container">
<h1>Thank you :-)</h1>
<p>Questions warmly welcomed</p>
</div>
</div>
</div>
<!--
This is a UI plugin. You can read more about plugins in src/plugins/README.md.
For now, I'll just tell you that this adds some graphical controls to navigate the
presentation. In the CSS file you can style them as you want. We've put them bottom right.
-->
<div id="impress-toolbar"></div>
<!--
Hint is not related to impress.js in any way.
But it can show you how to use impress.js features in a creative way.
When the presentation step is shown (selected) its element gets the class of "active" and the body element
gets the class based on active step id `impress-on-ID` (where ID is the step's id)... It may not be
so clear because of all these "ids" in the previous sentence, so for example when the first step (the one with
the id of `title`) is active, the body element gets a class of `impress-on-title`.
This class is used by this hint below. Check CSS file to see how it's shown with delayed CSS animation when
the first step of presentation is visible for a couple of seconds.
...
And when it comes to this piece of JavaScript below ... kids, don't do this at home ;)
It's just a quick and dirty workaround to get different hint text for touch devices.
In the real world it should at least be placed in a separate JS file ... and the touch content should
probably just be hidden somewhere in the HTML - not hard-coded in the script.
Just sayin' ;)
-->
<div class="hint">
<p>Use the spacebar or arrow keys to navigate. <br/>
Press 'P' to launch speaker console.</p>
</div>
<script>
if ("ontouchstart" in document.documentElement) {
document.querySelector(".hint").innerHTML = "<p>Swipe left or right to navigate</p>";
}
</script>
<!--
Last, but not least.
To make all described above really work, you need to include impress.js in the page.
I strongly encourage you to minify it first.
Here I just include the full source of the script to make it more readable.
You also need to call the `impress().init()` function to initialize the impress.js presentation.
And you should do it at the end of your document. Not only because it's good practice, but also
because it should be done when the whole document is ready.
Of course you can wrap it in any kind of "DOM ready" event, but I was too lazy to do so ;)
-->
<script src="js/impress.js"></script>
<script>impress().init();</script>
<!--
The `impress()` function also gives you access to the API that controls the presentation.
Just store the result of the call:
var api = impress();
and you will get four functions you can call:
`api.init()` - initializes the presentation,
`api.next()` - moves to next step of the presentation,
`api.prev()` - moves to previous step of the presentation,
`api.goto( stepIndex | stepElementId | stepElement, [duration] )` - moves the presentation to the step given by its index number,
id, or DOM element; the second parameter can be used to define the duration of the transition in ms,
but it's optional - if not provided default transition duration for the presentation will be used.
You can also simply call `impress()` again to get the API, so `impress().next()` is also allowed.
Don't worry, it won't initialize the presentation again.
For some example uses of this API check the last part of the source of impress.js where the API
is used in event handlers.
-->
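<!--
As a rough illustration of the API described above, the sketch below maps a few keys to step ids
from this deck and jumps there with `goto()`. The helper name `makeSectionJumper` and the key
choices are assumptions, not part of impress.js, and the keys may clash with shortcuts that
impress.js plugins already bind, so treat this as a starting point only.

```javascript
// `api` is whatever impress() returns; only goto() is used here.
// `keyToStepId` is a hypothetical mapping from a key to a step id.
function makeSectionJumper(api, keyToStepId) {
  return function onKey(event) {
    // Ignore keys that are not in the mapping.
    if (!Object.prototype.hasOwnProperty.call(keyToStepId, event.key)) {
      return false;
    }
    api.goto(keyToStepId[event.key]);
    return true;
  };
}

// Step ids taken from this presentation; the key choices are arbitrary.
const sectionKeys = {
  w: "wakeword",
  s: "stt",
  i: "intent-parsers",
  t: "text-to-speech",
};

// In the browser, after impress().init() has run:
//   document.addEventListener("keyup", makeSectionJumper(impress(), sectionKeys));
```

Passing the API object in, rather than calling `impress()` inside the helper, keeps the helper
easy to try out with a stand-in object.
-->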
</body>
</html>
<!--
Now you know more or less everything you need to build your first impress.js presentation, but before
you start...
Oh, you've already cloned the code from GitHub?
You have it open in a text editor?
Stop right there!
That's not how you create awesome presentations. This is only code: an implementation of an idea that
first needs to grow in your mind.
So if you want to build a great presentation, take a pencil and a piece of paper. And turn off the computer.
Sketch, draw and write. Brainstorm your ideas on paper. Try to build a mind-map of what you'd like
to present. It will get you closer and closer to the layout you'll build later with impress.js.
Get back to the code only when you have your presentation ready on paper. It doesn't make sense to do
it earlier, because you'll only waste your time fighting with the positioning of useless points.
If you think I'm crazy, please get your hands on a book called "Presentation Zen". It's all about
creating awesome and engaging presentations.
Think about it. 'Cause impress.js may not help you if you have nothing interesting to say.
-->
<!--
Are you still reading this?
For real?
I'm impressed! Feel free to let me know that you got that far (I'm @bartaz on Twitter), 'cause I'd like
to congratulate you personally :)
But you don't have to do it now. Take my advice and take some time off. Make yourself a cup of coffee, tea,
or anything you like to drink. And raise a glass for me ;)
Cheers!
-->