---
title: "Vowel analysis using FastTrack"
author: "Takayuki Nagamine"
date: "`r format(Sys.time(), '%d/%m/%Y')`"
output: bookdown::html_document2
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r echo=FALSE, include=FALSE}
library(knitr)
library(tidyverse)
library(emuR)
```
Fast Track is a (nearly) automated formant tracking tool that enables
you to extract formant contours quickly and easily. The tool runs as a
set of Praat scripts installed as a plug-in in Praat,
so there is no need for you to code anything. There is also room for
customising the scripts to make a more individualised tool. Fast Track is
developed by Santiago Barreda at UC Davis and more information can be
found here: <https://github.com/santiagobarreda/FastTrack>
Also, the following paper explains the tool in more detail:
[Barreda, S. (2021). Fast Track: fast (nearly) automatic
formant-tracking using Praat. Linguistics Vanguard, 7(1), 20200051.
https://doi.org/10.1515/lingvan-2020-0051](https://www.degruyter.com/document/doi/10.1515/lingvan-2020-0051/html)
# Preliminaries
In the workshop, I will explain and demonstrate the following things:
1. **The fun bit**: Bulk formant estimation
2. **Another fun bit**: Tidying up data for R
3. **Slightly tedious bit**: Extracting vowels
If you would like to follow along, you can install FastTrack beforehand. A detailed step-by-step guide is available in [Santiago’s GitHub repository](https://github.com/santiagobarreda/FastTrack) with some video illustrations. See the [wiki](https://github.com/santiagobarreda/FastTrack/wiki) on his GitHub repository for the tutorial on installation (and many other things!).
**Prepared data**
I have prepared data that are compressed into zip files. Each corresponds to a different stage of the demonstration:
- [**NWS.zip**](https://github.com/TakayukiNagamine/FastTrack-Workshop/raw/main/zip/NWS.zip):
- This folder contains WAV and TextGrid files, stored in separate subfolders in a FastTrack-friendly manner. This is the folder structure required to extract vowels.
- [**NWS_individual_vowels.zip**](https://github.com/TakayukiNagamine/FastTrack-Workshop/raw/main/zip/NWS_individual_vowels.zip):
- The folder contains the product of the vowel extraction. Therefore, each sound file corresponds to a vowel and is stored in the 'sounds' folder. There are also two csv files that are helpful for tidying up the data afterwards.
**Speech data**
For demonstration, I’m going to use recordings of the well-known passage
“The North Wind and the Sun” that are publicly available in [the ALLSSTAR
Corpus](https://groups.linguistics.northwestern.edu/speech_comm_group/allsstar2/#!/).
The ALLSSTAR Corpus contains a range of spontaneous and scripted speech
recordings produced by English speakers from different language
backgrounds.
Bradlow, A. R. (n.d.) ALLSSTAR: Archive of L1 and L2 Scripted and
Spontaneous Transcripts And Recordings. Retrieved from
<https://oscaar3.ling.northwestern.edu/ALLSSTARcentral/#!/recordings>.
# Bulk formant estimation
Let's imagine that you are interested in comparing vowel realisations between a native English speaker and a native Japanese learner of English.
The highlight of using FastTrack is that it enables you to extract formant frequencies from multiple files automatically. So, skipping all the preparation, here is what you can expect from FastTrack:
- FastTrack runs a lot of regressions and automatically chooses the best analysis out of multiple candidates. It can also return images of all candidates and the winners for visual inspection.
- It estimates the formant frequencies at multiple time points throughout the vowel duration.
- The output is a csv file summarising the analysis, which can then be imported into R for tidying up, visualisation, statistics, etc.
You can extract formant contours easily through the following steps:
1. Download [**NWS_individual_vowels.zip**](https://github.com/TakayukiNagamine/FastTrack-Workshop/raw/main/zip/NWS_individual_vowels.zip), unzip it and save it somewhere on your computer.
2. Open **Praat** and load any file into the Objects window. This makes the FastTrack functions appear in the menu section (if FastTrack is installed properly).
3. Select **Track folder...**.
4. Specify the path to the **NWS_individual_vowels** folder in the "Folder" section. You can also adjust the other settings if you prefer. For more details, please consult Santiago's GitHub page.
- Note that the path to be specified here is **the folder that contains a 'sounds' folder and two other csv files**. This won't work if you specify the 'sounds' folder directly.
- Also note that **the naming convention is important** in later stages, i.e. a 'sounds' folder and a 'textgrids' folder.
```{r echo=FALSE, out.width="49%", out.height="20%", fig.show='hold', fig.align='center', fig.cap='Setting window for formant estimation (left) and an example of comparison image (right)'}
knitr::include_graphics("/Users/TakayukiNagamine/Documents/GitHub/FastTrack-Workshop/images/trackfolder01.png")
knitr::include_graphics("/Users/TakayukiNagamine/Documents/GitHub/FastTrack-Workshop/images/071_F_ENG_0025_comparison.png")
```
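Before pointing **Track folder...** at a directory, it can help to confirm from R that the layout matches what the steps above describe. The following is a minimal sketch in base R, not part of FastTrack itself: `check_track_folder()` is a hypothetical helper, and the throwaway folder it is run on is created in a temporary directory purely for illustration.

```r
# Hypothetical helper (not part of FastTrack): checks the layout described above,
# i.e. a 'sounds' subfolder holding .wav files plus csv files at the top level.
check_track_folder <- function(path) {
  sounds_dir <- file.path(path, "sounds")
  wavs <- list.files(sounds_dir, pattern = "\\.wav$", ignore.case = TRUE)
  csvs <- list.files(path, pattern = "\\.csv$", ignore.case = TRUE)
  list(
    has_sounds_folder = dir.exists(sounds_dir),
    n_wav = length(wavs),
    n_csv = length(csvs)
  )
}

# Illustration with a throwaway folder; replace `demo` with your own path.
demo <- file.path(tempdir(), "track_demo")
dir.create(file.path(demo, "sounds"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo, "sounds", c("v001.wav", "v002.wav"))))
invisible(file.create(file.path(demo, c("info_a.csv", "info_b.csv"))))

res <- check_track_folder(demo)
str(res)

# Pointing at the 'sounds' folder itself is flagged, mirroring the note above.
res_wrong <- check_track_folder(file.path(demo, "sounds"))
res_wrong$has_sounds_folder
```

If `has_sounds_folder` is `FALSE` or `n_wav` is zero, the path probably points one level too high or too low.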
<!--
<p align="center">
<img src="https://github.com/TakayukiNagamine/FastTrack-Workshop/blob/baefed6e4a16e2d3b4d24cc280743bee8c3f4dfe/images/trackfolder01.png" width = 80%>
</p>
<p align="center">
<img src="https://github.com/TakayukiNagamine/FastTrack-Workshop/blob/6a5ab4e82ad7ef17021eb3fa7fbfae4262c8bc7d/images/071_F_ENG_0025_comparison.png" width = 80%>
</p>
-->
This operation yields quite a few output files, more than we have time to look through. If interested, you can download [**NWS_result.zip**](https://github.com/TakayukiNagamine/FastTrack-Workshop/raw/main/zip/NWS_result.zip) to see the full output. The most important output here is the result of the formant estimation, stored in a csv file as [**aggregated_data.csv**](https://github.com/TakayukiNagamine/FastTrack-Workshop/blob/cea3b7c72cd9794705aec754d362449e2f93ab66/csv/aggregated_data.csv) in the **processed_data** folder. Here is a glimpse of it:
```{r echo=FALSE, fig.align='center', fig.cap="An example of aggregated_data.csv"}
knitr::include_graphics("/Users/TakayukiNagamine/Documents/GitHub/FastTrack-Workshop/images/aggregate_data.png")
```
<!--
<p align="center">
<img src="https://github.com/TakayukiNagamine/FastTrack-Workshop/blob/51f898dc45415f4d5f22719334d37dc2e7d62cb1/images/aggregate_data.png" width = 100%>
</p>
-->
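Once the aggregated csv file is in R, a common first step is to reshape it from one row per vowel token into one row per token, formant and time point. The sketch below uses a small made-up data frame instead of the real file. The column naming it relies on (`f11` for F1 in the first time bin, `f21` for F2 in the first bin, `f12` for F1 in the second bin, and so on) is an assumption about the layout, so please check it against your own output before reusing the pattern.

```r
# Toy stand-in for the aggregated csv file. The column layout is an assumption:
# 'f11' = F1 in time bin 1, 'f21' = F2 in bin 1, 'f12' = F1 in bin 2, and so on.
agg <- data.frame(
  file  = c("v001", "v002"),
  label = c("IY1", "AE1"),
  f11 = c(300, 700), f21 = c(2300, 1800),
  f12 = c(310, 720), f22 = c(2250, 1750),
  stringsAsFactors = FALSE
)

# Columns holding formant values: 'f', one formant digit, then the bin number.
formant_cols <- grep("^f[1-9][0-9]+$", names(agg), value = TRUE)

# Stack one block of rows per formant column into a long data frame.
agg_long <- do.call(rbind, lapply(formant_cols, function(col) {
  data.frame(
    file    = agg$file,
    label   = agg$label,
    formant = paste0("F", substr(col, 2, 2)),
    bin     = as.integer(substring(col, 3)),
    hz      = agg[[col]],
    stringsAsFactors = FALSE
  )
}))

head(agg_long)
```

With real data, `agg` would come from `read_csv()` on your own copy of the file, and the same reshaping could equally be written with `tidyr::pivot_longer()`.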
# Tidying up data for R
I think FastTrack becomes even more powerful when combined with R. This, however, requires you to tidy up the output into a more R-friendly data frame. If interested, you can look at my R code ([**FastTrack.R**](https://github.com/TakayukiNagamine/FastTrack-Workshop/blob/main/FastTrack.R)) for tidying up the data, which I wrote based on Sam's script.
Below is my attempt to visualise the vowels of nonnative English speakers from Hong Kong, Thailand and Vietnam alongside those of an American English speaker.
```{r message=FALSE, echo=FALSE, fig.align='center', fig.cap="Comparison of vowel space in Asian Englishes speakers and an American English speaker"}
df.long <- read_csv("/Volumes/Samsung_T5/Asian_Engliehse_Listening/FastTrack/tidydata.csv")
df.midpoint.mono <- df.long %>%
  filter(
    stress %in% c(1, 2), # keep vowels bearing primary or secondary stress
    !next_sound %in% c("R", "W", "Y"), # exclude vowels followed by /r/, /w/ or /j/
    !previous_sound %in% c("R", "W", "Y"), # exclude vowels preceded by /r/, /w/ or /j/
    !next_sound %in% c("L", "NG"), # exclude vowels followed by /l/ or /ŋ/
    percent == 50, # vowel midpoint
    !(vowel %in% c("AW", "AY", "EY", "OW", "OY")) # drop diphthongs, keeping monophthongs
  ) %>%
  mutate(
    vowel_ipa =
      case_when(
        str_detect(vowel, "AA") ~ "ɑ",
        str_detect(vowel, "AE") ~ "æ",
        str_detect(vowel, "AH") ~ "ʌ",
        str_detect(vowel, "AO") ~ "ɔ",
        str_detect(vowel, "EH") ~ "ɛ",
        str_detect(vowel, "ER") ~ "ɝ",
        str_detect(vowel, "IH") ~ "ɪ",
        str_detect(vowel, "IY") ~ "i",
        str_detect(vowel, "UH") ~ "ʊ",
        str_detect(vowel, "UW") ~ "u"
      )
  )

# Calculate vowel means per speaker
df.means <- df.midpoint.mono %>%
  filter(speaker %in% c("AME_F", "HKG_F", "THA_F", "VIE_F")) %>%
  group_by(speaker, vowel, vowel_ipa) %>%
  summarise(
    F1 = mean(Barkf1),
    F2 = mean(Barkf2),
    .groups = "drop"
  )

# Plot individual tokens with the vowel means as labels, one panel per speaker
df.midpoint.mono %>%
  filter(speaker %in% c("AME_F", "HKG_F", "THA_F", "VIE_F")) %>%
  ggplot(aes(x = Barkf2, y = Barkf1, colour = vowel)) +
  geom_point(size = 1, alpha = 0.5, show.legend = FALSE) +
  # stat_ellipse(geom = "polygon", alpha = 0.25, aes(fill = vowel_ipa), show.legend = FALSE) +
  scale_x_reverse(position = "top") +
  scale_y_reverse(position = "right") +
  facet_wrap(~ speaker) +
  geom_label(data = df.means, aes(x = F2, y = F1, label = vowel, colour = vowel), show.legend = FALSE)
```
<!--
<p align="center">
<img src="https://github.com/TakayukiNagamine/FastTrack-Workshop/raw/main/images/asian_englishes_vowels.png" width = 100%>
</p>
-->
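The plot above uses formants on the Bark scale (the `Barkf1` and `Barkf2` columns in the code). The tidied data already contain these columns, and this document does not state which conversion produced them. As one common option, the sketch below implements the Traunmüller (1990) approximation in base R; `hz_to_bark()` is a hypothetical helper name, and the Hz values are made up.

```r
# Traunmüller (1990) approximation of the Bark scale; input in Hz.
# This is one common formula, not necessarily the one used for the tidied data.
hz_to_bark <- function(f) {
  26.81 * f / (1960 + f) - 0.53
}

# Example: rough F1 and F2 values in Hz (made up for illustration)
f_hz <- c(300, 1000, 2300)
f_bark <- hz_to_bark(f_hz)
round(f_bark, 2)
```

The function is vectorised, so it can be applied to a whole column at once, for example inside `mutate()`.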
# Extracting vowels
OK, we have seen that FastTrack is a very powerful tool for automatic formant estimation across a large number of vowels, and that R can then be used to explore the FastTrack data. However, formant estimation requires each audio file to contain only vocalic material throughout its duration, which means that we need to clip out every vowel token from the recording. Thankfully, FastTrack can also do this for us.
If you wish to follow along, the following instructions should work:
1. Download [**NWS.zip**](https://github.com/TakayukiNagamine/FastTrack-Workshop/raw/main/zip/NWS.zip) and save it somewhere on your computer. Again, I have already made the structure optimal for the vowel extraction.
2. Open **Praat** and load any file into the Objects window. This makes the FastTrack functions appear in the menu section.
3. Select **Tools...**, then **Extract vowels with TextGrids**.
4. Specify the following:
- **Sound folder**:
- Path to "sounds" in the NWS folder containing .wav files.
- **TextGrid folder**:
- Path to "textgrids" in the NWS folder containing .TextGrid files.
- **Output folder**:
- Path to the folder where you wish to save the outputs. You could specify an existing location.
- **Which tier contains segmentation information?**:
- Specify the tier in which phonemic transcription/segmentation has been performed. In the current example, the segmentation is done in Tier 2 so type 2.
- **Which tier contains word information?**:
- Specify the tier with words. Type 1 in this case.
- **Is stress marked on vowels?**:
- Tick the box if you wish to take stress into account. If you use forced alignment to segment the speech, stress is marked alongside each vowel. For example, you will find "AE1", which means a TRAP vowel that bears the primary stress, or "AE2" the secondary stress, etc.
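When stress is marked in this way, each label combines a vowel code with a stress digit, and it is often convenient to split the two once the data are in R. Here is a minimal sketch in base R, assuming ARPABET-style labels (capital letters followed by a single digit, where 0 conventionally marks an unstressed vowel). The labels are made up for illustration.

```r
# Made-up ARPABET-style labels: vowel code followed by a stress digit
labels <- c("AE1", "AE2", "IY0", "UW1")

parsed <- data.frame(
  label  = labels,
  vowel  = sub("[0-9]$", "", labels),              # strip the trailing digit
  stress = as.integer(sub("^[A-Z]+", "", labels)), # keep only the digit
  stringsAsFactors = FALSE
)

parsed
```

Keeping vowel and stress in separate columns makes it easy to filter by stress later on, e.g. `stress %in% c(1, 2)`.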
```{r echo=FALSE, fig.align='center', out.width="50%", fig.cap="Vowel extraction setting window"}
knitr::include_graphics("/Users/TakayukiNagamine/Documents/GitHub/FastTrack-Workshop/images/vowel_extraction.png")
```
<!--
<p align="center">
<img src="https://github.com/TakayukiNagamine/FastTrack-Workshop/blob/c3015ff5f06a6ba7d353981eadcece64e3f85706/images/vowel_extraction.png" width = 80%>
</p>
-->
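Because the extraction reads sounds and TextGrids from two separate folders, it can be worth checking from R that every recording has a TextGrid before you start. The sketch below assumes that a recording and its TextGrid share the same base name, which is a common convention but is not stated above. `find_unpaired_sounds()` is a hypothetical helper, and the folders are throwaway ones created in a temporary directory.

```r
# Hypothetical helper: base names of .wav files that lack a same-named TextGrid.
find_unpaired_sounds <- function(sound_dir, textgrid_dir) {
  wav <- list.files(sound_dir, pattern = "\\.wav$", ignore.case = TRUE)
  tg  <- list.files(textgrid_dir, pattern = "\\.TextGrid$", ignore.case = TRUE)
  setdiff(tools::file_path_sans_ext(wav), tools::file_path_sans_ext(tg))
}

# Illustration with throwaway folders; use your own NWS paths in practice.
root <- file.path(tempdir(), "extract_demo")
unlink(root, recursive = TRUE) # start from a clean folder
dir.create(file.path(root, "sounds"), recursive = TRUE)
dir.create(file.path(root, "textgrids"))
invisible(file.create(file.path(root, "sounds", c("spk1.wav", "spk2.wav"))))
invisible(file.create(file.path(root, "textgrids", "spk1.TextGrid")))

unpaired <- find_unpaired_sounds(
  file.path(root, "sounds"),
  file.path(root, "textgrids")
)
unpaired
```

An empty result means every sound file found has a matching TextGrid under this naming assumption.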
# Other interesting stuff
I have given an overview of my vowel analysis workflow, but this covers only the basics. Here are two further things worth exploring:
1. **Segmenting audio files into phonemes using** [**Montreal Forced
Aligner**](https://montreal-forced-aligner.readthedocs.io/en/latest/)
- You need audio files and accompanying Praat TextGrids that contain a
tier with phonemically segmented intervals, and I find it useful to
use Montreal Forced Aligner (MFA) to prepare the files. Dr Eleanor Chodroff recently held a workshop on using MFA at Rutgers
University and the video can be found on YouTube [https://www.youtube.com/watch?v=Zhj-ccMDj_w](https://www.youtube.com/watch?v=Zhj-ccMDj_w). She has also
written a comprehensive user manual for
MFA, which can be found at [https://eleanorchodroff.com/tutorial/montreal-forced-aligner-v2.html](https://eleanorchodroff.com/tutorial/montreal-forced-aligner-v2.html).
2. **Formant correction**
- At this stage, my analyses remain very coarse, as it still takes me a lot of time to check whether the segmentation is correct before extracting vowels. However, you may want to check the accuracy of the formant estimation and hand-correct the measurements if necessary. I haven't managed to learn how to do this just yet, but FastTrack supports it. You can learn about it in [How to analyze a folder](https://github.com/santiagobarreda/FastTrack/wiki/How-to-analyze-a-folder) and [Fixing errors](https://github.com/santiagobarreda/FastTrack/wiki/Fixing-Errors).