---
listing:
  - id: dial
    type: grid
    template: card.ejs
    contents:
      ymls/dial.yml
  - id: l2
    type: grid
    template: card.ejs
    contents:
      ymls/l2.yml
  - id: minority
    type: grid
    template: card.ejs
    contents:
      ymls/minority.yml
  - id: dicts
    type: grid
    template: card.ejs
    contents:
      ymls/dicts.yml
  - id: other
    type: grid
    template: card.ejs
    contents:
      ymls/other.yml
---
## Spoken corpora {#spoken-corpora}
At the Linguistic Convergence Laboratory we create spoken corpora, which give users access to audio recordings of texts together with their transcriptions. Access to the audio allows researchers to study languages at different levels without relying on the annotator's transcription alone. Keep in mind that the search function in these corpora is based on a standardized glossing of the texts, so a study of linguistic features should not rely on the transcription alone: it also requires listening to every example used.

The Laboratory develops corpora of dialectal, regional, and bilingual speech varieties, predominantly those spoken in rural areas.

An important feature of the Laboratory's spoken corpora is the availability of sociolinguistic metadata about the speakers, including their age, gender, education, place of residence, and command of other languages.

The spoken corpora are developed in cooperation with researchers from other universities and institutions. The Laboratory is open to the development of new resources.

::: {.panel-tabset}
### Dialect corpora
:::{#dial}
:::
### Corpora of bilingual Russian
:::{#l2}
:::
### Corpora of minority languages
:::{#minority}
:::
:::
```{r}
#| column: screen-right
#| fig-cap: "Pushkino-Mikhalevskaja, Velsky District, Arkhangelskaja oblastj by Michael Daniel"
knitr::include_graphics("images/daniel_pushkino_2.jpg")
```
## Dictionaries {#dictionaries}
The dictionaries contain audio and text data from several villages of Daghestan. Their wordlists are primarily based on the Jena proposal for a unified comparative lexicon of the languages of Daghestan, and include both the Swadesh list and Kibrik and Kodzasov’s thesaurus for Daghestanian languages, together with some additional items.

:::{#dicts}
:::
```{r}
#| column: screen-right
#| fig-cap: "Kina, Rutulsky District, Daghestan by Timur Maisak"
knitr::include_graphics("images/maisak_kina.jpg")
```
## Other resources {#other}
In addition to dictionaries and corpora, the Laboratory develops databases and atlases containing lexical, grammatical, and sociolinguistic data from many villages of Daghestan.

:::{#other}
:::
```{r}
#| column: screen-right
#| fig-cap: "Karata area, Akhvakhsky District, Daghestan by Timofey Mukhin"
knitr::include_graphics("images/mukhin_karata.jpg")
```