Features of the Tsuut'ina (Dene) morphological FST and dictionary models #1

Open
aarppe opened this issue May 1, 2020 · 0 comments

Moving UAlbertaALTLab/morphodict#189 here where it belongs:

This is a summary of Arppe, Cox, Hulden et al. (2017) (Arppe_Cox_Hulden_et_al_DLC_2017.pdf) providing some general points for consideration in generalizing the Cree model to other Indigenous languages such as Tsuut'ina (Dene).

  1. Tsuut'ina (like other Dene languages) has disjoint morphology at both the lexical and inflectional levels (like Arabic, but with interdigitation of syllables). The lexical tier consists of a stem and up to three lexical prefixes (potentially disjoint from the stem), which together specify the meaning. Both the stem and any of the lexical prefixes can show allomorphy across the various aspects/moods (see below). The inflectional tier consists of three disjoint slots/chunks that are inserted within the lexical tier. The morphotactics are as follows:

LEX(outer) + INFL(outer) + LEX(middle) + INFL(middle) + LEX(inner) + INFL(inner) + STEM
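
The slot order in the template above can be sketched in a few lines of Python. This is only an illustration of how the two tiers interleave, not the FST itself: the class and function names are made up for this example, the inflectional values are placeholder labels rather than real Tsuut'ina affixes, and no morphophonology is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalTier:
    """Lexical material: up to three prefixes plus the stem ('' = slot unused)."""
    outer: str
    middle: str
    inner: str
    stem: str


@dataclass(frozen=True)
class InflectionalTier:
    """Inflectional material for the three slots ('' = slot unused)."""
    outer: str
    middle: str
    inner: str


def interleave(lex: LexicalTier, infl: InflectionalTier, sep: str = "+") -> str:
    """Lay out both tiers in template order, skipping empty slots.

    Only the slot order is shown; no morphophonological rules are applied.
    """
    slots = [
        lex.outer, infl.outer,
        lex.middle, infl.middle,
        lex.inner, infl.inner,
        lex.stem,
    ]
    return sep.join(slot for slot in slots if slot)


# Lexical tier taken from the 'jump-down' example in this issue.
lex = LexicalTier(outer="nà", middle="gu", inner="di", stem="tłod")
# Placeholder labels only (hypothetical), NOT real Tsuut'ina affixes.
infl = InflectionalTier(outer="INFL-OUT", middle="INFL-MID", inner="INFL-IN")
print(interleave(lex, infl))
```

Keeping the two tiers as separate objects mirrors the point that the lexical tier varies with aspect while the inflectional tier is filled in independently.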

  2. This is implemented as an FST using a regular-looking lemma (the third-person imperfective form; N.B. the relevant categories are aspect and mode, not tense), an internal representation of the disjoint four-part lexical structure, and some clever but fairly simple FST calculus with flag diacritics to put the whole thing together. As a result, inflected forms can be specified as a lemma + features (mostly suffixed for now, as far as I know, though there could potentially be some dynamic prefixing).

  3. Examples of the lemma + lexical structure are below. The non-alphabetic characters '.', '_', and '=' in the internal form (currently the part to the right of the colon) mark where the inner, middle, and outer inflectional morphology goes, respectively. In addition, a rough lexical gloss can be given after the lemma in brackets [...].

itsiy[cry]:tsiy (Imperfective)
itsiy[cry]:tsày (Perfective)
itsiy[cry]:tsíł (Progressive)
itsiy[cry]:ná=chish (Repetitive)

ts'ázid[wake-up]:ts'á=zíd (Imperfective)
ts'ázid[wake-up]:ts'á=zid (Perfective)
ts'ázid[wake-up]:ts'á=ził (Progressive)
ts'ázid[wake-up]:ts'áná=zhiizh (Repetitive)

nàgudiitłod[jump-down]:nà=gu_di.tłod (Imperfective)
nàgudiitłod[jump-down]:nà=gu_di.tłòt (Perfective)
nàgudiitłod[jump-down]:nà=gu_di.tłíł (Progressive)
nàgudiitłod[jump-down]:nàná=gu_di.tłiizh (Repetitive)
nàgudiitłod[jump-down]:nìná=gu_di.tłiizh (Repetitive)
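
As a rough sketch, entries in the `lemma[gloss]:internal` notation above could be split into their parts like this. The function names are invented for this example, and it rests on two assumptions that the notation itself does not guarantee: a missing delimiter means the corresponding lexical prefix is empty, and '=', '_', and '.' never occur inside a prefix or stem.

```python
import re
from typing import NamedTuple, Tuple


class LexicalStructure(NamedTuple):
    outer: str
    middle: str
    inner: str
    stem: str


# lemma[gloss]:internal, e.g. "ts'ázid[wake-up]:ts'á=zíd"
ENTRY_RE = re.compile(
    r"^(?P<lemma>[^\[\]:]+)\[(?P<gloss>[^\]]*)\]:(?P<internal>\S+)$"
)


def _split_slot(text: str, delimiter: str) -> Tuple[str, str]:
    """Return (prefix, rest); the prefix is '' when the delimiter is absent."""
    head, sep, tail = text.partition(delimiter)
    return (head, tail) if sep else ("", text)


def split_internal(internal: str) -> LexicalStructure:
    """Split an internal form on '=', '_', '.' (outer, middle, inner slot)."""
    outer, rest = _split_slot(internal, "=")
    middle, rest = _split_slot(rest, "_")
    inner, stem = _split_slot(rest, ".")
    return LexicalStructure(outer, middle, inner, stem)


def parse_entry(line: str) -> Tuple[str, str, LexicalStructure]:
    """Parse one 'lemma[gloss]:internal' entry (without the aspect label)."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a lemma[gloss]:internal entry: {line!r}")
    return match["lemma"], match["gloss"], split_internal(match["internal"])


for entry in (
    "itsiy[cry]:tsiy",
    "ts'ázid[wake-up]:ts'á=zíd",
    "nàgudiitłod[jump-down]:nà=gu_di.tłod",
):
    print(parse_entry(entry))
```
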

  4. What the above means is that FST analysis and generation work quite nicely with the following format:

lemma[gloss]+POS+ASPECT+SUBJECT(+OBJECT)
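
A minimal sketch of reading and writing analysis strings in this format follows. The tag values used here (`V`, `Ipfv`, `1Sg`, `3Sg`) are hypothetical placeholders, not the FST's actual tag inventory, and the code assumes that neither the lemma nor the gloss contains a '+'.

```python
import re
from dataclasses import dataclass
from typing import Optional

HEAD_RE = re.compile(r"^(?P<lemma>[^\[\]+]+)\[(?P<gloss>[^\]]*)\]$")


@dataclass(frozen=True)
class Analysis:
    lemma: str
    gloss: str
    pos: str
    aspect: str
    subject: str
    obj: Optional[str] = None  # only present for forms that mark an object


def parse_analysis(text: str) -> Analysis:
    """Parse 'lemma[gloss]+POS+ASPECT+SUBJECT(+OBJECT)' into its fields."""
    head, *tags = text.split("+")
    if len(tags) not in (3, 4):
        raise ValueError(f"expected 3 or 4 tags after the lemma: {text!r}")
    match = HEAD_RE.match(head)
    if match is None:
        raise ValueError(f"expected lemma[gloss] at the start: {text!r}")
    pos, aspect, subject = tags[:3]
    obj = tags[3] if len(tags) == 4 else None
    return Analysis(match["lemma"], match["gloss"], pos, aspect, subject, obj)


def format_analysis(analysis: Analysis) -> str:
    """Inverse of parse_analysis: rebuild the analysis string."""
    parts = [
        f"{analysis.lemma}[{analysis.gloss}]",
        analysis.pos,
        analysis.aspect,
        analysis.subject,
    ]
    if analysis.obj is not None:
        parts.append(analysis.obj)
    return "+".join(parts)


# Tag names below are made up for illustration.
example = parse_analysis("itsiy[cry]+V+Ipfv+1Sg")
print(example)
print(format_analysis(example))
```
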

  5. On the Tsuut'ina dictionary side, on the other hand, we want to be able to link to the lemma and represent the different allomorphs (lexical prefixes + stem) for the different aspects. This structure will be encoded in some form of XML-style specification, to be implemented by Chris Cox.

nàgudiitłod 'jump-down'
-> Outer: nà + Middle: gu + Inner: di + Stem: tłod  (Imperfective)
-> Outer: nà + Middle: gu + Inner: di + Stem: tłòt (Perfective)
-> Outer: nà + Middle: gu + Inner: di + Stem: tłíł (Progressive)
-> Outer: nàná + Middle: gu + Inner: di + Stem: tłiizh (Repetitive)
-> Outer: nìná + Middle: gu + Inner: di + Stem: tłiizh (Repetitive)
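
To make the idea concrete, here is one way the structure above might be serialized with Python's standard `xml.etree.ElementTree`. The element and attribute names (`entry`, `form`, `aspect`, `outer`, and so on) are purely hypothetical; the actual schema is still to be defined, so treat this as a sketch of the shape of the data rather than the eventual format.

```python
import xml.etree.ElementTree as ET

# (aspect, outer, middle, inner, stem), copied from the listing above.
FORMS = [
    ("Imperfective", "nà", "gu", "di", "tłod"),
    ("Perfective", "nà", "gu", "di", "tłòt"),
    ("Progressive", "nà", "gu", "di", "tłíł"),
    ("Repetitive", "nàná", "gu", "di", "tłiizh"),
    ("Repetitive", "nìná", "gu", "di", "tłiizh"),
]


def build_entry(lemma, gloss, forms):
    """Build an <entry> element with one <form> per aspect-specific allomorph set.

    All element and attribute names are hypothetical.
    """
    entry = ET.Element("entry", {"lemma": lemma, "gloss": gloss})
    for aspect, outer, middle, inner, stem in forms:
        form = ET.SubElement(entry, "form", {"aspect": aspect})
        for tag, value in (
            ("outer", outer),
            ("middle", middle),
            ("inner", inner),
            ("stem", stem),
        ):
            ET.SubElement(form, tag).text = value
    return entry


entry = build_entry("nàgudiitłod", "jump-down", FORMS)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Allowing several `form` elements with the same aspect keeps both Repetitive variants without forcing a choice between them.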

For an inflected form, we would like its analysis to show not just the lemma nàgudiitłod, but also that its lexical structure consists of the stem tłod together with the lexical prefixes nà, gu, and di. These lexical prefixes might be given some meaning, similar to preverbs in Cree/Algonquian, but that would most likely come from the dictionary source. Note, however, that the meaning of a lexical prefix can vary according to morphological context: while ná often indicates a repetitive form, sometimes it does not, so the meanings of the lexical prefixes might need to be determined for each lemma. Conveniently, the stems will allow for the creation of sets of semantically related lemmas. Finally, recall that the lexical tier will vary according to aspect.
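
The lookup described here could be sketched as follows: given the lemma and aspect from an analysis, return the aspect-specific lexical structure to display next to it. The function names and the in-memory dictionary are assumptions made for this example; in practice the data would come from the dictionary source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (lemma, aspect) -> list of (outer, middle, inner, stem); a list because
# one aspect can have more than one variant (see Repetitive below).
LexTier = Tuple[str, str, str, str]
LEXICAL_TIERS: Dict[Tuple[str, str], List[LexTier]] = defaultdict(list)


def register(lemma: str, aspect: str, tier: LexTier) -> None:
    """Record one aspect-specific lexical tier for a lemma."""
    LEXICAL_TIERS[(lemma, aspect)].append(tier)


def describe(lemma: str, aspect: str) -> List[str]:
    """Return display strings for every lexical tier recorded for lemma + aspect."""
    labels = ("Outer", "Middle", "Inner", "Stem")
    lines = []
    for tier in LEXICAL_TIERS.get((lemma, aspect), []):
        parts = [f"{label}: {value}" for label, value in zip(labels, tier) if value]
        lines.append(" + ".join(parts))
    return lines


# Data copied from the 'jump-down' listing in this issue.
LEMMA = "nàgudiitłod"
register(LEMMA, "Imperfective", ("nà", "gu", "di", "tłod"))
register(LEMMA, "Perfective", ("nà", "gu", "di", "tłòt"))
register(LEMMA, "Progressive", ("nà", "gu", "di", "tłíł"))
register(LEMMA, "Repetitive", ("nàná", "gu", "di", "tłiizh"))
register(LEMMA, "Repetitive", ("nìná", "gu", "di", "tłiizh"))

for line in describe(LEMMA, "Repetitive"):
    print(line)
```

Keying on the (lemma, aspect) pair rather than on the lemma alone reflects the point that the lexical tier varies according to aspect.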

  6. In sum, the treatment of the FST analysis is relatively straightforward, and whatever is implemented for crk (Plains Cree) should work for srs (Tsuut'ina). On the dictionary content side, however, one should be able to present the structure of the lexical tier (stem + lexical prefixes), as well as provide specific information on the meaning of the lexical prefixes (and the stem), which can vary according to aspect/mode.