computed features need to be discussed by the VV team (inspecting what we used in Lexibank) #25

Open
Tracked by #23
maryewal opened this issue Aug 5, 2022 · 12 comments

Comments

@maryewal
Collaborator

maryewal commented Aug 5, 2022

No description provided.

@tihomirrangelov
Collaborator

tihomirrangelov commented Aug 5, 2022

Mary and I discussed the computed features for VV.
The following have been identified as interesting features for Vanuatu languages:

  • presence or lack of prenasalization (ᵐb, ⁿd...)
  • presence or lack of prenasalized bilabial trill (ᵐʙ)
  • presence or lack of plain bilabial trill (P)
  • presence or lack of prenasalized coronal trill (ⁿr)
  • presence or lack of linguolabials (n̼, t̼ etc.)
  • presence or lack of labialized consonants (pʷ, mʷ etc.)
  • presence or lack of labial-velars (k͡p, g͡b, ŋ͡m...)
  • lack of p in the consonant inventory
  • lack of p and c in the consonant inventory
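
For illustration, a minimal sketch of how presence/absence features like these could be computed from a segment inventory. It assumes each language's inventory is available as a set of segment strings; the function names are hypothetical and are not taken from the Lexibank code.

```python
# Hypothetical helpers: an inventory is assumed to be a set of segment strings.
SEAGULL_BELOW = "\u033c"  # combining diacritic used for linguolabials (n̼, t̼)

def has_prenasalization(inventory):
    """True if any segment starts with a superscript nasal (ⁿ, ᵐ, ᵑ)."""
    return any(seg[0] in "ⁿᵐᵑ" for seg in inventory if seg)

def has_linguolabials(inventory):
    """True if any segment carries the linguolabial diacritic."""
    return any(SEAGULL_BELOW in seg for seg in inventory)

def has_labialized(inventory):
    """True if any segment ends in superscript w (pʷ, mʷ, ...)."""
    return any(seg.endswith("ʷ") for seg in inventory)

def lacks(inventory, *sounds):
    """True if none of the given sounds is in the inventory."""
    return all(s not in inventory for s in sounds)

# Invented toy inventory, not taken from any language in the data.
inventory = {"ᵐb", "ⁿd", "t", "k", "m", "n", "mʷ", "t\u033c", "a", "i", "u"}
print(has_prenasalization(inventory), has_linguolabials(inventory))
print(has_labialized(inventory), lacks(inventory, "p"), lacks(inventory, "p", "c"))
```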

@LinguList
Contributor

Okay, we will need to standardize the annotations here: we write prenasalization only with a superscript n, since the place of articulation follows from the sound that comes after it (and the inconsistencies are huge). Also, if there are labiovelars in the data, we'll have to double-check that this works with the orthography profiles, and hope they are properly transcribed (we write kp, gb, without the tie bar, as we use segmented data). But these features are easy to compute.
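
A rough sketch of the kind of normalization described here, under the assumption that it is applied per segment after segmentation. The function name and the exact set of characters handled are illustrative guesses, not the actual orthography-profile logic.

```python
# Assumption: normalization happens on single, already segmented sounds.
TIE_BARS = {"\u0361", "\u035c"}  # tie bar above / tie bar below
PRENASAL_MARKS = {"ᵐ", "ᵑ"}      # leading marks rewritten as superscript n

def normalize_segment(segment):
    """Drop tie bars and write any leading prenasalization mark as ⁿ."""
    segment = "".join(ch for ch in segment if ch not in TIE_BARS)
    if segment and segment[0] in PRENASAL_MARKS:
        segment = "ⁿ" + segment[1:]
    return segment

for raw in ["ᵐb", "ⁿd", "k\u0361p", "ŋ\u0361m"]:
    print(raw, "->", normalize_segment(raw))
```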

These are 9 features; 10 would be nicer (also for maps and the like in any publications).

@LinguList
Contributor

Now, we need 10 lexical features (like colexification, partial colexification, or the like). You could even have features like 3 recurs in 8 (which is an indicator of quinal systems, etc.).
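
To make the two kinds of lexical feature concrete, here is a small sketch. It assumes forms are stored per concept as tuples of segments; the function names and the toy data are invented for illustration only.

```python
def colexified(forms, a, b):
    """Full colexification: concepts a and b share an identical form."""
    return bool(set(forms.get(a, [])) & set(forms.get(b, [])))

def contains(inner, outer):
    """True if the segment sequence inner occurs contiguously in outer."""
    n = len(inner)
    return n > 0 and any(
        tuple(outer[i:i + n]) == tuple(inner)
        for i in range(len(outer) - n + 1)
    )

def recurs_in(forms, a, b):
    """'X in Y' feature: some form for a is contained in some form for b."""
    return any(contains(x, y) for x in forms.get(a, []) for y in forms.get(b, []))

# Invented toy forms, not real data from any language.
forms = {
    "THREE": [("t", "o", "l")],
    "EIGHT": [("l", "e", "t", "o", "l")],
    "HAIR": [("l", "u", "l")],
    "FEATHER": [("l", "u", "l")],
}
print(recurs_in(forms, "THREE", "EIGHT"))   # ThreeInEight
print(colexified(forms, "HAIR", "FEATHER"))  # HairAndFeather
```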

@tihomirrangelov
Collaborator

Most of the colexifications in the Lexibank paper cannot be inferred from the VV data. The following should be possible:

RedAndYellow
ThreeInEight
HairAndFeather
HearAndSmell
CommonSubstringInManAndWoman

We could also do things like:
CommonSubstringIn3DUand3PL
CommonSubstringIn1DUINCLand1PLINCL
(and other variations on the pronouns)
TwoInDU
ThreeInPL
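
One possible reading of the CommonSubstring features, sketched below: two forms count as sharing a substring if their longest common contiguous run of segments reaches a minimum length. The threshold of 2 segments is an assumption for this sketch, not something agreed in this thread.

```python
def longest_common_run(a, b):
    """Length of the longest contiguous segment run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def common_substring(a, b, min_len=2):
    """True if a and b share a run of at least min_len segments (assumed threshold)."""
    return longest_common_run(a, b) >= min_len

# Invented toy forms, not real data.
form_a = ["t", "a", "m", "a", "n"]
form_b = ["w", "a", "m", "a"]
print(longest_common_run(form_a, form_b), common_substring(form_a, form_b))
```

A single shared segment would match almost any pair of words, which is why some minimum length seems necessary.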

@tihomirrangelov
Collaborator

Regarding the labial-velars:

  1. I edited the comment to call them exactly that and not "labiovelars", which has also been used for the labialized consonants (bw, pw etc.)
  2. I know they occur on Efate and Torba, so I am not sure whether we will have to deal with them at this stage, unless @maryewal knows otherwise.

@tihomirrangelov
Collaborator

Does it make sense to compute other, more general features, that are found in the Lexibank paper, such as:
VowelSize
ConsonantSize
CVRatio
etc. in Table 4
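
As a rough illustration of these three features, assuming an inventory is a set of segment strings. The vowel list is a deliberately small stand-in, and taking CVRatio as consonants divided by vowels is an assumption here; the Lexibank paper's own definition should be checked.

```python
# Assumption: a segment is a vowel if its first character is a base vowel.
VOWELS = set("aeiouɛɔəɪʊæøyœɨʉɯɑ")

def is_vowel(segment):
    return bool(segment) and segment[0] in VOWELS

def inventory_sizes(inventory):
    """Return VowelSize, ConsonantSize and CVRatio for one inventory."""
    vowels = {s for s in inventory if is_vowel(s)}
    consonants = set(inventory) - vowels
    ratio = len(consonants) / len(vowels) if vowels else None
    return {
        "VowelSize": len(vowels),
        "ConsonantSize": len(consonants),
        "CVRatio": ratio,
    }

# Invented toy inventory.
toy = {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "ⁿd", "mʷ", "s", "l", "r"}
print(inventory_sizes(toy))
```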

@SimonGreenhill
Contributor

@tihomirrangelov -- yes.

@maryewal
Collaborator Author

maryewal commented Aug 8, 2022

Does it make sense to compute other, more general features, that are found in the Lexibank paper, such as: VowelSize ConsonantSize CVRatio etc. in Table 4

Yes, agreed.

@maryewal
Collaborator Author

maryewal commented Aug 8, 2022

Most of the colexifications in the Lexibank paper cannot be inferred from the VV data. The following should be possible:

Red and Yellow ThreeInEight HairAndFeather HearAndSmell CommonSubstringInManAndWoman

We could also do things like: CommonSubstringIn3DUand3PL CommonSubstringIn1DUINCLand1PLINCL (and other variations on the pronouns) TwoInDU ThreeInPL

@tihomirrangelov - agree on the pronouns, but let's discuss the others. I like @LinguList's idea of incorporating the numeral systems.

@maryewal
Collaborator Author

maryewal commented Aug 12, 2022

Now, we need 10 lexical features (like colexification, partial colexification, or the like). You could even have features like 3 recurs in 8 (which is an indicator of quinal systems, etc.).

@tihomirrangelov and I have discussed the following. This is more than 10, but some may lead to dead ends, so the extras give us room to experiment and keep whichever features show the more interesting results:

RedAndYellow
CommonSubstringInRedAndBlood
CommonSubstringInMoonAndWhite
CommonSubstringInNightAndBlack
CommonSubstringInDirtyAndBlack
CommonSubstringInCloudAndNight
CommonSubstringInCloudAndSky
CommonSubstringInGreenAndNew
OneInSix
TwoInSeven
ThreeInEight
FourInNine
TwoInTen
FiveInTen
PersonInTwenty
ManInTwenty
HairAndFeather
HearAndSmell
CommonSubstringIn3DUAnd3PL
CommonSubstringIn1DUINCLAnd1PLINCL
CommonSubstringIn1DUEXCLAnd1PLEXCL
TwoInDU
ThreeInPL

@tihomirrangelov
Collaborator

And we agreed on the following 20 phonology-related features:

VowelSize
ConsonantSize
CVRatio
HasPrenasalization
VoicingDistinctionInFricatives
HasPrenasalizedBilabialTrill
HasPlainBilabialTrill
HasPrenasalizedCoronalTrill
HasLinguolabials
HasLabializedConsonants
HasLabialVelars
HasHandX (both velar and glottal fricative)
LacksP
LacksPandC
HasFrontRoundedVowels
HasSchwa
HasTwoAffricates
SyllableStructure
SyllableOnset
SyllableOffset
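
A few of these can be sketched as simple inventory checks. The definitions below are guesses for illustration (for example, treating a voicing distinction as "at least one voiceless/voiced fricative pair is present"); the thread does not fix them, and the function names are hypothetical.

```python
# Assumed voiceless/voiced fricative pairs and front rounded vowels.
FRICATIVE_PAIRS = [("f", "v"), ("s", "z"), ("ʃ", "ʒ"), ("x", "ɣ"), ("ɸ", "β"), ("θ", "ð")]
FRONT_ROUNDED = {"y", "ø", "œ", "ʏ"}

def has_h_and_x(inventory):
    """HasHandX: both the glottal (h) and the velar (x) fricative occur."""
    return "h" in inventory and "x" in inventory

def voicing_distinction_in_fricatives(inventory):
    """True if at least one voiceless/voiced fricative pair is present."""
    return any(a in inventory and b in inventory for a, b in FRICATIVE_PAIRS)

def has_schwa(inventory):
    return "ə" in inventory

def has_front_rounded_vowels(inventory):
    return bool(FRONT_ROUNDED & set(inventory))

# Invented toy inventories.
rich = {"h", "x", "s", "f", "v", "ə", "a", "y"}
poor = {"h", "s", "a"}
print(has_h_and_x(rich), voicing_distinction_in_fricatives(rich))
print(has_h_and_x(poor), voicing_distinction_in_fricatives(poor))
```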

@LinguList
Contributor

Nice, we will get back to this once we have processed all data and created orthography profiles. I'll wait until @Bibiko is back from holiday to schedule and divide work (I'll provide the functions, and @Bibiko can help with the overall script in lexibank style, if time allows).

4 participants