
Get Yale readings #15

Open
LawranceFung opened this issue Dec 30, 2020 · 5 comments

Comments

@LawranceFung

Since Yale encodes the difference between the high level and high falling tones but Jyutping doesn't, would it be possible to get the Yale readings directly?
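For reference, Cantonese Yale marks the two tone-1 variants with different diacritics: a macron for high level (tīm) and a grave accent for high falling (tìm), with an "h" after the vowel marking the low register. The following is a minimal sketch of reading that distinction off a single Yale syllable. The function name `yale_tone1_variant` is hypothetical and not part of cjklib, and the sketch deliberately ignores everything except these two marks.

```python
import unicodedata
from typing import Optional

MACRON = "\u0304"  # combining macron: high level tone
GRAVE = "\u0300"   # combining grave accent: falling tones
VOWELS = "aeiou"


def yale_tone1_variant(syllable: str) -> Optional[str]:
    """Classify a Yale syllable as 'high level', 'high falling', or None.

    Hypothetical helper, simplified: it only inspects the tone diacritic
    and the low-register 'h', and returns None for every other tone.
    """
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    marks = {ch for ch in decomposed if unicodedata.combining(ch)}
    letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # An 'h' after the vowel nucleus marks the low register (tones 4-6).
    # Without a vowel letter (syllabic m/ng), look past the first letter.
    first_vowel = next((i for i, ch in enumerate(letters) if ch in VOWELS), 1)
    if "h" in letters[first_vowel:]:
        return None
    if MACRON in marks:
        return "high level"
    if GRAVE in marks:
        return "high falling"
    return None  # no macron or grave: some other tone (unmarked = mid level)


for s in ("tīm", "tìm", "tim", "tìhm"):
    print(s, yale_tone1_variant(s))
```

Working on the NFD form means precomposed (ī) and decomposed (i + combining macron) input are classified the same way.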

@cburgmer
Owner

Hey, I am not maintaining this repo anymore, and it's been a while, so parts of my memory may be wrong.

This library relies heavily on the Unicode Unihan database, which does not include Yale readings AFAIK: https://www.unicode.org/reports/tr38/index.html#kCantonese.
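The kCantonese field holds Jyutping readings with numeric tones, so a reading such as tin1 only says "tone 1" and carries nothing about level versus falling. Below is a minimal sketch of pulling that field out of data in the tab-separated Unihan_Readings.txt layout. `parse_kcantonese` is a hypothetical helper, and the sample lines are illustrative inputs in that layout, not an excerpt from a specific Unihan release.

```python
from collections import defaultdict


def parse_kcantonese(lines):
    """Map each character to the Jyutping readings in its kCantonese field."""
    readings = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if field != "kCantonese":
            continue  # other reading fields (kMandarin, ...) are ignored
        char = chr(int(codepoint[2:], 16))  # "U+5929" -> the character itself
        readings[char].extend(value.split())  # multiple readings are space-separated
    return dict(readings)


# Illustrative lines in the tab-separated Unihan_Readings.txt layout.
sample = [
    "# comment lines start with '#'",
    "U+4E00\tkCantonese\tjat1",
    "U+5929\tkCantonese\ttin1",
    "U+5929\tkMandarin\ttiān",
]
print(parse_kcantonese(sample))
```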

@LawranceFung
Author

That's fine. This Python library is functional enough for my needs, so I'm fine with no more updates.
I'm getting different results between running a Python script from the Windows cmd and pasting the same code into the interactive Python shell. Specifically, Yale romanizations with the high level tone seem to fetch the characters with the high falling tone when run from a .py file, but return the correct, distinct results when entered in the interactive shell. On a related note, when a Jyutping romanization with tone number 1 (which covers both the high level and high falling tones) is queried through a Python script, it only returns the results for the high falling tone, when the list should include both high falling and high level. Any idea what's up with that? Could it have something to do with differing environment variables?
Also, if the Unihan database doesn't have Yale readings, how did you programmatically distinguish between high falling and high level?

@cburgmer
Owner

> On a related note, when a romanization in jyutping with tone number 1 (corresponding to the high level and high falling tones) is queried through a python script, it only returns the result for high falling tone when the list should have both high falling and high level. Any idea what's up with that?

Sorry, I don't think I can answer this specific question without reading the code more closely. Feel free to dig in and ask about specific areas of the code if you get stuck, though!

> how did you programmatically distinguish between high falling and high level?

I'm not sure this helps, but the tone logic should basically boil down to this code:

    DEFAULT_TONE_MAPPING = {1: '1stToneLevel', 2: '2ndTone', 3: '3rdTone',
        4: '4thTone', 5: '5thTone', 6: '6thTone'}

So maybe the case you are asking about is not covered? From a Jyutping perspective that might be correct, since that system chose not to represent the distinction; from a Yale perspective, however, it's wrong.
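To make the gap concrete: with a one-to-one mapping like the one above, Jyutping tone 1 always becomes the level tone, so a high falling reading can never be produced. A sketch of a one-to-many alternative follows, in which tone 1 yields both Yale candidates. The names `JYUTPING_TO_YALE_TONES` and `yale_tone_candidates` are hypothetical, and `'1stToneFalling'` is an assumed name for the high falling tone. Only the other tone names come from the snippet above.

```python
# Hypothetical one-to-many mapping from Jyutping tone numbers to Yale tone
# names. '1stToneFalling' is an assumed name for the high falling tone.
JYUTPING_TO_YALE_TONES = {
    1: ("1stToneLevel", "1stToneFalling"),
    2: ("2ndTone",),
    3: ("3rdTone",),
    4: ("4thTone",),
    5: ("5thTone",),
    6: ("6thTone",),
}


def yale_tone_candidates(jyutping_tone):
    """Return every Yale tone a Jyutping tone number could correspond to."""
    try:
        return JYUTPING_TO_YALE_TONES[jyutping_tone]
    except KeyError:
        raise ValueError(f"invalid Jyutping tone: {jyutping_tone!r}") from None


print(yale_tone_candidates(1))
```

Returning a tuple of candidates leaves the choice between them to whatever data source can actually resolve it.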

@LawranceFung
Author

LawranceFung commented Dec 30, 2020

    # -*- coding: UTF-8 -*-
    # cjklib is only compatible with Python 2, so I call it with
    # `py -2 Query_7_per_cjklib.py`. At least, that's what I thought I was
    # supposed to do, until entering the code directly in the interpreter
    # gave me the correct results while calling the script didn't.
    # cjklib is the only thing I could find on GitHub that claimed to
    # correctly handle Yale's high falling/high level distinction.
    import sys
    from cjklib import characterlookup
    print sys.version_info
    # set locale as traditional
    cjk = characterlookup.CharacterLookup('T')

    f = open('cjklib_seven.txt', 'w')
    # redirect all following print output to a file
    sys.stdout = open('C:/Users/Public/output.txt', 'w')
    print(u'tìm'.encode('UTF-8'))
    print(cjk.getCharactersForReading('tìm', 'CantoneseYale'))
    print(u'tīm'.encode('UTF-8'))
    print(cjk.getCharactersForReading('tīm', 'CantoneseYale'))

This is what gives me different results in interactive mode and when run as a .py file. I have both Python 2.7 and Python 3.8 installed.
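One plausible source of the difference: in Python 2, a plain literal such as `'tìm'` in a file declared as UTF-8 is a byte string containing UTF-8 bytes, while the same text typed at an interactive cmd prompt typically arrives as bytes in the console's code page. The Python 3 sketch below assumes cp437 as that code page (it varies by system). It shows that the two byte sequences differ, and that decoding each with its own encoding recovers the same text. `to_query_text` is a hypothetical helper. In Python 2 itself, writing `u'tìm'` gives a unicode string decoded according to the coding declaration, which avoids the byte-level difference.

```python
import unicodedata

text = "tìm"
script_bytes = text.encode("utf-8")   # bytes stored in a UTF-8 .py file
console_bytes = text.encode("cp437")  # bytes a cp437 console would send (assumed code page)
print(script_bytes, console_bytes)    # same text, different bytes


def to_query_text(raw, encoding):
    """Decode raw input with its actual encoding and normalize to NFC."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


# Decoding each with its own encoding recovers identical text.
assert to_query_text(script_bytes, "utf-8") == to_query_text(console_bytes, "cp437")
```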

@LawranceFung
Author

I think I figured out the issue: whatever encoding gets passed to the interactive Python process doesn't support precomposed characters with a macron, so they were treated as unaccented characters (tone 3) instead. Many characters can be pronounced with either the high level or the high falling tone, sometimes every character with a particular reading, which didn't help when I tried comparing query results to see whether cjklib was processing high level and high falling queries differently. So, not a bug in cjklib AFAIK.
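A quick way to check this explanation: the precomposed ī (U+012B) has no slot in a code page such as cp437, while ì does, so only the macron form gets lost. Once the macron is gone, "tim" reads as the unmarked mid level tone. The sketch assumes cp437 as the console code page. How an unencodable character is then rendered (for example as "?" or as a bare "i") depends on the console and its conversion settings. `fits_code_page` is a hypothetical helper.

```python
import unicodedata


def fits_code_page(text, code_page):
    """True if every character of text can be encoded in the code page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


for syllable in ("tìm", "tīm"):
    print(syllable, fits_code_page(syllable, "cp437"),
          syllable.encode("cp437", errors="replace"))

# Precomposed U+012B and 'i' + combining macron are canonically equivalent,
# so NFC normalization maps the decomposed form onto the precomposed one.
assert unicodedata.normalize("NFC", "ti\u0304m") == "t\u012bm"
```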
Where does cjklib get the dictionary data to distinguish the high level and high falling tones in Yale? I checked CEDICT, CEDICTGR, HanDeDict, CFDICT, Unihan, and KANJIDIC2, and none of them show it. Actually, I just checked all the possible Yale syllables, and it seems cjklib only distinguishes high falling from high level by recognizing that only the high level tone can occur when the syllable ends with p, t, or k.
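The observation at the end can be written as a simple check. A Yale syllable ending in -p, -t or -k can only take the high level variant of tone 1, so the level/falling question stays open only for other finals. The sketch below encodes just that rule, using a hypothetical function name. It says nothing about which characters carry which tone.

```python
import unicodedata

STOP_FINALS = ("p", "t", "k")


def tone1_variants(syllable):
    """Tone-1 variants a Yale syllable could take, judged by its final alone."""
    base = "".join(
        ch for ch in unicodedata.normalize("NFD", syllable.lower())
        if not unicodedata.combining(ch)
    )
    if base.endswith(STOP_FINALS):
        # Checked (stop-final) syllables only take the high level variant.
        return ("high level",)
    return ("high level", "high falling")


for s in ("sik", "yāt", "tim", "si"):
    print(s, tone1_variants(s))
```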
