Get Yale readings #15
Hey, I am no longer maintaining this repo, and it has been a while, so my memory may be partly wrong. This library relies heavily on the Unicode Unihan database, which does not include Yale readings AFAIK: https://www.unicode.org/reports/tr38/index.html#kCantonese.
That's fine. This Python library is functional enough for my needs, so I'm fine with no more updates.
Sorry, I don't think I can answer this specific question without reading more of the code. Feel free to dig in and ask about specific areas in the code if you get stuck though!
I'm not sure this is helping, but the tone logic should basically boil down to this code:
So maybe the case you are asking about is not covered? From a Jyutping perspective that might be correct, as that system chose not to represent this case; from a Yale perspective, however, it is wrong.
# -*- coding: UTF-8 -*-
# cjklib is only compatible with Python 2; call it with: py -2 Query_7_per_cjklib.py
# (or at least that's what I thought I was supposed to do, but entering these lines
# directly at the Python prompt gives me the correct results and calling the script doesn't)
# cjklib is the only thing I could find on github that claimed to correctly handle Yale's high falling/high level distinction
import sys
from cjklib import characterlookup
print sys.version_info
# set locale as traditional
cjk = characterlookup.CharacterLookup('T')
f = open('cjklib_seven.txt', 'w')
sys.stdout = open('C:/Users/Public/output.txt', 'w')
print(u'tìm'.encode('UTF-8'))
print(cjk.getCharactersForReading('tìm', 'CantoneseYale'))
print(u'tīm'.encode('UTF-8'))
print(cjk.getCharactersForReading('tīm', 'CantoneseYale'))

This is what gives me different results in interactive mode and when called as a .py file. I have both Python 2.7 and Python 3.8 installed.
I think I figured out the issue: whatever encoding gets passed to the interactive Python process doesn't support precomposed characters with a macron, so they were being treated as unaccented characters (tone 3) instead. It didn't help that many characters can be pronounced with either the high level or the high falling tone, sometimes every character for a particular reading, so comparing query results didn't tell me whether cjklib was processing the two tones differently. So, not a bug in cjklib AFAIK.
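For anyone who runs into the same thing, here is a rough sketch of how the encoding problem can be checked. It is written for Python 3, not the Python 2 that cjklib needs. The `cp1252` code page is only an assumption about what a Windows console might use (the thread doesn't say which encoding was involved), and `survives_encoding` and `strip_marks` are hypothetical helper names, not part of cjklib.

```python
# -*- coding: utf-8 -*-
import unicodedata


def survives_encoding(text, encoding):
    """Return True if text round-trips through the encoding unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


def strip_marks(text):
    """Drop combining marks, imitating a lossy 'best fit' conversion."""
    decomposed = unicodedata.normalize('NFD', text)
    return u''.join(ch for ch in decomposed if not unicodedata.combining(ch))


high_falling = u't\u00ecm'  # tìm, i with grave (U+00EC)
high_level = u't\u012bm'    # tīm, i with macron (U+012B)

# Assumption: cp1252 stands in for the console code page.
for syllable in (high_falling, high_level):
    print(repr(syllable), survives_encoding(syllable, 'cp1252'))

# If the macron is silently dropped, the query becomes an unmarked syllable.
print(strip_marks(high_level))
```

Under that assumption the grave-accented syllable survives and the macron one doesn't, which would explain why only the high level queries came back looking like tone 3.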
Since Yale encodes the difference between the high level and high falling tones but Jyutping doesn't, would it be possible to get the Yale readings directly?
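To make the distinction concrete, here is a small sketch of the standard Yale tone-mark convention as I understand it. This is not cjklib's code. `YALE_TONES` and `yale_tone` are hypothetical names, and the sketch ignores syllabic nasals such as m̀h and invalid combinations like a macron together with a low-register h.

```python
# -*- coding: utf-8 -*-
import unicodedata

GRAVE, ACUTE, MACRON = u'\u0300', u'\u0301', u'\u0304'

# (diacritic, low-register h present) -> (Jyutping tone number, Yale tone name)
YALE_TONES = {
    (MACRON, False): (1, 'high level'),
    (GRAVE, False): (1, 'high falling'),
    (ACUTE, False): (2, 'high rising'),
    (None, False): (3, 'mid level'),
    (GRAVE, True): (4, 'low falling'),
    (ACUTE, True): (5, 'low rising'),
    (None, True): (6, 'low level'),
}


def yale_tone(syllable):
    """Classify a single Yale syllable by its diacritic and low-register h."""
    decomposed = unicodedata.normalize('NFD', syllable)
    mark = next((ch for ch in decomposed if ch in (GRAVE, ACUTE, MACRON)), None)
    letters = u''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # An h directly after a vowel marks the low register; an initial h does not.
    low = any(letters[i] == u'h' and letters[i - 1] in u'aeiou'
              for i in range(1, len(letters)))
    return YALE_TONES[(mark, low)]


print(yale_tone(u't\u012bm'))   # tīm
print(yale_tone(u't\u00ecm'))   # tìm
print(yale_tone(u't\u00e0hm'))  # tàhm
```

The first two syllables get different Yale tone names but the same Jyutping number, which is the information that is lost when readings are stored as Jyutping.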