option to add macrons and breves in collected.json #12

mcorne · 2021-09-15T14:36:29Z

It would be nice to have an option to build https://github.com/cltk/lat_models_cltk/blob/master/lemmata/collatinus/collected.json with macrons and breves in the models, in the lemmas (except their keys), and in the maps keys, so that the decliner could generate accented words.
For example:

"uita": {
    "R": {"1": [ "1", "" ] }, 
    "abs": [], 
    "des": {
        "1": [[ "1", [ "ă" ]]], 
        "2": [[ "1", [ "ă" ]]], 
        "3": [[ "1", [ "ăm" ]]], 
        "4": [[ "1", [ "āe" ]]], 
        "5": [[ "1", [ "āe" ]]], 
        "6": [[ "1", [ "ā" ]]], 
        "7": [[ "1", [ "āe" ]]], 
        "8": [[ "1", [ "āe" ]]], 
        "9": [[ "1", [ "ās" ]]], 
        "10": [[ "1", [ "ārŭm" ]]], 
        "11": [[ "1", [ "īs" ]]], 
        "12": [[ "1", [ "īs" ]]] },
    "suf": [],
    "sufd": []
}
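A rough sketch of how a decliner might consume an entry like the one above. It rests on assumptions this issue does not confirm: that each "R" value is a pair of (number of trailing characters to drop from the lemma, suffix to append), and that each "des" value is a list of [root id, list of endings]. The names build_roots and decline are hypothetical and are not CLTK functions; MODEL is a trimmed copy of the example entry.

```python
# Hypothetical, trimmed-down model entry mirroring the example in this issue.
MODEL = {
    "R": {"1": ["1", ""]},
    "des": {
        "1": [["1", ["ă"]]],
        "3": [["1", ["ăm"]]],
        "10": [["1", ["ārŭm"]]],
    },
}


def build_roots(lemma, model):
    """Assumed rule: drop N trailing characters from the lemma, then append a suffix."""
    roots = {}
    for root_id, (drop, suffix) in model["R"].items():
        n = int(drop)
        roots[root_id] = (lemma[:-n] if n else lemma) + suffix
    return roots


def decline(lemma, model):
    """Join every root with the endings listed for each form number."""
    roots = build_roots(lemma, model)
    forms = {}
    for slot, groups in model["des"].items():
        forms[slot] = [
            roots[root_id] + ending
            for root_id, endings in groups
            for ending in endings
        ]
    return forms


FORMS = decline("uita", MODEL)
for slot, words in FORMS.items():
    print(slot, words)
```

If those assumptions hold, accented endings in "des" would be enough for the generated forms to carry macrons and breves, even when the lemma key itself stays plain ASCII.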


mcorne · 2021-09-15T15:11:32Z

A possible fix in https://github.com/cltk/lat_models_cltk/blob/master/lemmata/collatinus/__convert.py:

Import regex as re so that the existing regex rules continue to work.
Normalize strings according to a new ascii option:

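import unicodedata

def normalize_unicode(lines, ascii=True):
    if ascii:
        # Decompose, then drop combining marks and any other non-ASCII characters.
        lines = unicodedata.normalize('NFKD', lines)
        lines = lines.encode('ASCII', 'ignore').decode()
    else:
        # Keep macrons and breves, composed into single code points where possible.
        lines = unicodedata.normalize('NFC', lines)
    return lines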
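To make the effect of the ascii option concrete, here is a small self-contained check. The function body repeats the proposal above (with the parameter renamed to text); the sample word and the variable names SAMPLE, ASCII_FORM and ACCENTED_FORM are made up for illustration.

```python
import unicodedata


def normalize_unicode(text, ascii=True):
    """Same logic as the proposed fix, repeated so this snippet runs on its own."""
    if ascii:
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ASCII", "ignore").decode()
    return unicodedata.normalize("NFC", text)


# "vītă" written with a combining macron (U+0304) and a combining breve (U+0306).
SAMPLE = "vi\u0304ta\u0306"

ASCII_FORM = normalize_unicode(SAMPLE, ascii=True)      # diacritics stripped
ACCENTED_FORM = normalize_unicode(SAMPLE, ascii=False)  # diacritics kept, composed

print(ASCII_FORM, len(ASCII_FORM))
print(ACCENTED_FORM, len(ACCENTED_FORM))
```

With ascii=True the diacritics are dropped, which suits dictionary keys; with ascii=False they are kept and composed under NFC, which suits the values the decliner would output.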

Update the subsequent calls accordingly (this is only a quick fix):

# Keep the accents when reading the models.
with open("./src/modeles.la", encoding="utf-8") as f:
    lines = normalize_unicode(f.read(), ascii=False).split("\n")

# Lemma keys stay plain ASCII; the stored result keeps its accents.
lemmas[normalize_unicode(result["lemma"], ascii=True)] = result

# Keep the accents when reading the lemmas and the extended lemmas.
with open("./src/lemmes.la", encoding="utf-8") as f:
    lines = normalize_unicode(f.read(), ascii=False).split("\n")

with open("./src/lem_ext.la", encoding="utf-8") as f:
    lines = normalize_unicode(f.read(), ascii=False).split("\n")
