diff --git a/Links.md b/Links.md new file mode 100644 index 0000000..5ad7604 --- /dev/null +++ b/Links.md @@ -0,0 +1,32 @@ +* `src/` + * `cg3/` + * [functions.cg3](src-cg3-functions.cg3.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/cg3/functions.cg3)) + * `fst/` + * `morphology/` + * `affixes/` + * [adjectives.lexc](src-fst-morphology-affixes-adjectives.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/adjectives.lexc)) + * [nouns.lexc](src-fst-morphology-affixes-nouns.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/nouns.lexc)) + * [propernouns.lexc](src-fst-morphology-affixes-propernouns.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/propernouns.lexc)) + * [symbols.lexc](src-fst-morphology-affixes-symbols.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/symbols.lexc)) + * [verbs.lexc](src-fst-morphology-affixes-verbs.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/verbs.lexc)) + * [phonology.twolc](src-fst-morphology-phonology.twolc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/phonology.twolc)) + * [root.lexc](src-fst-morphology-root.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/root.lexc)) + * `stems/` + * [adjectives.lexc](src-fst-morphology-stems-adjectives.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/adjectives.lexc)) + * [nouns.lexc](src-fst-morphology-stems-nouns.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/nouns.lexc)) + * [numerals.lexc](src-fst-morphology-stems-numerals.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/numerals.lexc)) + * [prefixes.lexc](src-fst-morphology-stems-prefixes.lexc.html) 
([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/prefixes.lexc)) + * [pronouns.lexc](src-fst-morphology-stems-pronouns.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/pronouns.lexc)) + * [verbs.lexc](src-fst-morphology-stems-verbs.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/verbs.lexc)) + * `phonetics/` + * [txt2ipa.xfscript](src-fst-phonetics-txt2ipa.xfscript.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/phonetics/txt2ipa.xfscript)) + * `transcriptions/` + * [transcriptor-abbrevs2text.lexc](src-fst-transcriptions-transcriptor-abbrevs2text.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/transcriptions/transcriptor-abbrevs2text.lexc)) + * [transcriptor-numbers-digit2text.lexc](src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/transcriptions/transcriptor-numbers-digit2text.lexc)) +* `tools/` + * `grammarcheckers/` + * [grammarchecker.cg3](tools-grammarcheckers-grammarchecker.cg3.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/tools/grammarcheckers/grammarchecker.cg3)) + * `tokenisers/` + * [tokeniser-disamb-gt-desc.pmscript](tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)) + * [tokeniser-gramcheck-gt-desc.pmscript](tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)) + * [tokeniser-tts-cggt-desc.pmscript](tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)) diff --git a/Makefile.am b/Makefile.am new file mode 100644 index 0000000..05ef7f4 --- /dev/null +++ b/Makefile.am @@ -0,0 +1,8 
@@ +## Process this file with automake to produce Makefile.in +## Copyright: Sámediggi/Divvun/UiT +## Licence: GPL v3+ + +# The generated docs are automatically detected by the automake script + +include $(top_srcdir)/../giella-core/am-shared/docs-dir-include.am +include $(top_srcdir)/../giella-core/am-shared/devtest-include.am diff --git a/Makefile.in b/Makefile.in new file mode 100644 index 0000000..a0bd130 --- /dev/null +++ b/Makefile.in @@ -0,0 +1,1087 @@ +# Makefile.in generated by automake 1.16.5 from Makefile.am. +# @configure_input@ + +# Copyright (C) 1994-2021 Free Software Foundation, Inc. + +# This Makefile.in is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY, to the extent permitted by law; without +# even the implied warranty of MERCHANTABILITY or FITNESS FOR A +# PARTICULAR PURPOSE. + +@SET_MAKE@ + +# The generated docs are automatically detected by the automake script + +# *.gt.* designates the tagset being used. +# At the end of the makefile, there is support for automatic compilation of +# other tagsets, given that the proper tagset relabeling files are defined, +# and that the target files are defined as part of the 'all' target. +# +# Filenames are built as follows: +# basictype-application-tagset-normativity[-dialect].fsttype +# +# 'application' is not specified for the regular/default morphological +# analysis/generation. 
+# +# Examples: +# analyser-oahpa-gt-desc.hfst +# generator-apertium-apertium-norm_single.hfst +# analyser-gt-desc.xfst +# +# Full details regarding transducer filenames can be found at: +# +# https://giellalt.uit.no/infra/infraremake/TransducerNamesInTheNewInfra.html + +#### Tailored silent output text: #### + +#### HFST tools +# Tools not yet covered by this file: +# +# hfst-determinize +# hfst-fst2strings +# hfst-info +# hfst-minus +# hfst-multiply +# hfst-pair-test +# hfst-pmatch +# hfst-push-weights +# hfst-remove-epsilons +# hfst-shuffle +# hfst-subtract +# hfst-summarize +# hfst-tokenize + +VPATH = @srcdir@ +am__is_gnu_make = { \ + if test -z '$(MAKELEVEL)'; then \ + false; \ + elif test -n '$(MAKE_HOST)'; then \ + true; \ + elif test -n '$(MAKE_VERSION)' && test -n '$(CURDIR)'; then \ + true; \ + else \ + false; \ + fi; \ +} +am__make_running_with_option = \ + case $${target_option-} in \ + ?) ;; \ + *) echo "am__make_running_with_option: internal error: invalid" \ + "target option '$${target_option-}' specified" >&2; \ + exit 1;; \ + esac; \ + has_opt=no; \ + sane_makeflags=$$MAKEFLAGS; \ + if $(am__is_gnu_make); then \ + sane_makeflags=$$MFLAGS; \ + else \ + case $$MAKEFLAGS in \ + *\\[\ \ ]*) \ + bs=\\; \ + sane_makeflags=`printf '%s\n' "$$MAKEFLAGS" \ + | sed "s/$$bs$$bs[$$bs $$bs ]*//g"`;; \ + esac; \ + fi; \ + skip_next=no; \ + strip_trailopt () \ + { \ + flg=`printf '%s\n' "$$flg" | sed "s/$$1.*$$//"`; \ + }; \ + for flg in $$sane_makeflags; do \ + test $$skip_next = yes && { skip_next=no; continue; }; \ + case $$flg in \ + *=*|--*) continue;; \ + -*I) strip_trailopt 'I'; skip_next=yes;; \ + -*I?*) strip_trailopt 'I';; \ + -*O) strip_trailopt 'O'; skip_next=yes;; \ + -*O?*) strip_trailopt 'O';; \ + -*l) strip_trailopt 'l'; skip_next=yes;; \ + -*l?*) strip_trailopt 'l';; \ + -[dEDm]) skip_next=yes;; \ + -[JT]) skip_next=yes;; \ + esac; \ + case $$flg in \ + *$$target_option*) has_opt=yes; break;; \ + esac; \ + done; \ + test $$has_opt = yes 
+am__make_dryrun = (target_option=n; $(am__make_running_with_option)) +am__make_keepgoing = (target_option=k; $(am__make_running_with_option)) +pkgdatadir = $(datadir)/@PACKAGE@ +pkgincludedir = $(includedir)/@PACKAGE@ +pkglibdir = $(libdir)/@PACKAGE@ +pkglibexecdir = $(libexecdir)/@PACKAGE@ +am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd +install_sh_DATA = $(install_sh) -c -m 644 +install_sh_PROGRAM = $(install_sh) -c +install_sh_SCRIPT = $(install_sh) -c +INSTALL_HEADER = $(INSTALL_DATA) +transform = $(program_transform_name) +NORMAL_INSTALL = : +PRE_INSTALL = : +POST_INSTALL = : +NORMAL_UNINSTALL = : +PRE_UNINSTALL = : +POST_UNINSTALL = : +build_triplet = @build@ +host_triplet = @host@ +subdir = docs +ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 +am__aclocal_m4_deps = $(top_srcdir)/m4/ax_check_gnu_make.m4 \ + $(top_srcdir)/m4/ax_compare_version.m4 \ + $(top_srcdir)/m4/ax_python_module.m4 \ + $(top_srcdir)/m4/giella-config-files.m4 \ + $(top_srcdir)/m4/giella-macros.m4 $(top_srcdir)/m4/hfst.m4 \ + $(top_srcdir)/configure.ac +am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ + $(ACLOCAL_M4) +DIST_COMMON = $(srcdir)/Makefile.am $(am__DIST_COMMON) +mkinstalldirs = $(install_sh) -d +CONFIG_CLEAN_FILES = +CONFIG_CLEAN_VPATH_FILES = +AM_V_P = $(am__v_P_@AM_V@) +am__v_P_ = $(am__v_P_@AM_DEFAULT_V@) +am__v_P_0 = false +am__v_P_1 = : +AM_V_GEN = $(am__v_GEN_@AM_V@) +am__v_GEN_ = $(am__v_GEN_@AM_DEFAULT_V@) +am__v_GEN_0 = @echo " GEN " $@; +am__v_GEN_1 = +AM_V_at = $(am__v_at_@AM_V@) +am__v_at_ = $(am__v_at_@AM_DEFAULT_V@) +am__v_at_0 = @ +am__v_at_1 = +SOURCES = +DIST_SOURCES = +am__can_run_installinfo = \ + case $$AM_UPDATE_INFO_DIR in \ + n|no|NO) false;; \ + *) (install-info --version) >/dev/null 2>&1;; \ + esac +am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; +am__vpath_adj = case $$p in \ + $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \ + *) f=$$p;; \ + esac; +am__strip_dir = f=`echo $$p | sed -e 
's|^.*/||'`; +am__install_max = 40 +am__nobase_strip_setup = \ + srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*|]/\\\\&/g'` +am__nobase_strip = \ + for p in $$list; do echo "$$p"; done | sed -e "s|$$srcdirstrip/||" +am__nobase_list = $(am__nobase_strip_setup); \ + for p in $$list; do echo "$$p $$p"; done | \ + sed "s| $$srcdirstrip/| |;"' / .*\//!s/ .*/ ./; s,\( .*\)/[^/]*$$,\1,' | \ + $(AWK) 'BEGIN { files["."] = "" } { files[$$2] = files[$$2] " " $$1; \ + if (++n[$$2] == $(am__install_max)) \ + { print $$2, files[$$2]; n[$$2] = 0; files[$$2] = "" } } \ + END { for (dir in files) print dir, files[dir] }' +am__base_list = \ + sed '$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;s/\n/ /g' | \ + sed '$$!N;$$!N;$$!N;$$!N;s/\n/ /g' +am__uninstall_files_from_dir = { \ + test -z "$$files" \ + || { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \ + || { echo " ( cd '$$dir' && rm -f" $$files ")"; \ + $(am__cd) "$$dir" && rm -f $$files; }; \ + } +am__installdirs = "$(DESTDIR)$(docdir)" +DATA = $(doc_DATA) +am__tagged_files = $(HEADERS) $(SOURCES) $(TAGS_FILES) $(LISP) +am__DIST_COMMON = $(srcdir)/Makefile.in \ + $(top_srcdir)/../giella-core/am-shared/devtest-include.am \ + $(top_srcdir)/../giella-core/am-shared/docs-dir-include.am \ + $(top_srcdir)/../giella-core/am-shared/silent_build-include.am +DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) +ACLOCAL = @ACLOCAL@ +ALT_ORTHS = @ALT_ORTHS@ +ALT_WSS = @ALT_WSS@ +AMTAR = @AMTAR@ +AM_DEFAULT_VERBOSITY = @AM_DEFAULT_VERBOSITY@ +AREAS = @AREAS@ +AUTOCONF = @AUTOCONF@ +AUTOHEADER = @AUTOHEADER@ +AUTOMAKE = @AUTOMAKE@ +AWK = @AWK@ +BC = @BC@ +CG3_CFLAGS = @CG3_CFLAGS@ +CG3_LIBS = @CG3_LIBS@ +CGFLOOKUP = @CGFLOOKUP@ +CG_MWESPLIT = @CG_MWESPLIT@ +CG_RELABEL = @CG_RELABEL@ +CSCOPE = @CSCOPE@ +CTAGS = @CTAGS@ +CYGPATH_W = @CYGPATH_W@ +CYGWINJAVAPATH = @CYGWINJAVAPATH@ +DEFAULT_ANALYSERS = @DEFAULT_ANALYSERS@ +DEFAULT_CUSTOM_FSTS = @DEFAULT_CUSTOM_FSTS@ +DEFAULT_FOMA = @DEFAULT_FOMA@ +DEFAULT_GENERATORS = 
@DEFAULT_GENERATORS@ +DEFAULT_HFST = @DEFAULT_HFST@ +DEFAULT_HFST_BACKEND = @DEFAULT_HFST_BACKEND@ +DEFAULT_HYPERMIN = @DEFAULT_HYPERMIN@ +DEFAULT_ORTH = @DEFAULT_ORTH@ +DEFAULT_REVERCI = @DEFAULT_REVERCI@ +DEFAULT_SPELLER_MINIMISATION = @DEFAULT_SPELLER_MINIMISATION@ +DEFAULT_WS = @DEFAULT_WS@ +DEFAULT_XFST = @DEFAULT_XFST@ +DEFS = @DEFS@ +DIALECTS = @DIALECTS@ +DIFFTOOL = @DIFFTOOL@ +DIVVUN_ACCURACY = @DIVVUN_ACCURACY@ +DIVVUN_CHECKER = @DIVVUN_CHECKER@ +DIVVUN_VALIDATE_SUGGEST = @DIVVUN_VALIDATE_SUGGEST@ +ECHO_C = @ECHO_C@ +ECHO_N = @ECHO_N@ +ECHO_T = @ECHO_T@ +ETAGS = @ETAGS@ +FLOOKUP = @FLOOKUP@ +FOMA = @FOMA@ +FORREST = @FORREST@ +GAWK = @GAWK@ +GIELLA_CORE = @GIELLA_CORE@ +GIELLA_CORE_VERSION = @GIELLA_CORE_VERSION@ +GLANG = @GLANG@ +GLANG2 = @GLANG2@ +GLANGUAGE = @GLANGUAGE@ +GRAMCHECKVERSION = @GRAMCHECKVERSION@ +GTCORE = @GTCORE@ +GTCORESH = @GTCORESH@ +GTGRAMTOOL = @GTGRAMTOOL@ +GTLANG = @GTLANG@ +GTLANG2 = @GTLANG2@ +GTLANGUAGE = @GTLANGUAGE@ +GZIP = @GZIP@ +HFST_COMPOSE = @HFST_COMPOSE@ +HFST_COMPOSE_INTERSECT = @HFST_COMPOSE_INTERSECT@ +HFST_CONCATENATE = @HFST_CONCATENATE@ +HFST_CONJUNCT = @HFST_CONJUNCT@ +HFST_DETERMINIZE = @HFST_DETERMINIZE@ +HFST_DISJUNCT = @HFST_DISJUNCT@ +HFST_FOMA = @HFST_FOMA@ +HFST_FORMAT_NAME = @HFST_FORMAT_NAME@ +HFST_FST2FST = @HFST_FST2FST@ +HFST_FST2STRINGS = @HFST_FST2STRINGS@ +HFST_FST2TXT = @HFST_FST2TXT@ +HFST_INFO = @HFST_INFO@ +HFST_INTERSECT = @HFST_INTERSECT@ +HFST_INVERT = @HFST_INVERT@ +HFST_LEXC = @HFST_LEXC@ +HFST_LOOKUP = @HFST_LOOKUP@ +HFST_MINIMIZE = @HFST_MINIMIZE@ +HFST_MINIMIZE_SPELLER = @HFST_MINIMIZE_SPELLER@ +HFST_MINUS = @HFST_MINUS@ +HFST_MULTIPLY = @HFST_MULTIPLY@ +HFST_NAME = @HFST_NAME@ +HFST_OPTIMIZED_LOOKUP = @HFST_OPTIMIZED_LOOKUP@ +HFST_OSPELL = @HFST_OSPELL@ +HFST_PAIR_TEST = @HFST_PAIR_TEST@ +HFST_PMATCH2FST = @HFST_PMATCH2FST@ +HFST_PROC = @HFST_PROC@ +HFST_PROJECT = @HFST_PROJECT@ +HFST_PRUNE_ALPHABET = @HFST_PRUNE_ALPHABET@ +HFST_PUSH_WEIGHTS = @HFST_PUSH_WEIGHTS@ +HFST_REGEXP2FST = 
@HFST_REGEXP2FST@ +HFST_REMOVE_EPSILONS = @HFST_REMOVE_EPSILONS@ +HFST_REPEAT = @HFST_REPEAT@ +HFST_REVERSE = @HFST_REVERSE@ +HFST_REWEIGHT = @HFST_REWEIGHT@ +HFST_SPLIT = @HFST_SPLIT@ +HFST_STRINGS2FST = @HFST_STRINGS2FST@ +HFST_SUBSTITUTE = @HFST_SUBSTITUTE@ +HFST_SUBTRACT = @HFST_SUBTRACT@ +HFST_SUMMARIZE = @HFST_SUMMARIZE@ +HFST_TOKENISE = @HFST_TOKENISE@ +HFST_TWOLC = @HFST_TWOLC@ +HFST_TXT2FST = @HFST_TXT2FST@ +HFST_XFST = @HFST_XFST@ +INSTALL = @INSTALL@ +INSTALL_DATA = @INSTALL_DATA@ +INSTALL_PROGRAM = @INSTALL_PROGRAM@ +INSTALL_SCRIPT = @INSTALL_SCRIPT@ +INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ +JV = @JV@ +LEXC = @LEXC@ +LEXREF_IN_XFSCRIPT = @LEXREF_IN_XFSCRIPT@ +LIBOBJS = @LIBOBJS@ +LIBS = @LIBS@ +LOOKUP = @LOOKUP@ +LTLIBOBJS = @LTLIBOBJS@ +MAKEINFO = @MAKEINFO@ +MKDIR_P = @MKDIR_P@ +NO_PHONOLOGY = @NO_PHONOLOGY@ +NPM = @NPM@ +ONMT_BUILD_VOCAB = @ONMT_BUILD_VOCAB@ +ONMT_TRAIN = @ONMT_TRAIN@ +PACKAGE = @PACKAGE@ +PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ +PACKAGE_NAME = @PACKAGE_NAME@ +PACKAGE_STRING = @PACKAGE_STRING@ +PACKAGE_TARNAME = @PACKAGE_TARNAME@ +PACKAGE_URL = @PACKAGE_URL@ +PACKAGE_VERSION = @PACKAGE_VERSION@ +PATGEN = @PATGEN@ +PATH_SEPARATOR = @PATH_SEPARATOR@ +PERL = @PERL@ +PKG_CONFIG = @PKG_CONFIG@ +PKG_CONFIG_LIBDIR = @PKG_CONFIG_LIBDIR@ +PKG_CONFIG_PATH = @PKG_CONFIG_PATH@ +PRINTF = @PRINTF@ +PYTHON = @PYTHON@ +PYTHON_EXEC_PREFIX = @PYTHON_EXEC_PREFIX@ +PYTHON_PLATFORM = @PYTHON_PLATFORM@ +PYTHON_PREFIX = @PYTHON_PREFIX@ +PYTHON_VERSION = @PYTHON_VERSION@ +R = @R@ +RSYNC = @RSYNC@ +SAXON = @SAXON@ +SAXONJAR = @SAXONJAR@ +SED = @SED@ +SEE = @SEE@ +SET_MAKE = @SET_MAKE@ +SHELL = @SHELL@ +SPELLERVERSION = @SPELLERVERSION@ +SPELLER_DESC_ENG = @SPELLER_DESC_ENG@ +SPELLER_DESC_NATIVE = @SPELLER_DESC_NATIVE@ +SPELLER_NAME_ENG = @SPELLER_NAME_ENG@ +SPELLER_NAME_NATIVE = @SPELLER_NAME_NATIVE@ +STRIP = @STRIP@ +TAR = @TAR@ +TWOLC = @TWOLC@ +UCONV = @UCONV@ +VERSION = @VERSION@ +VISLCG3 = @VISLCG3@ +VISLCG3_COMP = @VISLCG3_COMP@ +VOIKKOGC = 
@VOIKKOGC@ +VOIKKOHYPHENATE = @VOIKKOHYPHENATE@ +VOIKKOSPELL = @VOIKKOSPELL@ +VOIKKOVFSTC = @VOIKKOVFSTC@ +WGET = @WGET@ +XFST = @XFST@ +XZ = @XZ@ +ZIP = @ZIP@ +abs_builddir = @abs_builddir@ +abs_srcdir = @abs_srcdir@ +abs_top_builddir = @abs_top_builddir@ +abs_top_srcdir = @abs_top_srcdir@ +am__leading_dot = @am__leading_dot@ +am__tar = @am__tar@ +am__untar = @am__untar@ +bindir = @bindir@ +build = @build@ +build_alias = @build_alias@ +build_cpu = @build_cpu@ +build_os = @build_os@ +build_vendor = @build_vendor@ +builddir = @builddir@ +datadir = @datadir@ +datarootdir = @datarootdir@ +docdir = @docdir@ +dvidir = @dvidir@ +exec_prefix = @exec_prefix@ +gt_SHARED_common = @gt_SHARED_common@ +host = @host@ +host_alias = @host_alias@ +host_cpu = @host_cpu@ +host_os = @host_os@ +host_vendor = @host_vendor@ +htmldir = @htmldir@ +ifGNUmake = @ifGNUmake@ +includedir = @includedir@ +infodir = @infodir@ +install_sh = @install_sh@ +libdir = @libdir@ +libexecdir = @libexecdir@ +localedir = @localedir@ +localstatedir = @localstatedir@ +mandir = @mandir@ +mkdir_p = @mkdir_p@ +oldincludedir = @oldincludedir@ +pdfdir = @pdfdir@ +pkgpyexecdir = @pkgpyexecdir@ +pkgpythondir = @pkgpythondir@ +prefix = @prefix@ +program_transform_name = @program_transform_name@ +psdir = @psdir@ +pyexecdir = @pyexecdir@ +pythondir = @pythondir@ +runstatedir = @runstatedir@ +sbindir = @sbindir@ +sharedstatedir = @sharedstatedir@ +srcdir = @srcdir@ +sysconfdir = @sysconfdir@ +target_alias = @target_alias@ +top_build_prefix = @top_build_prefix@ +top_builddir = @top_builddir@ +top_srcdir = @top_srcdir@ + +# Variables: +ALLINONE_MD_PAGE = $(srcdir)/$(GTLANG).md +LINKS = $(srcdir)/Links.md +HEADER = $(srcdir)/index-header.md +INDEX = $(srcdir)/index.md +REPONAME = $(shell grep '__REPO__' $(top_srcdir)/.gut/delta.toml | cut -d'"' -f2) +# " reset syntax colouring - gets confused by the single double quote in the previous line + +# no regenerations while debugging +doc_DATA = $(INDEX) $(LINKS) 
$(ALLINONE_MD_PAGE) lemmacount.json maturity.json +DOCC2MDWIKI = $(GTCORE)/scripts/doccomments2ghpages.awk +DOCC2MDWIKI_CG3 = $(GTCORE)/scripts/doccomments2ghpages-vislcg.awk +GRAPHPLOTTER = $(GTCORE)/scripts/plot-speller-progress.R + +# Define all doccomment source files in a variable: +DOCSRC_XEROX := $(shell fgrep -rl \ + --include '*.lexc' \ + --include '*.twolc' \ + --include '*.pmscript' \ + --include '*.xfscript'\ + --exclude 'Makefile*' \ + --exclude 'lexicon.tmp.lexc' \ + --exclude-dir 'generated_files' \ + --exclude-dir 'orig' \ + --exclude-dir 'incoming' \ + '!! ' $(top_srcdir)/src/* $(top_srcdir)/tools/* |\ + fgrep -v incoming/ ) + +DOCSRC_CG3 := $(shell fgrep -rl \ + --include '*.cg3' \ + --exclude 'Makefile*' \ + --exclude 'lexicon.tmp.lexc' \ + --exclude-dir 'generated_files' \ + --exclude-dir 'orig' \ + --exclude-dir 'incoming' \ + '!! ' $(top_srcdir)/src/* $(top_srcdir)/tools/* |\ + fgrep -v incoming/ ) + +DOCSRC = $(sort $(DOCSRC_XEROX) $(DOCSRC_CG3) ) + +# Remove vpath prefix for nested list construction: +BARE_DOCSRC := $(subst $(top_srcdir)/,,$(DOCSRC)) + +# Create actual list of MD files: +MDFILES = $(call src2md,$(DOCSRC)) +# Append vpath prefix - for now, the files are stored in the source tree: +VPATH_MDFILES = $(addprefix $(top_srcdir)/docs/,$(MDFILES)) + +# Construct source file and md file pairs of the following format: +# src/fst/root.lexc@src-fst-root.lexc.md + +# The variable DOCSRC_MDFILES contains a list of all source files and the +# corresponding md file in the following format: +# +# src/fst/root.lexc@src-fst-root.lexc.md +# +# From this we want to construct a nested Markdown bullet list as follows +# - src +# - fst +# - [root.lexc](src-fst-root.lexc.md) +# +# The resulting list should be the content in the $(LINKS) file. 
+DOCSRC_MDFILES = $(shell echo $(BARE_DOCSRC) | tr ' ' '\n' > docsrc.tmp; \ + echo $(MDFILES) | tr ' ' '\n' > mdfiles.tmp; \ + paste -d '@' docsrc.tmp mdfiles.tmp; ) + + +# src file links: target should be: +# https://github.com/giellalt/lang-sma/blob/main/src/fst/root.lexc +# host: --------^ +# owner/org: ------------^^^ +# repo: ----------------------^^^ +# branch: ----------------------------------^^^ +# filepath: ---------------------------------------^^^^^^^^^^^^^^^^ +HOST = https://github.com +ORG = giellalt +SH_REPO_NAME = $(shell cd $(top_srcdir); \ + if test -d ./.git; then \ + git config --get remote.origin.url | cut -d'/' -f2 | cut -d'.' -f1; \ + elif test -d ./.svn; then \ + svn info . | grep 'Repository Root' | rev | cut -d'/' -f1 | rev | cut -d'.' -f1; \ + else \ + pwd | rev | cut -d'/' -f1 | rev; \ + fi ) + +# When running via GitHub Actions, get org/owner + reponame from the environment: +GH_REPO = $(GITHUB_REPOSITORY) +# The branch name is presently hard-coded, could be taken from the commit: +BRANCH = main +# Use GitHub info if available, fall back to shell otherwise: +REPOURL = $(shell if test "x$(GH_REPO)" != x ; then \ + echo "$(HOST)/$(GH_REPO)/blob/$(BRANCH)"; \ + else \ + echo "$(HOST)/$(ORG)/$(SH_REPO_NAME)/blob/$(BRANCH)" ; \ + fi) + + +# hfst-compose: +AM_V_COMPOSE = $(AM_V_COMPOSE_@AM_V@) +AM_V_COMPOSE_ = $(AM_V_COMPOSE_@AM_DEFAULT_V@) +AM_V_COMPOSE_0 = @echo " HCOMPOSE $@"; + +# hfst-concatenate: +AM_V_HCONCAT = $(AM_V_HCONCAT_@AM_V@) +AM_V_HCONCAT_ = $(AM_V_HCONCAT_@AM_DEFAULT_V@) +AM_V_HCONCAT_0 = @echo " HCONCAT $@"; + +# hfst-conjunct / +# hfst-intersect: +AM_V_CONJCT = $(AM_V_CONJCT_@AM_V@) +AM_V_CONJCT_ = $(AM_V_CONJCT_@AM_DEFAULT_V@) +AM_V_CONJCT_0 = @echo " HCONJCT $@"; + +# hfst-fst2fst: +AM_V_FST2FST = $(AM_V_FST2FST_@AM_V@) +AM_V_FST2FST_ = $(AM_V_FST2FST_@AM_DEFAULT_V@) +AM_V_FST2FST_0 = @echo " HFST2FST $@"; + +# hfst-minimize +AM_V_HMINIM = $(AM_V_HMINIM_@AM_V@) +AM_V_HMINIM_ = $(AM_V_HMINIM_@AM_DEFAULT_V@) +AM_V_HMINIM_0 
= @echo " HMINIM $@"; + +# hfst-fst2txt: +AM_V_FST2TXT = $(AM_V_FST2TXT_@AM_V@) +AM_V_FST2TXT_ = $(AM_V_FST2TXT_@AM_DEFAULT_V@) +AM_V_FST2TXT_0 = @echo " HFST2TXT $@"; + +# hfst-foma: +AM_V_HFOMA = $(AM_V_HFOMA_@AM_V@) +AM_V_HFOMA_ = $(AM_V_HFOMA_@AM_DEFAULT_V@) +AM_V_HFOMA_0 = @echo " HFOMA $@"; + +# hfst-optimized-lookup: +AM_V_HFSTOL = $(AM_V_HFSTOL_@AM_V@) +AM_V_HFSTOL_ = $(AM_V_HFSTOL_@AM_DEFAULT_V@) +AM_V_HFSTOL_0 = @echo " HFSTOL $@"; + +# hfst-lexc: +AM_V_HLEXC = $(AM_V_HLEXC_@AM_V@) +AM_V_HLEXC_ = $(AM_V_HLEXC_@AM_DEFAULT_V@) +AM_V_HLEXC_0 = @echo " HLEXC $@"; + +# hfst-split: +AM_V_HSPLIT = $(AM_V_HSPLIT_@AM_V@) +AM_V_HSPLIT_ = $(AM_V_HSPLIT_@AM_DEFAULT_V@) +AM_V_HSPLIT_0 = @echo " HSPLIT $@"; + +# hfst-substitute: +AM_V_HSUBST = $(AM_V_HSUBST_@AM_V@) +AM_V_HSUBST_ = $(AM_V_HSUBST_@AM_DEFAULT_V@) +AM_V_HSUBST_0 = @echo " HSUBST $@"; + +# hfst-twolc: +AM_V_HTWOLC = $(AM_V_HTWOLC_@AM_V@) +AM_V_HTWOLC_ = $(AM_V_HTWOLC_@AM_DEFAULT_V@) +AM_V_HTWOLC_0 = @echo " HTWOLC $@"; + +# hfst-xfst: +AM_V_HXFST = $(AM_V_HXFST_@AM_V@) +AM_V_HXFST_ = $(AM_V_HXFST_@AM_DEFAULT_V@) +AM_V_HXFST_0 = @echo " HXFST $@"; + +# hfst-compose-intersect: +AM_V_INTRSCT = $(AM_V_INTRSCT_@AM_V@) +AM_V_INTRSCT_ = $(AM_V_INTRSCT_@AM_DEFAULT_V@) +AM_V_INTRSCT_0 = @echo " HINTRSCT $@"; + +# hfst-invert: +AM_V_INVERT = $(AM_V_INVERT_@AM_V@) +AM_V_INVERT_ = $(AM_V_INVERT_@AM_DEFAULT_V@) +AM_V_INVERT_0 = @echo " HINVERT $@"; + +# hfst-pmatch2fst +AM_V_PM2FST = $(AM_V_PM2FST_@AM_V@) +AM_V_PM2FST_ = $(AM_V_PM2FST_@AM_DEFAULT_V@) +AM_V_PM2FST_0 = @echo " HPM2FST $@"; + +# hfst-project: +AM_V_PROJECT = $(AM_V_PROJECT_@AM_V@) +AM_V_PROJECT_ = $(AM_V_PROJECT_@AM_DEFAULT_V@) +AM_V_PROJECT_0 = @echo " HPROJECT $@"; + +# hfst-prune-alphabet +AM_V_HPRUNE = $(AM_V_HPRUNE_@AM_V@) +AM_V_HPRUNE_ = $(AM_V_HPRUNE_@AM_DEFAULT_V@) +AM_V_HPRUNE_0 = @echo " HPRUNE $@"; + +# hfst-reverse +AM_V_REVERSE = $(AM_V_REVERSE_@AM_V@) +AM_V_REVERSE_ = $(AM_V_REVERSE_@AM_DEFAULT_V@) +AM_V_REVERSE_0 = @echo " HREVERSE $@"; + +# 
hfst-reweight: +AM_V_REWEIGHT = $(AM_V_REWEIGHT_@AM_V@) +AM_V_REWEIGHT_ = $(AM_V_REWEIGHT_@AM_DEFAULT_V@) +AM_V_REWEIGHT_0 = @echo " HREWGHT $@"; + +# hfst-regexp2fst: +AM_V_RGX2FST = $(AM_V_RGX2FST_@AM_V@) +AM_V_RGX2FST_ = $(AM_V_RGX2FST_@AM_DEFAULT_V@) +AM_V_RGX2FST_0 = @echo " HRGX2FST $@"; + +# hfst-repeat +AM_V_REPEAT = $(AM_V_REPEAT_@AM_V@) +AM_V_REPEAT_ = $(AM_V_REPEAT_@AM_DEFAULT_V@) +AM_V_REPEAT_0 = @echo " HREPEAT $@"; + +# hfst-strings2fst: +AM_V_STR2FST = $(AM_V_STR2FST_@AM_V@) +AM_V_STR2FST_ = $(AM_V_STR2FST_@AM_DEFAULT_V@) +AM_V_STR2FST_0 = @echo " HSTR2FST $@"; + +# hfst-txt2fst: +AM_V_TXT2FST = $(AM_V_TXT2FST_@AM_V@) +AM_V_TXT2FST_ = $(AM_V_TXT2FST_@AM_DEFAULT_V@) +AM_V_TXT2FST_0 = @echo " HTXT2FST $@"; + +# hfst-union / hfst-disjunct: +AM_V_UNION = $(AM_V_UNION_@AM_V@) +AM_V_UNION_ = $(AM_V_UNION_@AM_DEFAULT_V@) +AM_V_UNION_0 = @echo " HUNION $@"; + +#### LexD (Apertium) +AM_V_LEXD = $(AM_V_LEXD_@AM_V@) +AM_V_LEXD_ = $(AM_V_LEXD_@AM_DEFAULT_V@) +AM_V_LEXD_0 = @echo " LEXD $@"; + +#### Foma +AM_V_FOMA = $(AM_V_FOMA_@AM_V@) +AM_V_FOMA_ = $(AM_V_FOMA_@AM_DEFAULT_V@) +AM_V_FOMA_0 = @echo " FOMA $@"; + +#### Xerox tools +AM_V_TWOLC = $(AM_V_TWOLC_@AM_V@) +AM_V_TWOLC_ = $(AM_V_TWOLC_@AM_DEFAULT_V@) +AM_V_TWOLC_0 = @echo " TWOLC $@"; +AM_V_LEXC = $(AM_V_LEXC_@AM_V@) +AM_V_LEXC_ = $(AM_V_LEXC_@AM_DEFAULT_V@) +AM_V_LEXC_0 = @echo " LEXC $@"; +AM_V_XFST = $(AM_V_XFST_@AM_V@) +AM_V_XFST_ = $(AM_V_XFST_@AM_DEFAULT_V@) +AM_V_XFST_0 = @echo " XFST $@"; + +#### VislCG3 +AM_V_CGCOMP = $(AM_V_CGCOMP_@AM_V@) +AM_V_CGCOMP_ = $(AM_V_CGCOMP_@AM_DEFAULT_V@) +AM_V_CGCOMP_0 = @echo " CG3COMP $@"; + +#### Other tools +AM_V_CP = $(AM_V_CP_@AM_V@) +AM_V_CP_ = $(AM_V_CP_@AM_DEFAULT_V@) +AM_V_CP_0 = @echo " CP $@"; +AM_V_MV = $(AM_V_MV_@AM_V@) +AM_V_MV_ = $(AM_V_MV_@AM_DEFAULT_V@) +AM_V_MV_0 = @echo " MV $@"; +AM_V_GZIP = $(AM_V_GZIP_@AM_V@) +AM_V_GZIP_ = $(AM_V_GZIP_@AM_DEFAULT_V@) +AM_V_GZIP_0 = @echo " GZIP $@"; +AM_V_ZIP = $(AM_V_ZIP_@AM_V@) +AM_V_ZIP_ = 
$(AM_V_ZIP_@AM_DEFAULT_V@) +AM_V_ZIP_0 = @echo " ZIP $@"; +AM_V_SAXON = $(AM_V_SAXON_@AM_V@) +AM_V_SAXON_ = $(AM_V_SAXON_@AM_DEFAULT_V@) +AM_V_SAXON_0 = @echo " SAXON $@"; +AM_V_XSLPROC = $(AM_V_XSLPROC_@AM_V@) +AM_V_XSLPROC_ = $(AM_V_XSLPROC_@AM_DEFAULT_V@) +AM_V_XSLPROC_0 = @echo " XSLPROC $@"; +AM_V_AWK = $(AM_V_AWK_@AM_V@) +AM_V_AWK_ = $(AM_V_AWK_@AM_DEFAULT_V@) +AM_V_AWK_0 = @echo " AWK $@"; +AM_V_SED = $(AM_V_SED_@AM_V@) +AM_V_SED_ = $(AM_V_SED_@AM_DEFAULT_V@) +AM_V_SED_0 = @echo " SED $@"; +AM_V_FORREST = $(AM_V_FORREST_@AM_V@) +AM_V_FORREST_ = $(AM_V_FORREST_@AM_DEFAULT_V@) +AM_V_FORREST_0 = @echo " FORREST $@"; + +# Let the verbosity of some command line tools follow the automake verbosity. +# VERBOSITY = be quiet if V=0, unspecified otherwise +# MORE_VERBOSITY = be quiet if V=0, be verbose otherwise +VERBOSITY = $(if $(strip $(filter-out false,$(AM_V_P))), ,-q) +MORE_VERBOSITY = $(if $(strip $(filter-out false,$(AM_V_P))),-v,-q) +all: all-am + +.SUFFIXES: +$(srcdir)/Makefile.in: $(srcdir)/Makefile.am $(top_srcdir)/../giella-core/am-shared/docs-dir-include.am $(top_srcdir)/../giella-core/am-shared/silent_build-include.am $(top_srcdir)/../giella-core/am-shared/devtest-include.am $(am__configure_deps) + @for dep in $?; do \ + case '$(am__configure_deps)' in \ + *$$dep*) \ + ( cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh ) \ + && { if test -f $@; then exit 0; else break; fi; }; \ + exit 1;; \ + esac; \ + done; \ + echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign docs/Makefile'; \ + $(am__cd) $(top_srcdir) && \ + $(AUTOMAKE) --foreign docs/Makefile +Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status + @case '$?' 
in \ + *config.status*) \ + cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ + *) \ + echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__maybe_remake_depfiles)'; \ + cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__maybe_remake_depfiles);; \ + esac; +$(top_srcdir)/../giella-core/am-shared/docs-dir-include.am $(top_srcdir)/../giella-core/am-shared/silent_build-include.am $(top_srcdir)/../giella-core/am-shared/devtest-include.am $(am__empty): + +$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) + cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh + +$(top_srcdir)/configure: $(am__configure_deps) + cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh +$(ACLOCAL_M4): $(am__aclocal_m4_deps) + cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh +$(am__aclocal_m4_deps): +install-docDATA: $(doc_DATA) + @$(NORMAL_INSTALL) + @list='$(doc_DATA)'; test -n "$(docdir)" || list=; \ + if test -n "$$list"; then \ + echo " $(MKDIR_P) '$(DESTDIR)$(docdir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(docdir)" || exit 1; \ + fi; \ + for p in $$list; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + echo "$$d$$p"; \ + done | $(am__base_list) | \ + while read files; do \ + echo " $(INSTALL_DATA) $$files '$(DESTDIR)$(docdir)'"; \ + $(INSTALL_DATA) $$files "$(DESTDIR)$(docdir)" || exit $$?; \ + done + +uninstall-docDATA: + @$(NORMAL_UNINSTALL) + @list='$(doc_DATA)'; test -n "$(docdir)" || list=; \ + files=`for p in $$list; do echo $$p; done | sed -e 's|^.*/||'`; \ + dir='$(DESTDIR)$(docdir)'; $(am__uninstall_files_from_dir) +tags TAGS: + +ctags CTAGS: + +cscope cscopelist: + +distdir: $(BUILT_SOURCES) + $(MAKE) $(AM_MAKEFLAGS) distdir-am + +distdir-am: $(DISTFILES) + @srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \ + topsrcdirstrip=`echo "$(top_srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \ + list='$(DISTFILES)'; \ + dist_files=`for file in $$list; do echo $$file; done | \ + 
sed -e "s|^$$srcdirstrip/||;t" \ + -e "s|^$$topsrcdirstrip/|$(top_builddir)/|;t"`; \ + case $$dist_files in \ + */*) $(MKDIR_P) `echo "$$dist_files" | \ + sed '/\//!d;s|^|$(distdir)/|;s,/[^/]*$$,,' | \ + sort -u` ;; \ + esac; \ + for file in $$dist_files; do \ + if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ + if test -d $$d/$$file; then \ + dir=`echo "/$$file" | sed -e 's,/[^/]*$$,,'`; \ + if test -d "$(distdir)/$$file"; then \ + find "$(distdir)/$$file" -type d ! -perm -700 -exec chmod u+rwx {} \;; \ + fi; \ + if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ + cp -fpR $(srcdir)/$$file "$(distdir)$$dir" || exit 1; \ + find "$(distdir)/$$file" -type d ! -perm -700 -exec chmod u+rwx {} \;; \ + fi; \ + cp -fpR $$d/$$file "$(distdir)$$dir" || exit 1; \ + else \ + test -f "$(distdir)/$$file" \ + || cp -p $$d/$$file "$(distdir)/$$file" \ + || exit 1; \ + fi; \ + done +check-am: all-am +check: check-am +all-am: Makefile $(DATA) +installdirs: + for dir in "$(DESTDIR)$(docdir)"; do \ + test -z "$$dir" || $(MKDIR_P) "$$dir"; \ + done +install: install-am +install-exec: install-exec-am +install-data: install-data-am +uninstall: uninstall-am + +install-am: all-am + @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am + +installcheck: installcheck-am +install-strip: + if test -z '$(STRIP)'; then \ + $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ + install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ + install; \ + else \ + $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ + install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ + "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'" install; \ + fi +mostlyclean-generic: + +clean-generic: + +distclean-generic: + -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) + -test . 
= "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES) + +maintainer-clean-generic: + @echo "This command is intended for maintainers to use" + @echo "it deletes files that may require special tools to rebuild." +clean: clean-am + +clean-am: clean-generic clean-local mostlyclean-am + +distclean: distclean-am + -rm -f Makefile +distclean-am: clean-am distclean-generic + +dvi: dvi-am + +dvi-am: + +html: html-am + +html-am: + +info: info-am + +info-am: + +install-data-am: install-docDATA + +install-dvi: install-dvi-am + +install-dvi-am: + +install-exec-am: + +install-html: install-html-am + +install-html-am: + +install-info: install-info-am + +install-info-am: + +install-man: + +install-pdf: install-pdf-am + +install-pdf-am: + +install-ps: install-ps-am + +install-ps-am: + +installcheck-am: + +maintainer-clean: maintainer-clean-am + -rm -f Makefile +maintainer-clean-am: distclean-am maintainer-clean-generic + +mostlyclean: mostlyclean-am + +mostlyclean-am: mostlyclean-generic + +pdf: pdf-am + +pdf-am: + +ps: ps-am + +ps-am: + +uninstall-am: uninstall-docDATA + +.MAKE: install-am install-strip + +.PHONY: all all-am check check-am clean clean-generic clean-local \ + cscopelist-am ctags-am distclean distclean-generic distdir dvi \ + dvi-am html html-am info info-am install install-am \ + install-data install-data-am install-docDATA install-dvi \ + install-dvi-am install-exec install-exec-am install-html \ + install-html-am install-info install-info-am install-man \ + install-pdf install-pdf-am install-ps install-ps-am \ + install-strip installcheck installcheck-am installdirs \ + maintainer-clean maintainer-clean-generic mostlyclean \ + mostlyclean-generic pdf pdf-am ps ps-am tags-am uninstall \ + uninstall-am uninstall-docDATA + +.PRECIOUS: Makefile + + +.PHONY: generate-markdown regenerate-markdown + +regenerate-markdown: generate-markdown + +@WANT_SPELLERS_TRUE@speller-report.svg: speller-report.tsv $(GRAPHPLOTTER) 
+@WANT_SPELLERS_TRUE@ $(AM_V_GEN)"$(R)" --no-save < $(GRAPHPLOTTER) + +@WANT_SPELLERS_TRUE@speller-report.tsv: spell-tests.tsv $(top_builddir)/tools/spellcheckers/$(GTLANG2).zhfst +@WANT_SPELLERS_TRUE@ $(AM_V_GEN)$(DIVVUN_ACCURACY) $< $(top_builddir)/tools/spellcheckers/$(GTLANG2).zhfst -t $@ + +@WANT_SPELLERS_TRUE@report.json: spell-tests.tsv $(top_builddir)/tools/spellcheckers/$(GTLANG2).zhfst +@WANT_SPELLERS_TRUE@ $(AM_V_GEN)$(DIVVUN_ACCURACY) $< $(top_builddir)/tools/spellcheckers/$(GTLANG2).zhfst -o $@ + +@WANT_SPELLERS_TRUE@spell-tests.tsv: +@WANT_SPELLERS_TRUE@ $(AM_V_GEN)cut -f 1,2 `find $(top_srcdir) -name typos.txt` |\ +@WANT_SPELLERS_TRUE@ egrep -v '^#' > $@ + +@WANT_SPELLERS_FALSE@speller-report.svg: +@WANT_SPELLERS_FALSE@ @echo need to configure --enable-spellers to generate statistics +@WANT_SPELLERS_FALSE@ touch $@ + +@WANT_SPELLERS_FALSE@report.json: +@WANT_SPELLERS_FALSE@ @echo need to configure --enable-spellers to generate statistics +@WANT_SPELLERS_FALSE@ touch $@ + +# Generate endpoint json file for shield.io lemma count badge. +# Only to be stored in the gh-pages branch, ignored in main. +$(srcdir)/lemmacount.json: + $(AM_V_GEN)$(GTCORE)/scripts/make-lemmacount.json.sh $(abs_top_srcdir) > $@ + +# Generate a maturity.json file as endpoint for the maturity badge. 
+$(srcdir)/maturity.json: + $(AM_V_GEN)$(GTCORE)/scripts/make-maturity.json.sh $(REPONAME) > $@ + +# Convert source filenames to extracted documentation filenames, VPATH safe: +# ../../../src/fst/morphology/stems/adverbs.lexc => src-fst-morphology-stems-adverbs.lexc.md +define src2md +$(addsuffix .md,$(subst /,-,$(subst $(top_srcdir)/,,$(1)))) +endef + +# Extract Markdown doccomments: +define make_md_files +$$(top_srcdir)/docs/$$(call src2md,$(1)): $(1) + $$(AM_V_AWK)"$(GAWK)" -v REPOURL=$(REPOURL) -v GLANG=$(GLANG) -f $(DOCC2MDWIKI) $$< |\ + $(SED) -e 's/@/REALLY_AT/g' |\ + tr '\n' '@' |\ + $(SED) -e 's/@@@*/@@/g' |\ + tr '@' '\n' |\ + $(SED) -e 's/REALLY_AT/@/g' > $$@ +endef +define make_md_files_cg3 +$$(top_srcdir)/docs/$$(call src2md,$(1)): $(1) + $$(AM_V_AWK)"$(GAWK)" -v REPOURL=$(REPOURL) -v GLANG=$(GLANG) -f $(DOCC2MDWIKI_CG3) $$< |\ + $(SED) -e 's/@/REALLY_AT/g' |\ + tr '\n' '@' |\ + $(SED) -e 's/@@@*/@@/g' |\ + tr '@' '\n' |\ + $(SED) -e 's/REALLY_AT/@/g' > $$@ +endef + +# Build each MD file: +$(foreach f,$(DOCSRC_XEROX),$(eval $(call make_md_files,$(f)))) +$(foreach f,$(DOCSRC_CG3),$(eval $(call make_md_files_cg3,$(f)))) + +# Collect all target files into one big MD file: +# Remove the VPATH prefix to create the header for each file/chapter: +$(ALLINONE_MD_PAGE): $(VPATH_MDFILES) + $(AM_V_GEN)printf "# $(GLANGUAGE) language model documentation\n\nAll doc-comment documentation in one large file.\n" > $@ + for f in $(VPATH_MDFILES); do \ + header=$${f#"$(top_srcdir)/docs/"};\ + printf "\n---\n\n# $$header \n\n" >> $@ ;\ + cat $$f >> $@ ;\ + done + +$(LINKS): + $(AM_V_GEN)for doc2md in $(DOCSRC_MDFILES) ; do \ + doc=`echo "$$doc2md" | cut -d '@' -f 1` ;\ + md=`echo "$$doc2md" | cut -d '@' -f 2` ;\ + d=`dirname "$$doc"` ;\ + docname=`basename "$$doc" .md` ;\ + b=`basename "$$md" .md` ;\ + html=$$b.html ;\ + d1=`echo "$$d" | cut -d '/' -f 1` ;\ + d2=`echo "$$d" | cut -d '/' -f 2` ;\ + d3=`echo "$$d" | cut -d '/' -f 3` ;\ + d4=`echo "$$d" | cut -d '/' -f 4` ;\ 
+ d5=`echo "$$d" | cut -d '/' -f 5` ;\ + if test "x$$d1" != "x$$oldd1" ; then \ + echo "* \`$$d1/\`" ;\ + oldd1=$$d1 ;\ + oldd2="";\ + oldd3="";\ + oldd4="";\ + fi ; \ + if test "x$$d2" = x ; then \ + echo " * [$$docname]($$html) ([src]($(REPOURL)/$$doc))" ;\ + elif test "x$$d2" != "x$$oldd2" ; then \ + echo " * \`$$d2/\`" ;\ + oldd2=$$d2 ;\ + oldd3="";\ + oldd4="";\ + oldd5="";\ + fi ; \ + if test "x$$d3" = x -a "x$$d2" != x; then \ + echo " * [$$docname]($$html) ([src]($(REPOURL)/$$doc))" ;\ + elif test "x$$d3" != "x$$oldd3" ; then \ + echo " * \`$$d3/\`" ;\ + oldd3=$$d3 ;\ + oldd4="";\ + fi ; \ + if test "x$$d4" = x -a "x$$d3" != x ; then \ + echo " * [$$docname]($$html) ([src]($(REPOURL)/$$doc))" ;\ + elif test "x$$d4" != "x$$oldd4" ; then \ + echo " * \`$$d4/\`" ;\ + oldd4=$$d4 ;\ + oldd5="";\ + fi ; \ + if test "x$$d5" = x -a "x$$d4" != x ; then \ + echo " * [$$docname]($$html) ([src]($(REPOURL)/$$doc))" ;\ + elif test "x$$d5" != "x$$oldd5" ; then \ + echo " * \`$$d5/\`" ;\ + oldd5=$$d5 ;\ + fi ; \ + done > $@ + +empty.md: + $(AM_V_GEN)echo > $@ + +# FIXME: some temporary stuff to have index page +$(INDEX): $(HEADER) empty.md $(LINKS) + $(AM_V_GEN)cat $^ > $@ + +clean-local: + $(AM_V_at)-rm -rf $(builddir)/build + $(AM_V_at)-rm -rf $(srcdir)/build + $(AM_V_at)-rm -f $(doc_DATA) + $(AM_V_at)-rm -f *-src.md src-*.md $(srcdir)/*-src.md $(srcdir)/src-*.md + $(AM_V_at)-rm -f *-tools.md tools-*.md $(srcdir)/*-tools.md $(srcdir)/tools-*.md + $(AM_V_at)-rm -f generated-markdowns.* + $(AM_V_at)-rm -f docsrc.tmp mdfiles.tmp empty.md + +# vim: set ft=automake: + +.PHONY: devtest devtest-recursive devtest-local + +devtest: devtest-recursive + +devtest-recursive: + -for subdir in $(SUBDIRS); do \ + if test "$$subdir" = . 
; then \ + continue; \ + else \ + ($(am__cd) $$subdir && $(MAKE) $(AM_MAKEFLAGS) $@) \ + fi; \ + done; \ + $(MAKE) $(AM_FLAGS) devtest-local + +devtest-local: + -for t in $(TESTS) ; do \ + echo "TEST: $$t" ;\ + if test -f "./$$t" ; then \ + srcdir=$(srcdir) GIELLA_CORE=$(GIELLA_CORE) "./$$t" ;\ + else \ + srcdir=$(srcdir) GIELLA_CORE=$(GIELLA_CORE) "$(srcdir)/$$t" ;\ + fi ;\ + done + +# Tell versions [3.59,3.63) of GNU make to not export all variables. +# Otherwise a system limit (for SysV at least) may be exceeded. +.NOEXPORT: diff --git a/_config.yml b/_config.yml new file mode 100644 index 0000000..ddd13de --- /dev/null +++ b/_config.yml @@ -0,0 +1,5 @@ +theme: jekyll-theme-minimal +title: Tokelauan NLP Grammar +description: Finite state and Constraint Grammar based analysers, proofing tools and other resources +plugins: + - jemoji diff --git a/_includes/toc.html b/_includes/toc.html new file mode 100644 index 0000000..85f3f62 --- /dev/null +++ b/_includes/toc.html @@ -0,0 +1,182 @@ +{% capture tocWorkspace %} + {% comment %} + Copyright (c) 2017 Vladimir "allejo" Jimenez + + Permission is hereby granted, free of charge, to any person + obtaining a copy of this software and associated documentation + files (the "Software"), to deal in the Software without + restriction, including without limitation the rights to use, + copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the + Software is furnished to do so, subject to the following + conditions: + + The above copyright notice and this permission notice shall be + included in all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES + OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT + HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, + WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + OTHER DEALINGS IN THE SOFTWARE. + {% endcomment %} + {% comment %} + Version 1.1.0 + https://github.com/allejo/jekyll-toc + + "...like all things liquid - where there's a will, and ~36 hours to spare, there's usually a/some way" ~jaybe + + Usage: + {% include toc.html html=content sanitize=true class="inline_toc" id="my_toc" h_min=2 h_max=3 %} + + Parameters: + * html (string) - the HTML of compiled markdown generated by kramdown in Jekyll + + Optional Parameters: + * sanitize (bool) : false - when set to true, the headers will be stripped of any HTML in the TOC + * class (string) : '' - a CSS class assigned to the TOC + * id (string) : '' - an ID to assigned to the TOC + * h_min (int) : 1 - the minimum TOC header level to use; any header lower than this value will be ignored + * h_max (int) : 6 - the maximum TOC header level to use; any header greater than this value will be ignored + * ordered (bool) : false - when set to true, an ordered list will be outputted instead of an unordered list + * item_class (string) : '' - add custom class(es) for each list item; has support for '%level%' placeholder, which is the current heading level + * submenu_class (string) : '' - add custom class(es) for each child group of headings; has support for '%level%' placeholder which is the current "submenu" heading level + * base_url (string) : '' - add a base url to the TOC links for when your TOC is on another page than the actual content + * anchor_class (string) : '' - add custom class(es) for each anchor element + * skip_no_ids (bool) : false - skip headers that do not have an `id` attribute + + Output: + An ordered or unordered list representing the table of contents of a markdown block. 
This snippet will only + generate the table of contents and will NOT output the markdown given to it + {% endcomment %} + + {% capture newline %} + {% endcapture %} + {% assign newline = newline | rstrip %} + + {% capture deprecation_warnings %}{% endcapture %} + + {% if include.baseurl %} + {% capture deprecation_warnings %}{{ deprecation_warnings }}{{ newline }}{% endcapture %} + {% endif %} + + {% if include.skipNoIDs %} + {% capture deprecation_warnings %}{{ deprecation_warnings }}{{ newline }}{% endcapture %} + {% endif %} + + {% capture jekyll_toc %}{% endcapture %} + {% assign orderedList = include.ordered | default: false %} + {% assign baseURL = include.base_url | default: include.baseurl | default: '' %} + {% assign skipNoIDs = include.skip_no_ids | default: include.skipNoIDs | default: false %} + {% assign minHeader = include.h_min | default: 1 %} + {% assign maxHeader = include.h_max | default: 6 %} + {% assign nodes = include.html | strip | split: ' maxHeader %} + {% continue %} + {% endif %} + + {% assign _workspace = node | split: '' | first }}>{% endcapture %} + {% assign header = _workspace[0] | replace: _hAttrToStrip, '' %} + + {% if include.item_class and include.item_class != blank %} + {% capture listItemClass %} class="{{ include.item_class | replace: '%level%', currLevel | split: '.' | join: ' ' }}"{% endcapture %} + {% endif %} + + {% if include.submenu_class and include.submenu_class != blank %} + {% assign subMenuLevel = currLevel | minus: 1 %} + {% capture subMenuClass %} class="{{ include.submenu_class | replace: '%level%', subMenuLevel | split: '.' 
| join: ' ' }}"{% endcapture %} + {% endif %} + + {% capture anchorBody %}{% if include.sanitize %}{{ header | strip_html }}{% else %}{{ header }}{% endif %}{% endcapture %} + + {% if htmlID %} + {% capture anchorAttributes %} href="{% if baseURL %}{{ baseURL }}{% endif %}#{{ htmlID }}"{% endcapture %} + + {% if include.anchor_class %} + {% capture anchorAttributes %}{{ anchorAttributes }} class="{{ include.anchor_class | split: '.' | join: ' ' }}"{% endcapture %} + {% endif %} + + {% capture listItem %}{{ anchorBody }}{% endcapture %} + {% elsif skipNoIDs == true %} + {% continue %} + {% else %} + {% capture listItem %}{{ anchorBody }}{% endcapture %} + {% endif %} + + {% if currLevel > lastLevel %} + {% capture jekyll_toc %}{{ jekyll_toc }}<{{ listModifier }}{{ subMenuClass }}>{% endcapture %} + {% elsif currLevel < lastLevel %} + {% assign repeatCount = lastLevel | minus: currLevel %} + + {% for i in (1..repeatCount) %} + {% capture jekyll_toc %}{{ jekyll_toc }}{% endcapture %} + {% endfor %} + + {% capture jekyll_toc %}{{ jekyll_toc }}{% endcapture %} + {% else %} + {% capture jekyll_toc %}{{ jekyll_toc }}{% endcapture %} + {% endif %} + + {% capture jekyll_toc %}{{ jekyll_toc }}{{ listItem }}{% endcapture %} + + {% assign lastLevel = currLevel %} + {% assign firstHeader = false %} + {% endfor %} + + {% assign repeatCount = minHeader | minus: 1 %} + {% assign repeatCount = lastLevel | minus: repeatCount %} + {% for i in (1..repeatCount) %} + {% capture jekyll_toc %}{{ jekyll_toc }}{% endcapture %} + {% endfor %} + + {% if jekyll_toc != '' %} + {% assign rootAttributes = '' %} + {% if include.class and include.class != blank %} + {% capture rootAttributes %} class="{{ include.class | split: '.' 
| join: ' ' }}"{% endcapture %} + {% endif %} + + {% if include.id and include.id != blank %} + {% capture rootAttributes %}{{ rootAttributes }} id="{{ include.id }}"{% endcapture %} + {% endif %} + + {% if rootAttributes %} + {% assign nodes = jekyll_toc | split: '>' %} + {% capture jekyll_toc %}<{{ listModifier }}{{ rootAttributes }}>{{ nodes | shift | join: '>' }}>{% endcapture %} + {% endif %} + {% endif %} +{% endcapture %}{% assign tocWorkspace = '' %}{{ deprecation_warnings }}{{ jekyll_toc }} \ No newline at end of file diff --git a/_layouts/default.html b/_layouts/default.html new file mode 100644 index 0000000..3d5cbe2 --- /dev/null +++ b/_layouts/default.html @@ -0,0 +1,71 @@ + + + + + + + + +{% seo %} + + + + +
+
+

{{ site.title | default: site.github.repository_name }}

+ + {% if site.logo %} + Logo + {% endif %} + +

{{ site.description | default: site.github.project_tagline }}

+ + {% if site.github.is_project_page %} +

View the project on GitHub {{ site.github.repository_nwo }}

+ {% endif %} + + {% if site.github.is_user_page %} +

View GiellaLT on GitHub

+ {% endif %} + + {% if site.show_downloads %} + + {% endif %} +
+

Page Content

+ {% include toc.html html=content sanitize=true class="left_toc" id="left_toc" h_min=2 h_max=6 %} +
+
+
+ + {{ content }} + +
+
+ {% if site.github.is_project_page %} +

This project is maintained by GiellaLT. + More information on the GiellaLT site.

+ {% endif %} +

Hosted on GitHub Pages — Theme by orderedlist

+
+
+ + {% if site.google_analytics %} + + {% endif %} + + diff --git a/assets/css/style.scss b/assets/css/style.scss new file mode 100644 index 0000000..377f1a1 --- /dev/null +++ b/assets/css/style.scss @@ -0,0 +1,5 @@ +--- +--- + +@import "{{ site.theme }}"; +@import "/assets/css/giellalt-site-global.css" ; diff --git a/index-header.md b/index-header.md new file mode 100644 index 0000000..bf38b4a --- /dev/null +++ b/index-header.md @@ -0,0 +1,17 @@ +# Tokelauan documentation + +[![Maturity](https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2Fgiellalt%2Flang-tkl%2Fgh-pages%2Fmaturity.json)](https://giellalt.github.io/MaturityClassification.html) +![Lemma count](https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2Fgiellalt%2Flang-tkl%2Fgh-pages%2Flemmacount.json) +[![License](https://img.shields.io/github/license/giellalt/lang-tkl)](https://github.com/giellalt/lang-tkl/blob/main/LICENSE) +[![Issues](https://img.shields.io/github/issues/giellalt/lang-tkl)](https://github.com/giellalt/lang-tkl/issues) +[![Build Status](https://divvun-tc.giellalt.org/api/github/v1/repository/giellalt/lang-tkl/main/badge.svg)](https://divvun-tc.giellalt.org/api/github/v1/repository/giellalt/lang-tkl/main/latest) + +This page documents the work on the **Tokelauan language model**. + +## Project documentation + +* Add links to project specific documentation here as needed. Keep the documentation in the `docs/` directory. + +## In-source documentation + +Below is an autogenerated list of documentation pages built from structured comments in the source code. All pages are also concatenated and can be read as one long text [here](tkl.md). 
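The badge URLs in `index-header.md` above resolve shields.io *endpoint* JSON files served from the gh-pages branch; the `lemmacount.json` and `maturity.json` files added later in this diff are exactly such endpoints. A minimal sketch of emitting one, assuming the lemma count has already been computed (in the real build, `$(GTCORE)/scripts/make-lemmacount.json.sh` does this from the lexicon sources):

```shell
# Hypothetical count; the real value is derived from the .lexc stem files.
LEMMA_COUNT=10

# shields.io endpoint schema: schemaVersion, label, message, color.
printf '{ "schemaVersion": 1, "label": "Lemmas", "message": "%s", "color": "black" }\n' \
    "$LEMMA_COUNT" > lemmacount.json
```

The badge image itself is then fetched via `https://img.shields.io/endpoint?url=<raw URL of this JSON>`, so only the JSON file needs to be regenerated when the lexicon grows.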
diff --git a/index.md b/index.md new file mode 100644 index 0000000..0231b7f --- /dev/null +++ b/index.md @@ -0,0 +1,50 @@ +# Tokelauan documentation + +[![Maturity](https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2Fgiellalt%2Flang-tkl%2Fgh-pages%2Fmaturity.json)](https://giellalt.github.io/MaturityClassification.html) +![Lemma count](https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2Fgiellalt%2Flang-tkl%2Fgh-pages%2Flemmacount.json) +[![License](https://img.shields.io/github/license/giellalt/lang-tkl)](https://github.com/giellalt/lang-tkl/blob/main/LICENSE) +[![Issues](https://img.shields.io/github/issues/giellalt/lang-tkl)](https://github.com/giellalt/lang-tkl/issues) +[![Build Status](https://divvun-tc.giellalt.org/api/github/v1/repository/giellalt/lang-tkl/main/badge.svg)](https://divvun-tc.giellalt.org/api/github/v1/repository/giellalt/lang-tkl/main/latest) + +This page documents the work on the **Tokelauan language model**. + +## Project documentation + +* Add links to project specific documentation here as needed. Keep the documentation in the `docs/` directory. + +## In-source documentation + +Below is an autogenerated list of documentation pages built from structured comments in the source code. All pages are also concatenated and can be read as one long text [here](tkl.md). 
+ +* `src/` + * `cg3/` + * [functions.cg3](src-cg3-functions.cg3.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/cg3/functions.cg3)) + * `fst/` + * `morphology/` + * `affixes/` + * [adjectives.lexc](src-fst-morphology-affixes-adjectives.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/adjectives.lexc)) + * [nouns.lexc](src-fst-morphology-affixes-nouns.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/nouns.lexc)) + * [propernouns.lexc](src-fst-morphology-affixes-propernouns.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/propernouns.lexc)) + * [symbols.lexc](src-fst-morphology-affixes-symbols.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/symbols.lexc)) + * [verbs.lexc](src-fst-morphology-affixes-verbs.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/verbs.lexc)) + * [phonology.twolc](src-fst-morphology-phonology.twolc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/phonology.twolc)) + * [root.lexc](src-fst-morphology-root.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/root.lexc)) + * `stems/` + * [adjectives.lexc](src-fst-morphology-stems-adjectives.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/adjectives.lexc)) + * [nouns.lexc](src-fst-morphology-stems-nouns.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/nouns.lexc)) + * [numerals.lexc](src-fst-morphology-stems-numerals.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/numerals.lexc)) + * [prefixes.lexc](src-fst-morphology-stems-prefixes.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/prefixes.lexc)) + * 
[pronouns.lexc](src-fst-morphology-stems-pronouns.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/pronouns.lexc)) + * [verbs.lexc](src-fst-morphology-stems-verbs.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/verbs.lexc)) + * `phonetics/` + * [txt2ipa.xfscript](src-fst-phonetics-txt2ipa.xfscript.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/phonetics/txt2ipa.xfscript)) + * `transcriptions/` + * [transcriptor-abbrevs2text.lexc](src-fst-transcriptions-transcriptor-abbrevs2text.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/transcriptions/transcriptor-abbrevs2text.lexc)) + * [transcriptor-numbers-digit2text.lexc](src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/src/fst/transcriptions/transcriptor-numbers-digit2text.lexc)) +* `tools/` + * `grammarcheckers/` + * [grammarchecker.cg3](tools-grammarcheckers-grammarchecker.cg3.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/tools/grammarcheckers/grammarchecker.cg3)) + * `tokenisers/` + * [tokeniser-disamb-gt-desc.pmscript](tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)) + * [tokeniser-gramcheck-gt-desc.pmscript](tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)) + * [tokeniser-tts-cggt-desc.pmscript](tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.html) ([src](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)) diff --git a/lemmacount.json b/lemmacount.json new file mode 100644 index 0000000..f6975c3 --- /dev/null +++ b/lemmacount.json @@ -0,0 +1 @@ +{ "schemaVersion": 1, "label": "Lemmas", "message": "10", "color": "black" } diff 
--git a/maturity.json b/maturity.json new file mode 100644 index 0000000..1eae480 --- /dev/null +++ b/maturity.json @@ -0,0 +1 @@ +{ "schemaVersion": 1, "label": "Maturity", "message": "Undefined", "color": "grey" } diff --git a/speller-report.html b/speller-report.html new file mode 100644 index 0000000..8a4674c --- /dev/null +++ b/speller-report.html @@ -0,0 +1,18 @@ + + + + + + + Accuracy test + + + + + + + + + + + \ No newline at end of file diff --git a/src-cg3-functions.cg3.md b/src-cg3-functions.cg3.md new file mode 100644 index 0000000..3a7e91f --- /dev/null +++ b/src-cg3-functions.cg3.md @@ -0,0 +1,29 @@ + + +* Sets for POS sub-categories + +* Sets for Semantic tags + +* Sets for Morphosyntactic properties + +* Sets for verbs + +* NP sets defined according to their morphosyntactic features + +* The PRE-NP-HEAD family of sets + +These sets model noun phrases (NPs). The idea is to first define whatever can +occur in front of the head of the NP, and thereafter negate that with the +expression **WORD - premodifiers**. + +* Miscellaneous sets + +* Border sets and their complements + +* Syntactic sets + +These were the set types. + +* * * + +This (part of) documentation was generated from [src/cg3/functions.cg3](https://github.com/giellalt/lang-tkl/blob/main/src/cg3/functions.cg3) diff --git a/src-fst-morphology-affixes-adjectives.lexc.md b/src-fst-morphology-affixes-adjectives.lexc.md new file mode 100644 index 0000000..cba7f27 --- /dev/null +++ b/src-fst-morphology-affixes-adjectives.lexc.md @@ -0,0 +1,6 @@ +Adjective inflection +The Tokelauan language adjectives compare. 
+ +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/adjectives.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/adjectives.lexc) diff --git a/src-fst-morphology-affixes-nouns.lexc.md b/src-fst-morphology-affixes-nouns.lexc.md new file mode 100644 index 0000000..8a47100 --- /dev/null +++ b/src-fst-morphology-affixes-nouns.lexc.md @@ -0,0 +1,6 @@ +Noun inflection +The Tokelauan language nouns inflect in number and cases. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/nouns.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/nouns.lexc) diff --git a/src-fst-morphology-affixes-propernouns.lexc.md b/src-fst-morphology-affixes-propernouns.lexc.md new file mode 100644 index 0000000..b862f12 --- /dev/null +++ b/src-fst-morphology-affixes-propernouns.lexc.md @@ -0,0 +1,7 @@ +Proper noun inflection +The Tokelauan language proper nouns inflect in the same cases as regular +nouns, but perhaps with a colon (':') as separator. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/propernouns.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/propernouns.lexc) diff --git a/src-fst-morphology-affixes-symbols.lexc.md b/src-fst-morphology-affixes-symbols.lexc.md new file mode 100644 index 0000000..13ea998 --- /dev/null +++ b/src-fst-morphology-affixes-symbols.lexc.md @@ -0,0 +1,6 @@ + +# Symbol affixes + +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/symbols.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/symbols.lexc) diff --git a/src-fst-morphology-affixes-verbs.lexc.md b/src-fst-morphology-affixes-verbs.lexc.md new file mode 100644 index 0000000..7992228 --- /dev/null +++ b/src-fst-morphology-affixes-verbs.lexc.md @@ -0,0 +1,6 @@ +Verb inflection +The Tokelauan language verbs inflect in persons. 
+ +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/verbs.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/verbs.lexc) diff --git a/src-fst-morphology-phonology.twolc.md b/src-fst-morphology-phonology.twolc.md new file mode 100644 index 0000000..7f60e0b --- /dev/null +++ b/src-fst-morphology-phonology.twolc.md @@ -0,0 +1,18 @@ +=================================== ! +The Tokelauan morphophonological/twolc rules file ! +=================================== ! + +* *primus%>s* +* *primus00* + +* examples:* + +* examples:* + +* examples:* + +* examples:* + +* * * + +This (part of) documentation was generated from [src/fst/morphology/phonology.twolc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/phonology.twolc) diff --git a/src-fst-morphology-root.lexc.md b/src-fst-morphology-root.lexc.md new file mode 100644 index 0000000..26bfa7a --- /dev/null +++ b/src-fst-morphology-root.lexc.md @@ -0,0 +1,80 @@ + +INTRODUCTION TO MORPHOLOGICAL ANALYSER OF Tokelauan LANGUAGE. + +# Definitions for Multichar_Symbols + +## Analysis symbols +The morphological analyses of wordforms for the Tokelauan +language are presented in this system in terms of the following symbols. +(It is highly suggested to follow existing standards when adding new tags). 
+ +The parts-of-speech are: + +The parts of speech are further split up into: + +The Usage extents are marked using the following tags: +* **+Use/TTS** – **only** retained in the HFST Text-To-Speech disambiguation tokeniser +* **+Use/-TTS** – **never** retained in the HFST Text-To-Speech disambiguation tokeniser + +The nominals are inflected in the following Case and Number + +The possession is marked as such: +The comparative forms are: +Numerals are classified under: +Verb moods are: +Verb personal forms are: +Other verb forms are + +* +Symbol = independent symbols in the text stream, like £, €, © +Special symbols are classified with: +The verbs are syntactically split according to transitivity: +Special multiword units are analysed with: +Non-dictionary words can be recognised with: + +Question and Focus particles: + +Semantics are classified with + +Derivations are classified under the morphophonetic form of the suffix, the +source and target part-of-speech. + +Morphophonology +To represent phonological variations in word forms we use the following +symbols in the lexicon files: + +And the following triggers to control variation + +## Flag diacritics +We have manually optimised the structure of our lexicon using the following +flag diacritics to restrict morphological combinatorics - only allow compounds +with verbs if the verb is further derived into a noun again: +| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised +| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised +| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised + +For languages that allow compounding, the following flag diacritics are needed +to control position-based compounding restrictions for nominals. Their use is +handled automatically if combined with +CmpN/xxx tags. If not used, they will +do no harm.
+| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first +| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX +| @P.CmpPref.FALSE@ | Block these words from making further compounds +| @D.CmpLast.TRUE@ | Block such words from entering R +| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding +| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding +| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R +| @D.CmpOnly.FALSE@ | Disallow words coming directly from root. + +Use the following flag diacritics to control downcasing of derived proper +nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use +these flags. There exists a ready-made regex that will do the actual down-casing +given the proper use of these flags. +| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. +| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. + +The word forms in Tokelauan language start from the lexeme roots of basic +word classes, or optionally from prefixes: + +* * * + +This (part of) documentation was generated from [src/fst/morphology/root.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/root.lexc) diff --git a/src-fst-morphology-stems-adjectives.lexc.md b/src-fst-morphology-stems-adjectives.lexc.md new file mode 100644 index 0000000..d4eae68 --- /dev/null +++ b/src-fst-morphology-stems-adjectives.lexc.md @@ -0,0 +1,6 @@ +Adjectives +Adjectives in the Tokelauan language describe the entities nouns refer to. 
+ +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/adjectives.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/adjectives.lexc) diff --git a/src-fst-morphology-stems-nouns.lexc.md b/src-fst-morphology-stems-nouns.lexc.md new file mode 100644 index 0000000..b95b789 --- /dev/null +++ b/src-fst-morphology-stems-nouns.lexc.md @@ -0,0 +1,6 @@ +Nouns +Nouns in the Tokelauan language refer to objects or sets of objects, qualities, states or ideas. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/nouns.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/nouns.lexc) diff --git a/src-fst-morphology-stems-numerals.lexc.md b/src-fst-morphology-stems-numerals.lexc.md new file mode 100644 index 0000000..a1fd175 --- /dev/null +++ b/src-fst-morphology-stems-numerals.lexc.md @@ -0,0 +1,6 @@ +Numerals +Numerals in the Tokelauan language describe a numerical quantity. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/numerals.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/numerals.lexc) diff --git a/src-fst-morphology-stems-prefixes.lexc.md b/src-fst-morphology-stems-prefixes.lexc.md new file mode 100644 index 0000000..b969866 --- /dev/null +++ b/src-fst-morphology-stems-prefixes.lexc.md @@ -0,0 +1,6 @@ +Prefixes +Prefixes in the Tokelauan language are attached to the left of other words.
+ +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/prefixes.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/prefixes.lexc) diff --git a/src-fst-morphology-stems-pronouns.lexc.md b/src-fst-morphology-stems-pronouns.lexc.md new file mode 100644 index 0000000..e9a5cee --- /dev/null +++ b/src-fst-morphology-stems-pronouns.lexc.md @@ -0,0 +1,6 @@ +Pronouns +Pronouns in the Tokelauan language are words that may replace nouns or refer to participants in the conversation. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/pronouns.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/pronouns.lexc) diff --git a/src-fst-morphology-stems-verbs.lexc.md b/src-fst-morphology-stems-verbs.lexc.md new file mode 100644 index 0000000..ccc8456 --- /dev/null +++ b/src-fst-morphology-stems-verbs.lexc.md @@ -0,0 +1,6 @@ +Verbs +Verbs in the Tokelauan language inflect for tense. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/verbs.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/verbs.lexc) diff --git a/src-fst-phonetics-txt2ipa.xfscript.md b/src-fst-phonetics-txt2ipa.xfscript.md new file mode 100644 index 0000000..4edece2 --- /dev/null +++ b/src-fst-phonetics-txt2ipa.xfscript.md @@ -0,0 +1,164 @@ + + +retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096) +retroflex plosive, voiced d` ɖ 0256, 598 +labiodental nasal F ɱ 0271, 625 +retroflex nasal n` ɳ 0273, 627 +palatal nasal J ɲ 0272, 626 +velar nasal N ŋ 014B, 331 +uvular nasal N\ ɴ 0274, 628 + +bilabial trill B\ ʙ 0299, 665 +uvular trill R\ ʀ 0280, 640 +alveolar tap 4 ɾ 027E, 638 +retroflex flap r` ɽ 027D, 637 +bilabial fricative, voiceless p\ ɸ 0278, 632 +bilabial fricative, voiced B β 03B2, 946 +dental fricative, voiceless T θ 03B8, 952 +dental fricative, voiced D ð 00F0, 240 +postalveolar fricative, voiceless S ʃ 0283, 643 +postalveolar 
fricative, voiced Z ʒ 0292, 658 +retroflex fricative, voiceless s` ʂ 0282, 642 +retroflex fricative, voiced z` ʐ 0290, 656 +palatal fricative, voiceless C ç 00E7, 231 +palatal fricative, voiced j\ ʝ 029D, 669 +velar fricative, voiced G ɣ 0263, 611 +uvular fricative, voiceless X χ 03C7, 967 +uvular fricative, voiced R ʁ 0281, 641 +pharyngeal fricative, voiceless X\ ħ 0127, 295 +pharyngeal fricative, voiced ?\ ʕ 0295, 661 +glottal fricative, voiced h\ ɦ 0266, 614 + +alveolar lateral fricative, vl. K +alveolar lateral fricative, vd. K\ + +labiodental approximant P (or v\) +alveolar approximant r\ +retroflex approximant r\` +velar approximant M\ + +retroflex lateral approximant l` +palatal lateral approximant L +velar lateral approximant L\ +Clicks + +bilabial O\ (O = capital letter) +dental |\ +(post)alveolar !\ +palatoalveolar =\ +alveolar lateral |\|\ +Ejectives, implosives + +ejective _> e.g. ejective p p_> +implosive _< e.g. implosive b b_< +Vowels + +close back unrounded M +close central unrounded 1 +close central rounded } +lax i I +lax y Y +lax u U + +close-mid front rounded 2 +close-mid central unrounded @\ +close-mid central rounded 8 +close-mid back unrounded 7 + +schwa ə @ + +open-mid front unrounded E +open-mid front rounded 9 +open-mid central unrounded 3 +open-mid central rounded 3\ +open-mid back unrounded V +open-mid back rounded O + +ash (ae digraph) { +open schwa (turned a) 6 + +open front rounded & +open back unrounded A +open back rounded Q +Other symbols + +voiceless labial-velar fricative W +voiced labial-palatal approx. H +voiceless epiglottal fricative H\ +voiced epiglottal fricative <\ +epiglottal plosive >\ + +alveolo-palatal fricative, vl. 
s\ +alveolo-palatal fricative, voiced z\ +alveolar lateral flap l\ +simultaneous S and x x\ +tie bar _ +Suprasegmentals + +primary stress " +secondary stress % +long : +half-long :\ +extra-short _X +linking mark -\ +Tones and word accents + +level extra high _T +level high _H +level mid _M +level low _L +level extra low _B +downstep ! +upstep ^ (caret, circumflex) + +contour, rising +contour, falling _F +contour, high rising _H_T +contour, low rising _B_L + +contour, rising-falling _R_F +(NB Instead of being written as diacritics with _, all prosodic +marks can alternatively be placed in a separate tier, set off +by < >, as recommended for the next two symbols.) +global rise +global fall +Diacritics + +voiceless _0 (0 = figure), e.g. n_0 +voiced _v +aspirated _h +more rounded _O (O = letter) +less rounded _c +advanced _+ +retracted _- +centralized _" +syllabic = (or _=) e.g. n= (or n_=) +non-syllabic _^ +rhoticity ` + +breathy voiced _t +creaky voiced _k +linguolabial _N +labialized _w +palatalized ' (or _j) e.g. t' (or t_j) +velarized _G +pharyngealized _?\ + +dental _d +apical _a +laminal _m +nasalized ~ (or _~) e.g. A~ (or A_~) +nasal release _n +lateral release _l +no audible release _} + +velarized or pharyngealized _e +velarized l, alternatively 5 +raised _r +lowered _o +advanced tongue root _A +retracted tongue root _q + +* * * + +This (part of) documentation was generated from [src/fst/phonetics/txt2ipa.xfscript](https://github.com/giellalt/lang-tkl/blob/main/src/fst/phonetics/txt2ipa.xfscript) diff --git a/src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md b/src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md new file mode 100644 index 0000000..dcc7ff4 --- /dev/null +++ b/src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md @@ -0,0 +1,17 @@ + + +We describe here how abbreviations in Tokelauan are read out, e.g. +for text-to-speech systems. 
+ +For example: + +* s.:syntynyt # ; +* os.:omaa% sukua # ; +* v.:vuosi # ; +* v.:vuonna # ; +* esim.:esimerkki # ; +* esim.:esimerkiksi # ; + +* * * + +This (part of) documentation was generated from [src/fst/transcriptions/transcriptor-abbrevs2text.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/transcriptions/transcriptor-abbrevs2text.lexc) diff --git a/src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md b/src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md new file mode 100644 index 0000000..492d4aa --- /dev/null +++ b/src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md @@ -0,0 +1,11 @@ + + +% komma% :, Root ; +% tjuohkkis% :%. Root ; +% kolon% :%: Root ; +% sárggis% :%- Root ; +% násti% :%* Root ; + +* * * + +This (part of) documentation was generated from [src/fst/transcriptions/transcriptor-numbers-digit2text.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/transcriptions/transcriptor-numbers-digit2text.lexc) diff --git a/tkl.md b/tkl.md new file mode 100644 index 0000000..f10a811 --- /dev/null +++ b/tkl.md @@ -0,0 +1,974 @@ +# Tokelauan language model documentation + +All doc-comment documentation in one large file. + +--- + +# src-cg3-functions.cg3.md + + + +* Sets for POS sub-categories + +* Sets for Semantic tags + +* Sets for Morphosyntactic properties + +* Sets for verbs + +* NP sets defined according to their morphosyntactic features + +* The PRE-NP-HEAD family of sets + +These sets model noun phrases (NPs). The idea is to first define whatever can +occur in front of the head of the NP, and thereafter negate that with the +expression **WORD - premodifiers**. + +* Miscellaneous sets + +* Border sets and their complements + +* Syntactic sets + +These were the set types. 
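The **WORD - premodifiers** idea above can be sketched in CG3 set syntax. This is a minimal illustrative sketch; the set names and tag inventory below are hypothetical, not the ones actually defined in functions.cg3:

```cg3
# Hypothetical premodifier inventory: whatever may precede an NP head
LIST PRE-NP-HEAD = A Num Pron Dem ;

# Its complement: any token that cannot open a noun phrase
SET NOT-PRE-NP-HEAD = WORD - PRE-NP-HEAD ;
```

Defining the complement this way means the premodifier list is maintained in one place, and the negated set stays consistent with it automatically.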
+ +* * * + +This (part of) documentation was generated from [src/cg3/functions.cg3](https://github.com/giellalt/lang-tkl/blob/main/src/cg3/functions.cg3) + +--- + +# src-fst-morphology-affixes-adjectives.lexc.md + +Adjective inflection +The Tokelauan language adjectives compare. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/adjectives.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/adjectives.lexc) + +--- + +# src-fst-morphology-affixes-nouns.lexc.md + +Noun inflection +The Tokelauan language nouns inflect in number and cases. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/nouns.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/nouns.lexc) + +--- + +# src-fst-morphology-affixes-propernouns.lexc.md + +Proper noun inflection +The Tokelauan language proper nouns inflect in the same cases as regular +nouns, but perhaps with a colon (':') as separator. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/propernouns.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/propernouns.lexc) + +--- + +# src-fst-morphology-affixes-symbols.lexc.md + + +# Symbol affixes + +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/symbols.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/symbols.lexc) + +--- + +# src-fst-morphology-affixes-verbs.lexc.md + +Verb inflection +The Tokelauan language verbs inflect in persons. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/affixes/verbs.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/affixes/verbs.lexc) + +--- + +# src-fst-morphology-phonology.twolc.md + +=================================== ! +The Tokelauan morphophonological/twolc rules file ! +=================================== ! 
+ +* *primus%>s* +* *primus00* + +* examples:* + +* examples:* + +* examples:* + +* examples:* + +* * * + +This (part of) documentation was generated from [src/fst/morphology/phonology.twolc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/phonology.twolc) + +--- + +# src-fst-morphology-root.lexc.md + + +INTRODUCTION TO MORPHOLOGICAL ANALYSER OF Tokelauan LANGUAGE. + +# Definitions for Multichar_Symbols + +## Analysis symbols +The morphological analyses of wordforms for the Tokelauan +language are presented in this system in terms of the following symbols. +(It is highly suggested to follow existing standards when adding new tags). + +The parts-of-speech are: + +The parts of speech are further split up into: + +The Usage extents are marked using following tags: +* **+Use/TTS** – **only** retained in the HFST Text-To-Speech disambiguation tokeniser +* **+Use/-TTS** – **never** retained in the HFST Text-To-Speech disambiguation tokeniser + +The nominals are inflected in the following Case and Number + +The possession is marked as such: +The comparative forms are: +Numerals are classified under: +Verb moods are: +Verb personal forms are: +Other verb forms are + +* +Symbol = independent symbols in the text stream, like £, €, © +Special symbols are classified with: +The verbs are syntactically split according to transitivity: +Special multiword units are analysed with: +Non-dictionary words can be recognised with: + +Question and Focus particles: + +Semantics are classified with + +Derivations are classified under the morphophonetic form of the suffix, the +source and target part-of-speech. 
+ +Morphophonology +To represent phonological variations in word forms we use the following +symbols in the lexicon files: + +And the following triggers to control variation + +## Flag diacritics +We have manually optimised the structure of our lexicon using the following +flag diacritics to restrict morphological combinatorics - only allow compounds +with verbs if the verb is further derived into a noun again: +| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised +| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised +| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised + +For languages that allow compounding, the following flag diacritics are needed +to control position-based compounding restrictions for nominals. Their use is +handled automatically if combined with +CmpN/xxx tags. If not used, they will +do no harm. +| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first +| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX +| @P.CmpPref.FALSE@ | Block these words from making further compounds +| @D.CmpLast.TRUE@ | Block such words from entering R +| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding +| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding +| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R +| @D.CmpOnly.FALSE@ | Disallow words coming directly from root. + +Use the following flag diacritics to control downcasing of derived proper +nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use +these flags. There exists a ready-made regex that will do the actual down-casing +given the proper use of these flags. +| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. +| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. 
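As an illustration of how the NeedNoun flags above typically interact, a compounding path in lexc might look roughly like this. The lexicon names are hypothetical placeholders, not entries from the actual source files:

```lexc
LEXICON VerbCompound
!! Hypothetical sketch: entering a compound from a verb sets the flag ...
@P.NeedNoun.ON@ VerbStems ;

LEXICON ENDLEX
!! ... and word-final exit is blocked while NeedNoun is still ON,
!! i.e. unless a nominalising derivation has cleared the flag
@D.NeedNoun.ON@ # ;
```

The P-flag sets the feature on the compounding path; the D-flag at the end lexicon rejects any analysis where the feature is still ON, which is what enforces "compounds with verbs only if nominalised".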
+ +The word forms in the Tokelauan language start from the lexeme roots of basic +word classes, or optionally from prefixes: + +* * * + +This (part of) documentation was generated from [src/fst/morphology/root.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/root.lexc) + +--- + +# src-fst-morphology-stems-adjectives.lexc.md + +Adjectives +Adjectives in the Tokelauan language describe the entities nouns refer to. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/adjectives.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/adjectives.lexc) + +--- + +# src-fst-morphology-stems-nouns.lexc.md + +Nouns +Nouns in the Tokelauan language refer to objects or sets of objects, qualities, states or ideas. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/nouns.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/nouns.lexc) + +--- + +# src-fst-morphology-stems-numerals.lexc.md + +Numerals +Numerals in the Tokelauan language describe a numerical quantity. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/numerals.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/numerals.lexc) + +--- + +# src-fst-morphology-stems-prefixes.lexc.md + +Prefixes +Prefixes in the Tokelauan language are attached to the left of other words. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/prefixes.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/prefixes.lexc) + +--- + +# src-fst-morphology-stems-pronouns.lexc.md + +Pronouns +Pronouns in the Tokelauan language are words that may replace nouns or refer to participants in the conversation. 
+ +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/pronouns.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/pronouns.lexc) + +--- + +# src-fst-morphology-stems-verbs.lexc.md + +Verbs +Verbs in the Tokelauan language inflect for tense. + +* * * + +This (part of) documentation was generated from [src/fst/morphology/stems/verbs.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/morphology/stems/verbs.lexc) + +--- + +# src-fst-phonetics-txt2ipa.xfscript.md + + + +retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096) +retroflex plosive, voiced d` ɖ 0256, 598 +labiodental nasal F ɱ 0271, 625 +retroflex nasal n` ɳ 0273, 627 +palatal nasal J ɲ 0272, 626 +velar nasal N ŋ 014B, 331 +uvular nasal N\ ɴ 0274, 628 + +bilabial trill B\ ʙ 0299, 665 +uvular trill R\ ʀ 0280, 640 +alveolar tap 4 ɾ 027E, 638 +retroflex flap r` ɽ 027D, 637 +bilabial fricative, voiceless p\ ɸ 0278, 632 +bilabial fricative, voiced B β 03B2, 946 +dental fricative, voiceless T θ 03B8, 952 +dental fricative, voiced D ð 00F0, 240 +postalveolar fricative, voiceless S ʃ 0283, 643 +postalveolar fricative, voiced Z ʒ 0292, 658 +retroflex fricative, voiceless s` ʂ 0282, 642 +retroflex fricative, voiced z` ʐ 0290, 656 +palatal fricative, voiceless C ç 00E7, 231 +palatal fricative, voiced j\ ʝ 029D, 669 +velar fricative, voiced G ɣ 0263, 611 +uvular fricative, voiceless X χ 03C7, 967 +uvular fricative, voiced R ʁ 0281, 641 +pharyngeal fricative, voiceless X\ ħ 0127, 295 +pharyngeal fricative, voiced ?\ ʕ 0295, 661 +glottal fricative, voiced h\ ɦ 0266, 614 + +alveolar lateral fricative, vl. K +alveolar lateral fricative, vd. 
K\ + +labiodental approximant P (or v\) +alveolar approximant r\ +retroflex approximant r\` +velar approximant M\ + +retroflex lateral approximant l` +palatal lateral approximant L +velar lateral approximant L\ +Clicks + +bilabial O\ (O = capital letter) +dental |\ +(post)alveolar !\ +palatoalveolar =\ +alveolar lateral |\|\ +Ejectives, implosives + +ejective _> e.g. ejective p p_> +implosive _< e.g. implosive b b_< +Vowels + +close back unrounded M +close central unrounded 1 +close central rounded } +lax i I +lax y Y +lax u U + +close-mid front rounded 2 +close-mid central unrounded @\ +close-mid central rounded 8 +close-mid back unrounded 7 + +schwa ə @ + +open-mid front unrounded E +open-mid front rounded 9 +open-mid central unrounded 3 +open-mid central rounded 3\ +open-mid back unrounded V +open-mid back rounded O + +ash (ae digraph) { +open schwa (turned a) 6 + +open front rounded & +open back unrounded A +open back rounded Q +Other symbols + +voiceless labial-velar fricative W +voiced labial-palatal approx. H +voiceless epiglottal fricative H\ +voiced epiglottal fricative <\ +epiglottal plosive >\ + +alveolo-palatal fricative, vl. s\ +alveolo-palatal fricative, voiced z\ +alveolar lateral flap l\ +simultaneous S and x x\ +tie bar _ +Suprasegmentals + +primary stress " +secondary stress % +long : +half-long :\ +extra-short _X +linking mark -\ +Tones and word accents + +level extra high _T +level high _H +level mid _M +level low _L +level extra low _B +downstep ! +upstep ^ (caret, circumflex) + +contour, rising +contour, falling _F +contour, high rising _H_T +contour, low rising _B_L + +contour, rising-falling _R_F +(NB Instead of being written as diacritics with _, all prosodic +marks can alternatively be placed in a separate tier, set off +by < >, as recommended for the next two symbols.) +global rise +global fall +Diacritics + +voiceless _0 (0 = figure), e.g. 
n_0 +voiced _v +aspirated _h +more rounded _O (O = letter) +less rounded _c +advanced _+ +retracted _- +centralized _" +syllabic = (or _=) e.g. n= (or n_=) +non-syllabic _^ +rhoticity ` + +breathy voiced _t +creaky voiced _k +linguolabial _N +labialized _w +palatalized ' (or _j) e.g. t' (or t_j) +velarized _G +pharyngealized _?\ + +dental _d +apical _a +laminal _m +nasalized ~ (or _~) e.g. A~ (or A_~) +nasal release _n +lateral release _l +no audible release _} + +velarized or pharyngealized _e +velarized l, alternatively 5 +raised _r +lowered _o +advanced tongue root _A +retracted tongue root _q + +* * * + +This (part of) documentation was generated from [src/fst/phonetics/txt2ipa.xfscript](https://github.com/giellalt/lang-tkl/blob/main/src/fst/phonetics/txt2ipa.xfscript) + +--- + +# src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md + + + +We describe here how abbreviations in Tokelauan are read out, e.g. +for text-to-speech systems. + +For example: + +* s.:syntynyt # ; +* os.:omaa% sukua # ; +* v.:vuosi # ; +* v.:vuonna # ; +* esim.:esimerkki # ; +* esim.:esimerkiksi # ; + +* * * + +This (part of) documentation was generated from [src/fst/transcriptions/transcriptor-abbrevs2text.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/transcriptions/transcriptor-abbrevs2text.lexc) + +--- + +# src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md + + + +% komma% :, Root ; +% tjuohkkis% :%. Root ; +% kolon% :%: Root ; +% sárggis% :%- Root ; +% násti% :%* Root ; + +* * * + +This (part of) documentation was generated from [src/fst/transcriptions/transcriptor-numbers-digit2text.lexc](https://github.com/giellalt/lang-tkl/blob/main/src/fst/transcriptions/transcriptor-numbers-digit2text.lexc) + +--- + +# tools-grammarcheckers-grammarchecker.cg3.md + + +# Tokelauan G R A M M A R C H E C K E R + +# DELIMITERS + +# TAGS AND SETS + +## Tags + +This section lists all the tags inherited from the fst, and used as tags +in the syntactic analysis. 
The next section, **Sets**, contains sets defined +on the basis of the tags listed here, those set names are not visible in the output. + +### Beginning and end of sentence +BOS +EOS + +### Parts of speech tags + +N +A +Adv +V +Pron +CS +CC +CC-CS +Po +Pr +Pcle +Num +Interj +ABBR +ACR +CLB +LEFT +RIGHT +WEB +PPUNCT +PUNCT + +COMMA +¶ + +### Tags for POS sub-categories + +Pers +Dem +Interr +Indef +Recipr +Refl +Rel +Coll +NomAg +Prop +Allegro +Arab +Romertall + +### Tags for morphosyntactic properties + +Nom +Acc +Gen +Ill +Loc +Com +Ess +Ess +Sg +Du +Pl +Cmp/SplitR +Cmp/SgNom Cmp/SgGen +Cmp/SgGen +PxSg1 +PxSg2 +PxSg3 +PxDu1 +PxDu2 +PxDu3 +PxPl1 +PxPl2 +PxPl3 +Px + +Comp +Superl +Attr +Ord +Qst +IV +TV +Prt +Prs +Ind +Pot +Cond +Imprt +ImprtII +Sg1 +Sg2 +Sg3 +Du1 +Du2 +Du3 +Pl1 +Pl2 +Pl3 +Inf +ConNeg +Neg +PrfPrc +VGen +PrsPrc +Ger +Sup +Actio +VAbess + +Err/Orth + +### Semantic tags + +Sem/Act +Sem/Ani +Sem/Atr +Sem/Body +Sem/Clth +Sem/Domain +Sem/Feat-phys +Sem/Fem +Sem/Group +Sem/Lang +Sem/Mal +Sem/Measr +Sem/Money +Sem/Obj +Sem/Obj-el +Sem/Org +Sem/Perc-emo +Sem/Plc +Sem/Sign +Sem/State-sick +Sem/Sur +Sem/Time +Sem/Txt + +HUMAN + +PROP-ATTR +PROP-SUR + +TIME-N-SET + +### Syntactic tags + +@+FAUXV +@+FMAINV +@-FAUXV +@-FMAINV +@-FSUBJ> +@-F +@-FSPRED +@-F +@-FOPRED> +@>ADVL +@ADVL< +@ +@ADVL +@HAB> +@N +@Interj +@N< +@>A +@P< +@>P +@HNOUN +@INTERJ +@>Num +@Pron< +@>Pron +@Num< +@OBJ +@ +@OPRED +@ +@PCLE +@COMP-CS< +@SPRED +@ +@SUBJ +@ +SUBJ +SPRED +OPRED +@PPRED +@APP +@APP-N< +@APP-Pron< +@APP>Pron +@APP-Num< +@APP-ADVL< +@VOC +@CVP +@CNP +OBJ + +-OTHERS +SYN-V +@X + +## Sets containing sets of lists and tags + +This part of the file lists a large number of sets based partly upon the tags defined above, and +partly upon lexemes drawn from the lexicon. +See the sourcefile itself to inspect the sets, what follows here is an overview of the set types. 
+ +### Sets for Single-word sets + +INITIAL + +### Sets for word or not + +WORD +NOT-COMMA + +### Case sets + +ADLVCASE + +CASE-AGREEMENT +CASE + +NOT-NOM +NOT-GEN +NOT-ACC + +### Verb sets + +NOT-V + +### Sets for finiteness and mood + +REAL-NEG + +MOOD-V + +NOT-PRFPRC + +### Sets for person + +SG1-V +SG2-V +SG3-V +DU1-V +DU2-V +DU3-V +PL1-V +PL2-V +PL3-V + +### Pronoun sets + +### Adjectival sets and their complements + +### Adverbial sets and their complements + +### Sets of elements with common syntactic behaviour + +### NP sets defined according to their morphosyntactic features + +### The PRE-NP-HEAD family of sets + +These sets model noun phrases (NPs). The idea is to first define whatever can +occur in front of the head of the NP, and thereafter negate that with the +expression **WORD - premodifiers**. + +### Border sets and their complements + +### Grammarchecker sets + +* * * + +This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt/lang-tkl/blob/main/tools/grammarcheckers/grammarchecker.cg3) + +--- + +# tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md + +# Tokeniser for tkl + +Usage: +``` +$ make +$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "(gáfe) 'ja' ja 3. ja? 
ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +Pmatch documentation: + + +Characters which have analyses in the lexicon, but can appear without spaces +before/after, that is, with no context conditions, and adjacent to words: +* Punct contains ASCII punctuation marks +* The symbol after m-dash is soft-hyphen `U+00AD` +* The symbol following {•} is byte-order-mark / zero-width no-break space +`U+FEFF`. + +Whitespace contains ASCII white space and +the List contains some unicode white space characters +* En Quad U+2000 to Zero-Width Joiner U+200d' +* Narrow No-Break Space U+202F +* Medium Mathematical Space U+205F +* Word joiner U+2060 + +Apart from what's in our morphology, there are +1. unknown word-like forms, and +2. unmatched strings +We want to give 1) a match, but let 2) be treated specially by +`hfst-tokenise -a` +Unknowns are made of: +* lower-case ASCII +* upper-case ASCII +* select extended latin symbols +ASCII digits +* select symbols +* Combining diacritics as individual symbols, +* various symbols from Private area (probably Microsoft), +so far: +* U+F0B7 for "x in box" + +## Unknown handling +Unknowns are tagged ?? and treated specially with `hfst-tokenise` +hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and +remove empty analyses from other readings. Empty readings are also +legal in CG, they get a default baseform equal to the wordform, but +no tag to check, so it's safer to let hfst-tokenise handle them. 
+ +Finally we mark as a token any sequence making up a: +* known word in context +* unknown (OOV) token in context +* sequence of word and punctuation +* URL in context + +* * * + +This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript) + +--- + +# tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md + +# Grammar checker tokenisation for tkl + +Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc) +Then just: +``` +$ make +$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +More usage examples: +``` +$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +Pmatch documentation: + + +Characters which have analyses in the lexicon, but can appear without spaces +before/after, that is, with no context conditions, and adjacent to words: +* Punct contains ASCII punctuation marks +* The symbol after m-dash is soft-hyphen `U+00AD` +* The symbol following {•} is byte-order-mark / zero-width no-break space +`U+FEFF`. 
+ +Whitespace contains ASCII white space and +the List contains some unicode white space characters +* En Quad U+2000 to Zero-Width Joiner U+200d' +* Narrow No-Break Space U+202F +* Medium Mathematical Space U+205F +* Word joiner U+2060 + +Apart from what's in our morphology, there are +1) unknown word-like forms, and +2) unmatched strings +We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a +* select extended latin symbols +* select symbols +* various symbols from Private area (probably Microsoft), +so far: +* U+F0B7 for "x in box" + +TODO: Could use something like this, but built-in's don't include šžđčŋ: + +Simply give an empty reading when something is unknown: +hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and +remove empty analyses from other readings. Empty readings are also +legal in CG, they get a default baseform equal to the wordform, but +no tag to check, so it's safer to let hfst-tokenise handle them. + +Finally we mark as a token any sequence making up a: +* known word in context +* unknown (OOV) token in context +* sequence of word and punctuation +* URL in context + +* * * + +This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript) + +--- + +# tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md + +# TTS tokenisation for smj + +Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc) +Then just: +```sh +make +echo "ja, ja" \ +| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +More usage examples: +```sh +echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \ +boasttu olmmoš, man mielde lahtuid." \ +| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +echo "(gáfe) 'ja' ja 3. ja? 
ц jaja ukjend \"ukjend\"" \ +| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +echo "márffibiillagáffe" \ +| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +Pmatch documentation: + + +Characters which have analyses in the lexicon, but can appear without spaces +before/after, that is, with no context conditions, and adjacent to words: +* Punct contains ASCII punctuation marks +* The symbol after m-dash is soft-hyphen `U+00AD` +* The symbol following {•} is byte-order-mark / zero-width no-break space +`U+FEFF`. + +Whitespace contains ASCII white space and +the List contains some unicode white space characters +* En Quad U+2000 to Zero-Width Joiner U+200d' +* Narrow No-Break Space U+202F +* Medium Mathematical Space U+205F +* Word joiner U+2060 + +Apart from what's in our morphology, there are +1) unknown word-like forms, and +2) unmatched strings +We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a +* select extended latin symbols +* select symbols +* various symbols from Private area (probably Microsoft), +so far: +* U+F0B7 for "x in box" + +TODO: Could use something like this, but built-in's don't include šžđčŋ: + +Simply give an empty reading when something is unknown: +hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and +remove empty analyses from other readings. Empty readings are also +legal in CG, they get a default baseform equal to the wordform, but +no tag to check, so it's safer to let hfst-tokenise handle them. 
+ +Needs hfst-tokenise to output things differently depending on the tag they get + +* * * + +This (part of) documentation was generated from [tools/tokenisers/tokeniser-tts-cggt-desc.pmscript](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript) diff --git a/tools-grammarcheckers-grammarchecker.cg3.md b/tools-grammarcheckers-grammarchecker.cg3.md new file mode 100644 index 0000000..ed00c64 --- /dev/null +++ b/tools-grammarcheckers-grammarchecker.cg3.md @@ -0,0 +1,299 @@ + +# Tokelauan G R A M M A R C H E C K E R + +# DELIMITERS + +# TAGS AND SETS + +## Tags + +This section lists all the tags inherited from the fst, and used as tags +in the syntactic analysis. The next section, **Sets**, contains sets defined +on the basis of the tags listed here, those set names are not visible in the output. + +### Beginning and end of sentence +BOS +EOS + +### Parts of speech tags + +N +A +Adv +V +Pron +CS +CC +CC-CS +Po +Pr +Pcle +Num +Interj +ABBR +ACR +CLB +LEFT +RIGHT +WEB +PPUNCT +PUNCT + +COMMA +¶ + +### Tags for POS sub-categories + +Pers +Dem +Interr +Indef +Recipr +Refl +Rel +Coll +NomAg +Prop +Allegro +Arab +Romertall + +### Tags for morphosyntactic properties + +Nom +Acc +Gen +Ill +Loc +Com +Ess +Ess +Sg +Du +Pl +Cmp/SplitR +Cmp/SgNom Cmp/SgGen +Cmp/SgGen +PxSg1 +PxSg2 +PxSg3 +PxDu1 +PxDu2 +PxDu3 +PxPl1 +PxPl2 +PxPl3 +Px + +Comp +Superl +Attr +Ord +Qst +IV +TV +Prt +Prs +Ind +Pot +Cond +Imprt +ImprtII +Sg1 +Sg2 +Sg3 +Du1 +Du2 +Du3 +Pl1 +Pl2 +Pl3 +Inf +ConNeg +Neg +PrfPrc +VGen +PrsPrc +Ger +Sup +Actio +VAbess + +Err/Orth + +### Semantic tags + +Sem/Act +Sem/Ani +Sem/Atr +Sem/Body +Sem/Clth +Sem/Domain +Sem/Feat-phys +Sem/Fem +Sem/Group +Sem/Lang +Sem/Mal +Sem/Measr +Sem/Money +Sem/Obj +Sem/Obj-el +Sem/Org +Sem/Perc-emo +Sem/Plc +Sem/Sign +Sem/State-sick +Sem/Sur +Sem/Time +Sem/Txt + +HUMAN + +PROP-ATTR +PROP-SUR + +TIME-N-SET + +### Syntactic tags + +@+FAUXV +@+FMAINV +@-FAUXV +@-FMAINV +@-FSUBJ> +@-F +@-FSPRED +@-F 
+@-FOPRED> +@>ADVL +@ADVL< +@ +@ADVL +@HAB> +@N +@Interj +@N< +@>A +@P< +@>P +@HNOUN +@INTERJ +@>Num +@Pron< +@>Pron +@Num< +@OBJ +@ +@OPRED +@ +@PCLE +@COMP-CS< +@SPRED +@ +@SUBJ +@ +SUBJ +SPRED +OPRED +@PPRED +@APP +@APP-N< +@APP-Pron< +@APP>Pron +@APP-Num< +@APP-ADVL< +@VOC +@CVP +@CNP +OBJ + +-OTHERS +SYN-V +@X + +## Sets containing sets of lists and tags + +This part of the file lists a large number of sets based partly upon the tags defined above, and +partly upon lexemes drawn from the lexicon. +See the sourcefile itself to inspect the sets, what follows here is an overview of the set types. + +### Sets for Single-word sets + +INITIAL + +### Sets for word or not + +WORD +NOT-COMMA + +### Case sets + +ADLVCASE + +CASE-AGREEMENT +CASE + +NOT-NOM +NOT-GEN +NOT-ACC + +### Verb sets + +NOT-V + +### Sets for finiteness and mood + +REAL-NEG + +MOOD-V + +NOT-PRFPRC + +### Sets for person + +SG1-V +SG2-V +SG3-V +DU1-V +DU2-V +DU3-V +PL1-V +PL2-V +PL3-V + +### Pronoun sets + +### Adjectival sets and their complements + +### Adverbial sets and their complements + +### Sets of elements with common syntactic behaviour + +### NP sets defined according to their morphosyntactic features + +### The PRE-NP-HEAD family of sets + +These sets model noun phrases (NPs). The idea is to first define whatever can +occur in front of the head of the NP, and thereafter negate that with the +expression **WORD - premodifiers**. 
+ +### Border sets and their complements + +### Grammarchecker sets + +* * * + +This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt/lang-tkl/blob/main/tools/grammarcheckers/grammarchecker.cg3) diff --git a/tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md b/tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md new file mode 100644 index 0000000..ee3bc69 --- /dev/null +++ b/tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md @@ -0,0 +1,60 @@ +# Tokeniser for tkl + +Usage: +``` +$ make +$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +Pmatch documentation: + + +Characters which have analyses in the lexicon, but can appear without spaces +before/after, that is, with no context conditions, and adjacent to words: +* Punct contains ASCII punctuation marks +* The symbol after m-dash is soft-hyphen `U+00AD` +* The symbol following {•} is byte-order-mark / zero-width no-break space +`U+FEFF`. + +Whitespace contains ASCII white space and +the List contains some unicode white space characters +* En Quad U+2000 to Zero-Width Joiner U+200d' +* Narrow No-Break Space U+202F +* Medium Mathematical Space U+205F +* Word joiner U+2060 + +Apart from what's in our morphology, there are +1. unknown word-like forms, and +2. 
unmatched strings. +We want to give 1) a match, but let 2) be treated specially by +`hfst-tokenise -a`. +Unknowns are made of: +* lower-case ASCII +* upper-case ASCII +* select extended Latin symbols +* ASCII digits +* select symbols +* combining diacritics as individual symbols +* various symbols from the Private Use Area (probably Microsoft), +so far: +* `U+F0B7` for "x in box" + +## Unknown handling +Unknowns are tagged `??` and treated specially by `hfst-tokenise`: +hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and +remove empty analyses from other readings. Empty readings are also +legal in CG; they get a default baseform equal to the wordform, but +no tag to check, so it's safer to let hfst-tokenise handle them. + +Finally we mark as a token any sequence making up a: +* known word in context +* unknown (OOV) token in context +* sequence of word and punctuation +* URL in context + +* * * + +This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript) diff --git a/tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md b/tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md new file mode 100644 index 0000000..9b8414c --- /dev/null +++ b/tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md @@ -0,0 +1,60 @@ +# Grammar checker tokenisation for tkl + +Requires a recent version of HFST (3.10.0 / git revision >= 3aecdbc). +Then just: +``` +$ make +$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +More usage examples: +``` +$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "(gáfe) 'ja' ja 3. ja?
ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +Pmatch documentation: + + +Characters which have analyses in the lexicon, but can appear without spaces +before/after, that is, with no context conditions, and adjacent to words: +* Punct contains ASCII punctuation marks +* The symbol after m-dash is soft-hyphen `U+00AD` +* The symbol following {•} is byte-order-mark / zero-width no-break space +`U+FEFF`. + +Whitespace contains ASCII white space, and +the List contains some Unicode white space characters: +* En Quad `U+2000` to Zero-Width Joiner `U+200D` +* Narrow No-Break Space `U+202F` +* Medium Mathematical Space `U+205F` +* Word Joiner `U+2060` + +Apart from what's in our morphology, there are +1) unknown word-like forms, and +2) unmatched strings. +We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`. +Unknowns are made of: +* select extended Latin symbols +* select symbols +* various symbols from the Private Use Area (probably Microsoft), +so far: +* `U+F0B7` for "x in box" + +TODO: Could use something like this, but the built-ins don't include šžđčŋ: + +Simply give an empty reading when something is unknown: +hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and +remove empty analyses from other readings. Empty readings are also +legal in CG; they get a default baseform equal to the wordform, but +no tag to check, so it's safer to let hfst-tokenise handle them.
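The empty-reading convention described above can be illustrated schematically. The cohort below is a hand-written sketch, not captured tool output; `??` is the unknown tag mentioned in the companion disamb tokeniser documentation:

```
"<ukjend>"
	"ukjend" ??
```

The wordform is reused as the baseform and there is no morphological tag to check, which is why it is safer to let hfst-tokenise mark such cohorts explicitly than to leave the reading empty.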
+ +Finally we mark as a token any sequence making up a: +* known word in context +* unknown (OOV) token in context +* sequence of word and punctuation +* URL in context + +* * * + +This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript) diff --git a/tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md b/tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md new file mode 100644 index 0000000..018f0f2 --- /dev/null +++ b/tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md @@ -0,0 +1,61 @@ +# TTS tokenisation for tkl + +Requires a recent version of HFST (3.10.0 / git revision >= 3aecdbc). +Then just: +```sh +make +echo "ja, ja" \ +| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +More usage examples: +```sh +echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \ +boasttu olmmoš, man mielde lahtuid." \ +| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \ +| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +echo "márffibiillagáffe" \ +| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst +``` + +Pmatch documentation: + + +Characters which have analyses in the lexicon, but can appear without spaces +before/after, that is, with no context conditions, and adjacent to words: +* Punct contains ASCII punctuation marks +* The symbol after m-dash is soft-hyphen `U+00AD` +* The symbol following {•} is byte-order-mark / zero-width no-break space +`U+FEFF`.
+ +Whitespace contains ASCII white space, and +the List contains some Unicode white space characters: +* En Quad `U+2000` to Zero-Width Joiner `U+200D` +* Narrow No-Break Space `U+202F` +* Medium Mathematical Space `U+205F` +* Word Joiner `U+2060` + +Apart from what's in our morphology, there are +1) unknown word-like forms, and +2) unmatched strings. +We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`. +Unknowns are made of: +* select extended Latin symbols +* select symbols +* various symbols from the Private Use Area (probably Microsoft), +so far: +* `U+F0B7` for "x in box" + +TODO: Could use something like this, but the built-ins don't include šžđčŋ: + +Simply give an empty reading when something is unknown: +hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and +remove empty analyses from other readings. Empty readings are also +legal in CG; they get a default baseform equal to the wordform, but +no tag to check, so it's safer to let hfst-tokenise handle them. + +Needs hfst-tokenise to output things differently depending on the tag they get. + +* * * + +This (part of) documentation was generated from [tools/tokenisers/tokeniser-tts-cggt-desc.pmscript](https://github.com/giellalt/lang-tkl/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)