documenting transcriptors
Trondtr committed Aug 31, 2023
1 parent 8511e4d commit f35f7e9
Showing 2 changed files with 73 additions and 1 deletion.
3 changes: 2 additions & 1 deletion index.md
@@ -45,8 +45,9 @@
 - [Transducers](infra/Infrastructure.md)
 - [Keyboards](keyboards/Overview.md)
 - [Spelling checkers](proof/index.md)
-- [Hyphenators](proof/hyph/index.md)
 - [Grammar checkers](proof/gramcheck/GrammarCheckerDocumentation.md)
+- [Hyphenators](proof/hyph/index.md)
+- [Transcription](transcriptions/index.md)
 - [Dictionaries](dicts/dicts.md)
 - [Corpus work](ling/corpusindex.md)
 - [ICALL](https://giellalt.uit.no/ped/index.html) <!-- (ped/index.md) -->
71 changes: 71 additions & 0 deletions transcriptions/index.md
@@ -0,0 +1,71 @@
Transcriptors
=============

The infrastructure has several FSTs for transcribing from one text string to another.

# Transcriptors

## Overview


The folder `lang-xxx/src/transcriptions` contains the setup for converting various number and symbol representations to their text representation. The source files in the folder are:

```
transcriptor-abbrevs2text.lexc # for abbreviations
transcriptor-clock-digit2text.lexc # for time expressions
transcriptor-date-digit2text.lexc # for dates
transcriptor-numbers-digit2text.lexc # for cardinals and ordinals
```

Each `lexc` file gives rise to two transducers, here with `clock` as an example:

```
transcriptor-clock-digit2text.filtered.lookup.hfstol
transcriptor-clock-digit2text.lexc
transcriptor-clock-text2digit.filtered.lookup.hfstol
```

The direction (from digit to text or vice versa) is shown in the filename.
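The naming convention can be sketched in shell. The snippet below only derives file names and does not build anything. The function name `transducer_names` is hypothetical, and the source confirms the `digit2text`/`text2digit` pair only for `clock`; that other source files follow the same pattern is an assumption.

```shell
# Print the two transducer file names expected for one lexc source file.
# Assumption: only the clock pair is documented; other stems are assumed
# to follow the same digit2text/text2digit pattern.
transducer_names() {
    stem=${1%.lexc}    # strip the .lexc extension
    reverse=$(printf '%s\n' "$stem" | sed 's/digit2text/text2digit/')
    # printf reuses the format string once per argument, giving two lines
    printf '%s.filtered.lookup.hfstol\n' "$stem" "$reverse"
}

transducer_names transcriptor-clock-digit2text.lexc
```

For the `clock` source file this prints the two `.hfstol` names listed above.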

## Development

## Testing

### Commands

Here are some resources for testing the transcriptors. You may generate the first 100 numbers (or any other count) and run them through the transcriptor as follows:

` yes |head -100|cat -n|cut -c-6|tr -d " "|hfst-lookup src/transcriptions/transcriptor-numbers-digit2text.filtered.lookup.hfstol`

Review comment from @flammie (Contributor), Sep 1, 2023: `seq 1 100` :-)
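Following the review comment, `seq` yields the same number list as the `yes | head | cat -n` pipeline. The sketch below compares the two. It assumes a `seq` utility is installed (it is common but not part of POSIX), and it leaves the `hfst-lookup` call as a comment because that step needs the compiled transducer.

```shell
# Generate the numbers 1..100 in two ways and compare the results.
old_way=$(yes | head -100 | cat -n | cut -c-6 | tr -d " ")
new_way=$(seq 1 100)

if [ "$old_way" = "$new_way" ]; then
    echo "identical"
else
    echo "different"
fi

# With the transducer built, the shorter form would be used like this:
# seq 1 100 | hfst-lookup src/transcriptions/transcriptor-numbers-digit2text.filtered.lookup.hfstol
```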


Then you may check the output against the FST:

`yes |head -100|cat -n|cut -c-6|tr -d " "|hfst-lookup src/transcriptions/transcriptor-numbers-digit2text.filtered.lookup.hfstol |cut -f2|cut -c1-|grep -v '^$'|husma`
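The filtering steps in the middle of that pipeline can be tried without a transducer. The sketch below feeds a made-up stand-in for `hfst-lookup` output (tab-separated input, output and weight, with a blank line after each input) through `cut -f2` and `grep -v '^$'`. The `TEXT-FOR-n` strings are placeholders, not real transcriptions, and the final `husma` step is left out because it is not defined in this document.

```shell
# Placeholder data shaped like hfst-lookup output; the second column holds
# dummy strings, not real transducer results.
sample=$(printf '1\tTEXT-FOR-1\t0.000000\n\n2\tTEXT-FOR-2\t0.000000\n\n')

# cut -f2 keeps the second tab-separated field; lines without a tab (the
# blank separators) pass through unchanged, so grep -v '^$' removes them.
printf '%s\n' "$sample" | cut -f2 | grep -v '^$'
```

The `cut -c1-` step in the original command selects every character from the first onwards, so it leaves each line unchanged and is omitted here.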


### Documents for testing

We have ready-made files for all numeral formats:

```
$GTHOME/ped/doc/common/numratesting/cardinal
$GTHOME/ped/doc/common/numratesting/clock
$GTHOME/ped/doc/common/numratesting/date
$GTHOME/ped/doc/common/numratesting/ordinal
```

You may thus test with these files (here with `clock` as an example):

`cat $GTHOME/ped/doc/common/numratesting/clock |hfst-lookup src/transcriptions/transcriptor-clock-digit2text.filtered.lookup.hfstol`

(If you don't have `$GTHOME`, the files are [here](https://gtsvn.uit.no/langtech/trunk/ped/doc/common/numratesting/).)
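To cover all four test files in one go, a small helper can print the matching lookup command for each. This is a dry-run sketch: the function name `lookup_command` is hypothetical, and pairing both `cardinal` and `ordinal` with the `numbers` transducer is an assumption based on the source file list in the Overview.

```shell
# Print the lookup command for one ready-made test file (dry run only).
# Assumption: cardinal and ordinal both use the numbers transducer.
lookup_command() {
    case $1 in
        cardinal|ordinal) kind=numbers ;;
        clock|date)       kind=$1 ;;
        *) echo "unknown test file: $1" >&2; return 1 ;;
    esac
    # $GTHOME stays literal here; it is expanded when the command is run.
    printf 'cat $GTHOME/ped/doc/common/numratesting/%s | hfst-lookup src/transcriptions/transcriptor-%s-digit2text.filtered.lookup.hfstol\n' "$1" "$kind"
}

for name in cardinal clock date ordinal; do
    lookup_command "$name"
done
```

Piping the printed lines to `sh` from the language directory would run them, provided `hfst-lookup` is installed and the transducers have been built.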
# Phonetics

The folder `lang-xxx/src/phonetics` contains setup for text-to-IPA transcription.

# Spell relax

The folder `lang-xxx/src/orthography` contains files for translating sloppy writing and non-standard encoding to standard forms.
