Wiktionary dump file parser and Finnish data extractor

shiyuwang-jamk/fi-wiktextract

Why this fork

The original repo extracts Finnish data from en.wiktionary.org, which lacks some of the information found in its Finnish counterpart, fi.wiktionary.org (hereinafter fi.wikt).

This fork attempts to extract data from fi.wikt instead. It focuses on collecting exceptions in Finnish verb conjugation as documented by NSK and KOTUS. For an example of the difference, compare the entry for the Finnish word sortaa in fi.wikt with the one in en.wikt.

TODO

  • Write the fi.lua file in languages/lua
  • Write src/wiktextract/data/fi/config.json, starting from a copy of the one in ../en
  • Write extractor scripts in src/wiktextract/extractor/fi
  • Find enough time for all tasks.
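The config.json task above could be bootstrapped with a small helper like the one below. This is a sketch only. It assumes the package keeps one `<language>/config.json` per edition under a common data directory, as the task implies. `bootstrap_config` and its parameters are hypothetical names, and the copied file would still need its English-specific settings adapted by hand.

```python
import json
from pathlib import Path


def bootstrap_config(data_dir: Path, source: str = "en", target: str = "fi") -> Path:
    """Copy <data_dir>/<source>/config.json to <data_dir>/<target>/config.json."""
    src = data_dir / source / "config.json"
    dst = data_dir / target / "config.json"
    if dst.exists():
        # Never clobber a config that has already been adapted for the target edition.
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")
    text = src.read_text(encoding="utf-8")
    json.loads(text)  # fail early, before writing anything, if the source is not valid JSON
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(text, encoding="utf-8")
    return dst
```

Copying the file verbatim, rather than re-serialising it with `json.dump`, keeps the original key order and formatting. This makes later diffs against the English config easier to read.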

TOREAD (known so far)

Searching across GitHub

Languages

  • Python 100.0%