
Lexicon compiler outputs an unhelpful error message if lexicon contains a syntax error #21

Open
Kaljurand opened this issue Jul 12, 2015 · 2 comments


@Kaljurand
Contributor

Steps to reproduce:

  1. Add the line "sage:0905|" (or any other syntactically incorrect entry) anywhere in "dct/data/mrf/fs_lex"
  2. Run nullist-uus-sonastik.sh
  3. After a while the script stops and prints "Ei leia , märki" (Estonian for "Cannot find the ',' character") to the console

It would be better if every line that contains a syntax error (or at least the first one) were printed together with its line number and the name of the lexicon source file. This would make the errors easy to locate, even automatically.

While the lexicon source distributed with Vabamorf does not contain any errors, lexicons generated automatically from external resources very likely will, at least while the conversion scripts are still being developed.

@merisiga
Contributor

I think this is a non-issue. The standard way of adding new entries to the lexicon is the following:

  1. Choose the tüüpsõna (the model word of an inflection type), e.g. kõne, for the word you want to add, e.g. ruse.
  2. Copy the dictionary entry for kõne and adapt it to ruse by changing the stem in the entry.
  3. Add the new entry.

As you can see, there is little chance of making a syntax error here.

@Kaljurand
Contributor Author

The use case I have in mind is someone developing a converter from a more human-readable lexicon format to the Vabamorf lexicon format. In that situation it would help if the lexicon compiler gave more detailed feedback. (Ideally there would be an API for adding entries dynamically at runtime, from structured objects rather than strings, but that's another issue.)

But use cases are unpredictable. My main argument is that "Ei leia , märki" is simply a very bad error message. A compiler should say more than just ERROR, especially if it parses the input line by line, which makes reporting line numbers straightforward. So I do think this is an issue, and one that should be resolved, at least in the long term.
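To illustrate the kind of feedback being asked for, here is a minimal sketch of a line-by-line checker in Python that reports every bad line as `file:line: message`. It is not Vabamorf's compiler and does not implement the real lexicon syntax: it assumes only, as the "Ei leia , märki" message suggests, that an entry must contain a ',' character. The names `check_lexicon` and `main`, the message text, and the UTF-8 encoding are all hypothetical.

```python
import sys
from typing import Iterable, List, Tuple


def check_lexicon(path: str, lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for every entry that lacks a ','.

    Hypothetical rule: a non-blank line without a ',' is treated as malformed.
    The real Vabamorf lexicon grammar is richer than this.
    """
    errors = []
    # enumerate(..., start=1) gives human-style 1-based line numbers.
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines are skipped, not reported
        if "," not in line:
            errors.append((path, lineno, line))
    return errors


def main(paths: List[str]) -> int:
    """Check each file, print file:line diagnostics, return an exit status."""
    status = 0
    for path in paths:
        # The encoding is an assumption; adjust it to match the lexicon files.
        with open(path, encoding="utf-8") as f:
            for p, n, text in check_lexicon(path, f):
                print(f"{p}:{n}: cannot find ',' in entry {text!r}", file=sys.stderr)
                status = 1
    return status


# Run only when lexicon paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as a script and run on `dct/data/mrf/fs_lex`, this would report the "sage:0905|" line from the steps above with its file name and line number instead of stopping at the first problem. The `file:line: message` form is a common compiler convention that many editors can jump to directly.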
