automatic check for spelling consistency? #319

Closed
hjoliver opened this issue Nov 16, 2021 · 5 comments

@hjoliver
Member

hjoliver commented Nov 16, 2021

For terms such as:

  • datetime vs date-time vs date time
  • inter-cycle vs intercycle
  • cycle-point vs cyclepoint
  • etc.

Could this be a Sphinx extension, a post-commit hook, or a simple check script in CI?
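
A minimal sketch of the "simple check script" option, assuming the only goal is to flag text that mixes spellings from the same group. The `GROUPS` table and the `find_inconsistencies` name are illustrative, not part of any existing tooling, and the sketch deliberately does not pick a preferred spelling.

```python
import re

# Groups of spellings that should not be mixed (taken from the list above).
GROUPS = [
    ("datetime", "date-time", "date time"),
    ("inter-cycle", "intercycle"),
    ("cycle-point", "cyclepoint"),
]


def find_inconsistencies(text):
    """Return, per group, the spellings found when more than one is used."""
    lowered = text.lower()
    problems = []
    for group in GROUPS:
        found = [
            term
            for term in group
            # Require a non-word, non-hyphen boundary on both sides so that
            # "datetime" is not reported inside a longer identifier.
            if re.search(r"(?<![\w-])" + re.escape(term) + r"(?![\w-])", lowered)
        ]
        if len(found) > 1:
            problems.append(found)
    return problems
```

In CI this could be run over each documentation source file, failing the job if any file returns a non-empty list. Spellings split across a line break (e.g. "date" at the end of one line and "time" at the start of the next) are not detected by this sketch.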

@hjoliver hjoliver added the question Further information is requested label Nov 16, 2021
@hjoliver hjoliver added this to the 8.0.0 milestone Nov 16, 2021
@oliver-sanders
Member

https://github.com/sphinx-contrib/spelling

@hjoliver hjoliver modified the milestones: 8.0rc1, 8.x Jan 27, 2022
@wxtim
Member

wxtim commented Feb 15, 2022

I recently tried to implement this and found it a bit tricky: sphinx-contrib/spelling isn't very well supported. Worse, its lexers don't recognize apostrophe contractions, so you have to exclude "don", "aren" &c &c.

One possibility is to just use it anyway, and use sed or similar to extract all the problem words and add them to one or more of sphinxcontrib-spelling's custom dictionaries (which we can configure). This will at least catch new errors.

@hjoliver
Member Author

Worse its lexers don't recognize apostrophe contractions so you have to exclude "don", "aren" &c &c.

Yikes, that is indeed "worse".

@wxtim
Member

wxtim commented Feb 16, 2022

RE: Just using sphinxcontrib-spelling
I tried opening an issue (sphinx-contrib/spelling#146) on sphinxcontrib-spelling, but it turns out that the problem is with the upstream pyenchant tokenizer (see sphinx-contrib/spelling#126).
If we did decide to live with the issues, one possible solution is to have separate dictionaries, which would let us better track why words are allowed. I'd propose:

  • spelling/dictionary.txt - words we definitely want to add
  • spelling/not-real-words.txt - Definitely not real words, but we want them anyway: e.g. "foo", "bar" &c
  • spelling/spellcheck.txt - "words" where there is some sort of lexer problem
  • spelling/TODO.txt - words which are contentious in some way, or where spelling should be standardised at some point in the future. Examples include any word ending in ise/ize (which ought to be consistent), and Parameterize vs Parametrize, which as far as I can tell are merely alternate spellings.
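
As a rough sketch, the four files proposed above could be wired into `conf.py` like this. It assumes the installed sphinxcontrib-spelling version accepts a list for `spelling_word_list_filename` and resolves the paths relative to the Sphinx source directory; both are worth checking against the version in use.

```python
# conf.py (fragment). In a real conf.py, append to the existing
# `extensions` list rather than replacing it.
extensions = ["sphinxcontrib.spelling"]

# One word list per reason a word is allowed, as proposed above.
spelling_word_list_filename = [
    "spelling/dictionary.txt",      # words we definitely want to add
    "spelling/not-real-words.txt",  # "foo", "bar" and similar
    "spelling/spellcheck.txt",      # workarounds for tokenizer problems
    "spelling/TODO.txt",            # contentious or not-yet-standardised spellings
]
```

Keeping the tokenizer workarounds in their own file means they can be deleted in one go if the upstream problem is ever fixed.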

Another option is to write an entirely separate check using Ispell, Aspell, Enchant or similar. I couldn't do it in the 15 minutes I spent trying, which hardly rules it out. I also tried Enchant, which didn't seem to have the same tokenizer issue as pyenchant!
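
To illustrate what a separate check would need from its tokenizer, here is a small sketch that keeps apostrophe contractions as single tokens. It is a stand-in only: it does not call Ispell, Aspell or Enchant, the known-word set is supplied by hand, and `WORD_RE` and `unknown_words` are hypothetical names.

```python
import re

# Letters, optionally followed by apostrophe + letters, so that "don't"
# stays one token instead of being split into "don" and "t".
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def unknown_words(text, known_words):
    """Return words in text that are not in known_words, in first-seen order."""
    known = {word.lower() for word in known_words}
    unknown = []
    for token in WORD_RE.findall(text):
        # Normalise case and typographic apostrophes before the lookup.
        word = token.lower().replace("’", "'")
        if word not in known and word not in unknown:
            unknown.append(word)
    return unknown
```

With a tokenizer like this, entries such as "don" and "aren" would not need to be added to an exclusion list.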

@oliver-sanders
Member

Closed by #501

@oliver-sanders oliver-sanders modified the milestones: pending, 8.0.1 Aug 15, 2022
@oliver-sanders oliver-sanders removed the question Further information is requested label Aug 15, 2022