PostgreSQL Database version of SCOWL #306

Open
kevina opened this issue Jan 15, 2021 · 1 comment

kevina commented Jan 15, 2021

Over the years I have been working to convert SCOWL from a collection of word lists combined with sort, uniq, comm, some Perl scripts, and an extremely complicated Makefile into a true SQL database.

In the new database, words are broken down into different senses. For example, there are separate entries for the noun and verb forms of a word. There may also be separate entries for different meanings of a word within the same part of speech (POS). In addition, inflected forms of a word are linked with their root word; for example, "running" is linked with the root "run". Variants of a word are linked with a particular sense of a word rather than with the word itself, to handle the many corner cases where the correct spelling depends on the meaning or POS.
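
As a rough illustration of the structure described above, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the actual SCOWL schema: the table names (`sense`, `inflection`, `variant`), the column names, the helper `roots_of`, and the sample rows are all hypothetical, chosen only to show senses as separate entries, inflected forms linked to a root sense, and a variant spelling attached to one sense rather than to the word.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the real SCOWL layout.
SCHEMA = """
CREATE TABLE sense (
    sense_id INTEGER PRIMARY KEY,
    lemma    TEXT NOT NULL,   -- root word
    pos      TEXT NOT NULL,   -- part of speech, e.g. 'n' or 'v'
    gloss    TEXT             -- short note distinguishing the sense
);
CREATE TABLE inflection (
    word     TEXT NOT NULL,   -- inflected form, e.g. 'running'
    sense_id INTEGER NOT NULL REFERENCES sense(sense_id)
);
CREATE TABLE variant (
    spelling TEXT NOT NULL,   -- alternative spelling
    sense_id INTEGER NOT NULL REFERENCES sense(sense_id),
    region   TEXT
);
"""

def build_demo_db():
    """Create an in-memory database with a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sense VALUES (?, ?, ?, ?)", [
        (1, "run", "n", "an act of running"),
        (2, "run", "v", "to move quickly on foot"),
        (3, "meter", "n", "unit of length"),
        (4, "meter", "n", "measuring device"),
    ])
    conn.executemany("INSERT INTO inflection VALUES (?, ?)", [
        ("running", 2), ("ran", 2), ("runs", 2), ("runs", 1),
    ])
    # The variant is tied to the 'unit of length' sense only.
    conn.execute("INSERT INTO variant VALUES (?, ?, ?)", ("metre", 3, "British"))
    conn.commit()
    return conn

def roots_of(conn, word):
    """Return (lemma, pos) pairs for every sense an inflected form links to."""
    rows = conn.execute(
        "SELECT s.lemma, s.pos FROM inflection AS i "
        "JOIN sense AS s ON s.sense_id = i.sense_id "
        "WHERE i.word = ? ORDER BY s.pos",
        (word,),
    )
    return rows.fetchall()

conn = build_demo_db()
print(roots_of(conn, "running"))  # [('run', 'v')]
print(roots_of(conn, "runs"))     # [('run', 'n'), ('run', 'v')]
```

Linking the variant row to a `sense_id` instead of to the lemma is what lets a spelling such as "metre" apply to one meaning of "meter" without affecting the other.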

The processing of the source word lists and other information (such as VarCon) into individual entries is vastly more complicated than what is currently used to create SCOWL. However, I hope it will also be more maintainable in the long term, as not even I fully understand the vastly complicated Makefile that drives SCOWL.

The new database contains the same information as SCOWL but the resulting wordlists are slightly different due to vastly different processing.

It is almost ready, but I am unsure what direction I am going to take it. In particular, due to the tremendous amount of work that went into it, I am not sure I am going to release everything at once unless I get some sort of funding.

I am very interested in some early feedback, and in whether what I have will be useful beyond being a better way to generate high-quality word lists.

If you are interested in a preview version of the database please email me directly (you can find my email on my GitHub profile) with your GitHub user id. I will verify that your account is linked with a real person who has at least some presence on the web and will grant you access to the repo that contains the preview version when it is ready. If your GitHub profile is empty or there is something suspicious about your account I may ignore you.

This is an announcement. I am locking this issue to discourage people from requesting access by replying here instead of emailing me. Please leave any general feedback you wish to make public on issue #307.

[Draft, see comment modification history for revision date]

@kevina kevina pinned this issue Jan 15, 2021
@en-wl en-wl locked and limited conversation to collaborators Jan 15, 2021

kevina commented Jan 23, 2021

One of the advantages of the SQL database version of SCOWL is the ability to create word lists with special symbols in them, including compound words with a space or hyphen. As a preview of what is possible, check out the enhanced version of the custom word list creator at https://devel.kevina.org/cgi-bin/create; the password is scowldb2.

Also, to get an idea of the information stored on a word, check out the SCOWL DB lookup tool at https://devel.kevina.org/cgi-bin/lookup.

Both of these tools are experimental and subject to change or removal. They are also hosted on a temporary server and may go down from time to time. If you wish to use them and they are not available, please email me directly.
