PostgreSQL Database version of SCOWL #306

kevina · 2021-01-15T06:21:37Z

Over the years I have been working to convert SCOWL from a collection of word lists combined with sort, uniq, comm, some Perl scripts, and an extremely complicated Makefile into a true SQL database.

In the new database, words are broken down into different senses. For example, there is a separate entry for the noun and the verb form of a word. There may also be separate entries for different meanings of a word within the same part of speech (POS). In addition, inflected forms of a word are linked with their root word; for example, "running" will be linked with the root "run". Variants of a word are linked with a particular sense of the word rather than the word itself, to handle the many corner cases where the correct spelling depends on the meaning or POS.
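To make the sense-based model a little more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`senses`, `inflections`, `variants`, `gloss`, `region`) and the helper functions are hypothetical and heavily simplified for illustration; they are not the actual schema of the SCOWL database, which is built on PostgreSQL. The sketch shows separate noun and verb senses of "run", an inflected form linked to its root sense, and a variant spelling that depends on the meaning: in British English "metre" is the unit of length, while a measuring device is spelled "meter".

```python
import sqlite3

# Hypothetical, simplified schema illustrating a sense-based word model.
# This is NOT the real SCOWL database schema.
SCHEMA = """
CREATE TABLE senses (
    sense_id INTEGER PRIMARY KEY,
    lemma    TEXT NOT NULL,  -- root word, e.g. 'run'
    pos      TEXT NOT NULL,  -- part of speech, e.g. 'noun', 'verb'
    gloss    TEXT            -- distinguishes meanings within one POS
);
CREATE TABLE inflections (
    sense_id INTEGER NOT NULL REFERENCES senses(sense_id),
    form     TEXT NOT NULL   -- inflected form linked to its root sense
);
CREATE TABLE variants (
    sense_id INTEGER NOT NULL REFERENCES senses(sense_id),
    spelling TEXT NOT NULL,  -- spelling tied to one sense, not the bare word
    region   TEXT NOT NULL   -- e.g. 'US', 'GB'
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few illustrative entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Separate senses for the verb and noun 'run', and two noun senses of 'meter'.
    conn.executemany(
        "INSERT INTO senses VALUES (?, ?, ?, ?)",
        [
            (1, "run", "verb", "move quickly on foot"),
            (2, "run", "noun", "an act of running"),
            (3, "meter", "noun", "unit of length"),
            (4, "meter", "noun", "measuring device"),
        ],
    )
    # Inflected forms point at the sense they belong to.
    conn.executemany(
        "INSERT INTO inflections VALUES (?, ?)",
        [(1, "runs"), (1, "ran"), (1, "running"), (2, "runs")],
    )
    # 'metre' applies only to the unit-of-length sense in British English.
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?)",
        [(3, "meter", "US"), (3, "metre", "GB"),
         (4, "meter", "US"), (4, "meter", "GB")],
    )
    return conn


def root_of(conn: sqlite3.Connection, form: str) -> list:
    """Return the distinct (lemma, pos) pairs an inflected form is linked to."""
    rows = conn.execute(
        "SELECT DISTINCT s.lemma, s.pos FROM inflections AS i "
        "JOIN senses AS s ON s.sense_id = i.sense_id "
        "WHERE i.form = ? ORDER BY s.pos",
        (form,),
    )
    return rows.fetchall()


def spell(conn: sqlite3.Connection, sense_id: int, region: str):
    """Return the spelling of one sense in a region, or None if unknown."""
    row = conn.execute(
        "SELECT spelling FROM variants WHERE sense_id = ? AND region = ?",
        (sense_id, region),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = build_demo_db()
    print(root_of(db, "running"))  # [('run', 'verb')]
    print(spell(db, 3, "GB"), spell(db, 4, "GB"))  # metre meter
```

Because the variant is attached to a sense rather than to the string "meter", asking for the British spelling gives "metre" for the unit and "meter" for the device, which a flat word list cannot express.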

The processing of the source word lists and other information (such as VarCon) into individual entries is vastly more complicated than what is currently used to create SCOWL. However, I hope it will also be more maintainable in the long term, as not even I fully understand the extremely complicated Makefile that drives SCOWL.

The new database contains the same information as SCOWL, but the resulting word lists are slightly different due to the very different processing.

It is almost ready, but I am unsure what direction I am going to take it. In particular, due to the tremendous amount of work that went into it, I am not sure I will release everything at once unless I get some sort of funding.

I am very interested in some early feedback, and in whether what I have will be useful beyond being a better way to generate high-quality word lists.

If you are interested in a preview version of the database, please email me directly (you can find my email on my GitHub profile) with your GitHub user ID. I will verify that your account is linked with a real person who has at least some presence on the web and will grant you access to the repo that contains the preview version when it is ready. If your GitHub profile is empty or there is something suspicious about your account, I may ignore you.

This is an announcement, so I am locking this issue to discourage people from requesting access by replying here instead of emailing me. Please leave any general feedback you wish to make public on issue #307.

[Draft, see comment modification history for revision date]


kevina · 2021-01-23T20:32:34Z

One of the advantages of the SQL database version of SCOWL is the ability to create word lists with special symbols in them, including compound words with a space or hyphen. As a preview of what is possible, check out the enhanced version of the custom word list creator at https://devel.kevina.org/cgi-bin/create (the password is scowldb2).

Also, to get an idea of the information stored on a word, check out the SCOWL DB lookup tool at https://devel.kevina.org/cgi-bin/lookup.

Both of these tools are experimental and subject to change or removal. They are also hosted on a temporary server and may go down from time to time. If you wish to use them and they are not available, please email me directly.

kevina pinned this issue Jan 15, 2021

kevina mentioned this issue Jan 15, 2021: Syntax errors in varcon.txt, miscellaneous typos in README #304 (Closed)

en-wl locked and limited conversation to collaborators Jan 15, 2021
