Artai Spell Checker is a simple Amharic spell checker implemented in Python. It combines Damerau-Levenshtein distance with N-gram models to detect misspelled words and suggest corrections. It was developed as a group course project.
- Campus: Addis Ababa Institute of Technology
- Department: Software and Information Technology Engineering
- Course: Fundamentals of Software Engineering I
- Advisor: Instructor Nuniyat Kifle
Group Members
- Abdulmunim Jundurahman UGR/8625/14
- Bisrat Asaye UGR/8508/14
- Ezana Kifle UGR/4189/14
- Fuad Mohammad UGR/6052/14
- Sifan Fita UGR/8856/14
- Yordanos Zegeye UGR/6316/14
- Damerau-Levenshtein distance
- N-gram model for contextualization
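To illustrate the first feature, here is a minimal sketch of an edit-distance function that counts insertions, deletions, substitutions, and adjacent transpositions. It implements the optimal string alignment variant (often called restricted Damerau-Levenshtein); whether the project's damerau_levenshtein.py uses this variant or the unrestricted one is not stated here, so treat it as an assumption. The names `osa_distance` and `suggest` and the `max_distance` parameter are hypothetical and chosen only for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ab" -> "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(word, dictionary, max_distance=1):
    """Return dictionary words within max_distance of word, closest first."""
    scored = sorted((osa_distance(word, entry), entry) for entry in dictionary)
    return [entry for dist, entry in scored if dist <= max_distance]


# Python strings are Unicode, so Ethiopic characters are compared one by one.
print(osa_distance("ሰምላ", "ሰላም"))  # 1: two adjacent characters swapped
```

Comparing the input against every dictionary entry is linear in the dictionary size, which is fine for a sketch but would likely need pruning (for example by word length) on a large word list.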
- Python 3.x
- Required dependencies (specified in requirements.txt)
pip install -r requirements.txt
- Clone this repository
git clone https://github.com/abdulmunimjemal/ArtaiSpellChecker.git
cd ArtaiSpellChecker
- Run the spell checker
cd src
python main.py
- artai_spell_checker/
  - src/
    - __init__.py: Initialization file for the spell_checker module.
    - spell_checker.py: Implementation of the SpellChecker class.
    - damerau_levenshtein.py: Implementation of the Damerau-Levenshtein distance.
    - ngram_model.py: Implementation of N-gram model functions.
    - train.py: Training and usage of a bigram language model on the corpus for suggestion ranking.
    - utils/
      - __init__.py: Initialization file for the utils module.
      - file_reader.py: Implementation of file reading functions.
      - amharic_tokenizer.py: Implementation of an Amharic tokenizer.
      - amharic_normalizer.py: Implementation of an Amharic normalizer.
      - amharic_dictionart.py: Implementation of an Amharic dictionary for word lookup.
    - main.py: Main script for running the spell checker.
    - user_interface.py: (Optional) Implementation of a user interface.
    - preprocessing.py: Implementation of Amharic text preprocessing functions.
  - data/
    - amharic_dictionary.txt: Dictionary file with Amharic words.
    - amharic_corpus.txt: Corpus file with Amharic sentences.
  - LICENSE: Project license file.
  - requirements.txt: List of project dependencies.
  - README.md: Project documentation.
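
The structure above describes train.py as training a bigram language model on the corpus for suggestion ranking. The following is a rough sketch of how such ranking could work, not the project's actual code: it counts bigrams, applies add-one (Laplace) smoothing, and orders candidate corrections by their probability given the previous word. The function names, the `<s>` sentence-start marker, the whitespace tokenization, and the choice of smoothing are all assumptions made for illustration; the project ships its own Amharic tokenizer.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count bigrams and their left-hand contexts over an iterable of sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        # Assumption: plain whitespace tokenization with a sentence-start marker.
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens)
        context_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, vocab


def bigram_probability(prev, word, model):
    """P(word | prev) with add-one smoothing so unseen pairs stay non-zero."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def rank_candidates(prev, candidates, model):
    """Order candidate corrections by how likely they are to follow prev."""
    return sorted(
        candidates,
        key=lambda word: bigram_probability(prev, word, model),
        reverse=True,
    )


# Toy English corpus, used only to keep the example easy to read.
model = train_bigrams(["the cat sat", "the cat ran", "the car stopped"])
print(rank_candidates("the", ["car", "cat", "cap"], model))
```

In a spell checker, the candidates would typically be dictionary words within a small edit distance of the misspelled word, so the bigram score acts as a tie-breaker that uses the surrounding context.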