Language Detection Model

Dataset: https://www.kaggle.com/datasets/basilb2s/language-detection

Steps Followed

1. Load the dataset

2. Encode the labels into categorical form

3. Preprocess the text content

4. Tokenize the text

5. Create a dictionary for the vocabulary

6. Count the word frequencies (only unigrams were considered)

7. Split the dataset into train and test sets

8. Perform supervised classification
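
The steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: the toy `DATA` corpus stands in for the Kaggle CSV, every function name is hypothetical, and multinomial Naive Bayes with add-one smoothing is an assumed classifier variant.

```python
import math
import random
import re
from collections import Counter

# Step 1: hypothetical toy corpus standing in for the Kaggle CSV.
DATA = [
    ("The cat sits on the mat.", "English"),
    ("The dog runs in the park!", "English"),
    ("She reads a book in the garden.", "English"),
    ("We like the old green house.", "English"),
    ("Le chat est sur le tapis.", "French"),
    ("Le chien court dans le parc!", "French"),
    ("Elle lit un livre dans le jardin.", "French"),
    ("Nous aimons la vieille maison verte.", "French"),
]

def encode_labels(labels):
    """Step 2: map each distinct label to an integer id."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes

def preprocess(text):
    """Step 3: lowercase, then replace punctuation, digits and '_' with spaces."""
    return re.sub(r"[^\w\s]|[\d_]", " ", text.lower())

def tokenize(text):
    """Step 4: whitespace tokenization into unigrams."""
    return text.split()

def build_vocab(token_lists):
    """Step 5: assign every distinct token an index."""
    words = sorted({tok for toks in token_lists for tok in toks})
    return {word: i for i, word in enumerate(words)}

def count_features(tokens, vocab):
    """Step 6: unigram counts keyed by vocabulary index (unknown words dropped)."""
    return Counter(vocab[tok] for tok in tokens if tok in vocab)

def split(items, test_ratio=0.25, seed=0):
    """Step 7: shuffle with a fixed seed, then cut off a test portion."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_ratio))
    return items[n_test:], items[:n_test]

def train_nb(features, y, n_classes, vocab_size, alpha=1.0):
    """Step 8: multinomial Naive Bayes with add-alpha smoothing."""
    class_docs = Counter(y)
    word_counts = [Counter() for _ in range(n_classes)]
    for feats, label in zip(features, y):
        word_counts[label].update(feats)
    log_prior = [math.log(class_docs[c] / len(y)) if class_docs[c] else float("-inf")
                 for c in range(n_classes)]
    log_like = []
    for c in range(n_classes):
        total = sum(word_counts[c].values()) + alpha * vocab_size
        log_like.append([math.log((word_counts[c][w] + alpha) / total)
                         for w in range(vocab_size)])
    return log_prior, log_like

def predict(feats, log_prior, log_like):
    """Return the class id with the highest log-posterior score."""
    scores = [lp + sum(n * ll[w] for w, n in feats.items())
              for lp, ll in zip(log_prior, log_like)]
    return max(range(len(scores)), key=scores.__getitem__)

# Run the steps in the order listed above.
texts, labels = zip(*DATA)
y, classes = encode_labels(labels)
token_lists = [tokenize(preprocess(t)) for t in texts]
vocab = build_vocab(token_lists)
features = [count_features(toks, vocab) for toks in token_lists]
train, test = split(zip(features, y))
log_prior, log_like = train_nb([f for f, _ in train], [c for _, c in train],
                               len(classes), len(vocab))

def detect(text):
    """Classify a new sentence with the trained model."""
    feats = count_features(tokenize(preprocess(text)), vocab)
    return classes[predict(feats, log_prior, log_like)]

accuracy = sum(predict(f, log_prior, log_like) == c for f, c in test) / len(test)
print("held-out accuracy on the toy corpus:", accuracy)
print(detect("the dog and the cat like the garden"))
```

On the real dataset, `DATA` would be replaced by the (text, language) pairs read from the downloaded CSV. The toy corpus is far too small for its accuracy figure to mean anything; it only shows how the pieces fit together.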

The model achieved 97.3% accuracy using a Naive Bayes classifier.