Language Detection Model

Dataset: https://www.kaggle.com/datasets/basilb2s/language-detection

Steps Followed

1. Load the dataset
2. Encode the labels into categorical form
3. Pre-process the text content
4. Tokenize the text
5. Create a dictionary for the vocabulary
6. Count the word frequencies (only unigrams were considered)
7. Split the dataset into train and test sets
8. Perform Supervised Classification
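
The steps above can be sketched end to end in plain Python. This is a minimal, dependency-free illustration and not the repository's actual code: the `SAMPLES` rows are a hypothetical stand-in for the Kaggle CSV, the helper names (`preprocess`, `tokenize`, `train_nb`, `predict`) are made up for this example, and the classifier is a hand-written multinomial Naive Bayes with add-one smoothing, which is assumed rather than confirmed to match what the project used.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Step 1: load the data. Hypothetical toy rows standing in for the Kaggle CSV.
SAMPLES = [
    ("The cat sat on the mat.", "English"),
    ("Where is the train station?", "English"),
    ("I would like a cup of tea.", "English"),
    ("The weather is nice today.", "English"),
    ("Le chat est sur le tapis.", "French"),
    ("Où est la gare ?", "French"),
    ("Je voudrais une tasse de thé.", "French"),
    ("Il fait beau aujourd'hui.", "French"),
    ("El gato está sobre la alfombra.", "Spanish"),
    ("¿Dónde está la estación de tren?", "Spanish"),
    ("Quisiera una taza de té.", "Spanish"),
    ("Hoy hace buen tiempo.", "Spanish"),
]

def preprocess(text):
    """Step 3: lowercase, then replace digits and punctuation with spaces."""
    return re.sub(r"[\d_]|[^\w\s]", " ", text.lower())

def tokenize(text):
    """Step 4: split the cleaned text into word tokens (unigrams)."""
    return preprocess(text).split()

def train_nb(rows):
    """Steps 5, 6 and 8: build the vocabulary and count unigrams per class."""
    vocab = {}                          # token -> integer index
    word_counts = defaultdict(Counter)  # class id -> token -> count
    class_counts = Counter()            # class id -> number of documents
    for tokens, y in rows:
        class_counts[y] += 1
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
            word_counts[y][tok] += 1
    return vocab, word_counts, class_counts

def predict(model, tokens):
    """Return the class id with the highest log-posterior (add-one smoothing)."""
    vocab, word_counts, class_counts = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_docs in class_counts.items():
        total = sum(word_counts[y].values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[y][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = y, score
    return best

# Step 2: encode the string labels as integer categories.
labels = sorted({label for _, label in SAMPLES})
label_to_id = {label: i for i, label in enumerate(labels)}
rows = [(tokenize(text), label_to_id[label]) for text, label in SAMPLES]

# Step 7: shuffle with a fixed seed and hold out a quarter of the rows.
random.Random(0).shuffle(rows)
split = int(0.75 * len(rows))
train_rows, test_rows = rows[:split], rows[split:]

# Step 8: fit on the training rows and score the held-out rows.
model = train_nb(train_rows)
correct = sum(predict(model, toks) == y for toks, y in test_rows)
print(f"held-out accuracy: {correct / len(test_rows):.2f}")
```

With only twelve toy sentences the printed accuracy is not meaningful; the point is the shape of the pipeline. On the real dataset, the same steps are commonly done with a library such as scikit-learn instead of hand-written counting.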

A Naive Bayes classifier achieved 97.3% accuracy.
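
For reference, accuracy is the fraction of samples whose predicted language matches the true one. The sketch below shows that calculation on made-up labels; the `accuracy` helper and the example lists are illustrative only and do not reproduce the reported figure.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels: three of the four predictions are correct.
true_labels = ["English", "French", "Spanish", "English"]
predicted = ["English", "French", "English", "English"]
print(accuracy(true_labels, predicted))  # 0.75
```

By this definition, a score of 97.3% corresponds to roughly 973 correct predictions out of every 1,000 evaluated samples.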