# Document Classifier Using Vanilla Recurrent Neural Network
The dataset consists of text documents, each belonging to one of 8 categories.
- Tokenize the sentences.
- Remove tokens that are stop words.
- If a document has more than 20 words, keep only the first 20.
- Convert every word to its unique identification number.
- If a document has fewer than 20 words, pad it with 0s up to length 20.
The sequence length was fixed at 20, with shorter documents padded with 0s, to make training faster.
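
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the stop-word list, the regex tokenizer, the helper names (`tokenize`, `build_vocab`, `encode`), and the choice to drop words missing from the vocabulary are all assumptions made for the example. Id 0 is reserved for padding, so word ids start at 1.

```python
import re

MAX_LEN = 20   # fixed sequence length from the steps above
PAD_ID = 0     # id 0 is reserved for padding

# Hypothetical stop-word list; the list actually used is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(documents):
    """Assign each distinct non-stop word an id, starting at 1."""
    vocab = {}
    for doc in documents:
        for tok in tokenize(doc):
            if tok not in STOP_WORDS and tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(text, vocab):
    """Tokenize, drop stop words, truncate to MAX_LEN, map to ids, then pad."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    tokens = tokens[:MAX_LEN]
    # Assumption: words not in the vocabulary are simply dropped.
    ids = [vocab[t] for t in tokens if t in vocab]
    return ids + [PAD_ID] * (MAX_LEN - len(ids))


docs = ["The cat sat in the hat", "A dog is in the yard"]
vocab = build_vocab(docs)
encoded = [encode(d, vocab) for d in docs]
```

Every entry of `encoded` has exactly 20 ids, so the documents can be stacked into a single batch of fixed shape.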
- Learning Rate = 1e-2
- Epochs = 4050
- Word Embedding Dimension = 100
- Hidden State Dimension = 128
- Truncated Backpropagation Length = 4
- Training Sequence Length = 20
- Batch Size = 1000
- Weights were initialized from a Gaussian distribution with mean = 0.0 and std = 1
- Biases were zero-initialized
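
To make the settings above concrete, here is a small sketch of a vanilla RNN forward pass over one 20-step document, split into truncated backpropagation windows of length 4. It uses only the Python standard library and is not the project's training code: the `tanh` nonlinearity, the helper names, and the random stand-in embeddings are assumptions, and no gradients are actually computed.

```python
import math
import random

EMBED_DIM = 100    # word embedding dimension
HIDDEN_DIM = 128   # hidden state dimension
SEQ_LEN = 20       # training sequence length
TRUNC_LEN = 4      # truncated backpropagation length


def gaussian_matrix(rows, cols, rng, mean=0.0, std=1.0):
    """Matrix whose entries are drawn from a Gaussian with the given mean and std."""
    return [[rng.gauss(mean, std) for _ in range(cols)] for _ in range(rows)]


def rnn_step(x, h, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_new = tanh(W_xh x + W_hh h + b_h).

    The tanh nonlinearity is an assumption; the list above does not name one.
    """
    h_new = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(w * xi for w, xi in zip(W_xh[i], x))
        total += sum(w * hi for w, hi in zip(W_hh[i], h))
        h_new.append(math.tanh(total))
    return h_new


def truncation_windows(seq_len, trunc_len):
    """(start, end) index pairs; gradients would flow only within each window."""
    return [(s, min(s + trunc_len, seq_len)) for s in range(0, seq_len, trunc_len)]


rng = random.Random(0)
W_xh = gaussian_matrix(HIDDEN_DIM, EMBED_DIM, rng)    # mean 0.0, std 1
W_hh = gaussian_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)   # mean 0.0, std 1
b_h = [0.0] * HIDDEN_DIM                              # zero-initialized biases

# Stand-in inputs: 20 random embedding vectors for one document.
inputs = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(SEQ_LEN)]

h = [0.0] * HIDDEN_DIM
windows = truncation_windows(SEQ_LEN, TRUNC_LEN)
for start, end in windows:
    for t in range(start, end):
        h = rnn_step(inputs[t], h, W_xh, W_hh, b_h)
    # In training, backpropagation would run here over this window only,
    # while h carries forward to the next window as a constant.
```

With a sequence length of 20 and a truncation length of 4, each document is processed in 5 windows, and the final `h` would feed a classification layer over the 8 categories.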
Test Set Accuracy = 74.22%
The training time for the model was about 6 hours
The Model can be downloaded from here