-
Notifications
You must be signed in to change notification settings - Fork 3
Background info Pod #1
This will give a quick breakdown of background info that will be helpful for each project.
How to create the Training Data: Deep learning
patterns (x)-
"Hi" "How are you?"
tag (y) ---> the tags for patterns
greeting
NLP Basics:
Tokenization: splitting a string into meangingful units(diff categories ex. words, numbers, punctuation)
Stemming: Generate the root form of the words. Crude heuristic that chops off the ends of words. ex: "organize", "organizes", "organizing" ----->["organ", "organ", "organ"] OR "University", "Universe" ----->["Univers", "Univers"]
To be able to get an accurate response you'd put the string of data (patterns) into a 'bag of words'. All single words within the string are put into an array. The array size will be the size of all the words.
["Hi", "How", "are", "you"]
"Hi"--->[1, 0, 0, 0] 0(greeting)
X | Y (Y vector)
"Is anyone there?"
|
| tokenize
V
["is", "anyone", "there"]
|
| lower + stem
V
["is", "anyone", "there", "?"]
|
| exclude punctuation char.
V
["is", "anyone", "there"]
|
| bag of words
V
[0,0,0,1,0,1,0,1]
Feed Forward Neural Net:
X -----> Neural Net( # of patterns--->#of classes)--->Softmax--->Y
-Input is the bag of words -First layer of the neural net is 3 of patterns (layer for connected) -then hidden layer, another hidden layer -output layer