Featurization
faridani edited this page Jan 24, 2012 · 1 revision
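- Running main.m builds a cell array and stores it in the "descriptions" variable
- Initialize a global hashmap g = containers.Map() once, before the loop, so that it accumulates word counts over all cells
- For each cell in descriptions, do this:
  - Pass its text through commentSanitizer
  - Create an empty hashmap for this cell (another containers.Map()) and keep it, because it is featurized later
  - For each word in the sanitized string, do this:
    - Pass the word to porterStemmer
    - If the stemmed word is not yet a key in this cell's hashmap, add it with value 1; otherwise add one to its value. Update the global hashmap g the same way
- After all cells are processed, take the keys of g whose value is >= n (say n = 3) and store them as the header
- Featurize each cell's hashmap using the header: each cell becomes a vector with one entry per header word, 1 if that word occurs in the cell and 0 otherwise (for example, if the header is {'book','note','I'}, a comment like "I love my book" should become [1,0,1])
- Stack these vectors into a matrix with one row per cell of the original descriptions cell array, and write it to a CSV file with csvwrite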
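The counting loop can be sketched in Python, with collections.Counter standing in for containers.Map. This is a minimal sketch under assumptions: comment_sanitizer and porter_stemmer are hypothetical stand-ins for the project's commentSanitizer and porterStemmer (the stemmer here is not the Porter algorithm, it only strips a plural "s"), so swap in the real functions.

```python
import re
from collections import Counter


def comment_sanitizer(text):
    # HYPOTHETICAL stand-in for the project's commentSanitizer:
    # lowercase, then replace everything except letters and whitespace with a space.
    return re.sub(r"[^a-z\s]", " ", text.lower())


def porter_stemmer(word):
    # HYPOTHETICAL stand-in for porterStemmer. This is NOT the Porter algorithm;
    # it only strips a trailing plural "s" so that "books" and "book" share a key.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def count_words(descriptions):
    """Return (per_cell, g): one Counter per cell plus a global Counter."""
    g = Counter()      # global word counts across all cells, initialized once
    per_cell = []      # per-cell word counts, kept for featurization
    for text in descriptions:
        cell_counts = Counter()
        for word in comment_sanitizer(text).split():
            stem = porter_stemmer(word)
            cell_counts[stem] += 1  # missing keys start at 0, so this adds or increments
            g[stem] += 1
        per_cell.append(cell_counts)
    return per_cell, g
```

For example, count_words(["I love my books", "A book and a note"]) gives g["book"] == 2, because the stand-in stemmer maps "books" to "book".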
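Header selection and binary featurization might look like this in Python, assuming the per-cell and global counts are plain word-to-count dictionaries (a Counter works too). The function names select_header, featurize, and build_matrix are hypothetical.

```python
def select_header(g, n=3):
    # Keep the words whose global count is at least n (the wiki suggests n = 3).
    return sorted(word for word, count in g.items() if count >= n)


def featurize(cell_counts, header):
    # Binary presence vector: 1 if the header word occurs in this cell, else 0.
    return [1 if cell_counts.get(word, 0) > 0 else 0 for word in header]


def build_matrix(per_cell, header):
    # One row per cell, in the same order as the descriptions cell array.
    return [featurize(cell_counts, header) for cell_counts in per_cell]


print(featurize({"I": 1, "love": 1, "my": 1, "book": 1}, ["book", "note", "I"]))  # [1, 0, 1]
```

Sorting the header only makes the column order reproducible between runs; the wiki does not say how the header words should be ordered, so any fixed order works as long as every cell uses the same one.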
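csvwrite writes a numeric matrix as comma-separated rows with no header row. A rough Python equivalent with the standard csv module, assuming the matrix is a list of equal-length rows of integers (save_feature_matrix is a hypothetical name):

```python
import csv


def save_feature_matrix(path, matrix):
    # Rough stand-in for MATLAB's csvwrite: numbers only, one row per cell, no header row.
    # newline="" lets the csv module control line endings; "\n" keeps the output simple.
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(matrix)
```

Because only numbers are written, the header list has to be saved separately if you need to know which column corresponds to which word.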