I wasn't sure which was the best forum to post this issue/question to: the Yahoo groups or here. Issues seem to get more activity than the groups. (I've cross-posted: http://tech.groups.yahoo.com/group/y_lda/message/15)
I'm a total newbie to LDA, so please forgive me if I don't quite formulate this
question concisely.
From the single machine instructions for "Using the Model"
(/Yahoo_LDA/docs/html/single__machine__usage.html#using_model) it indicates that
you can run in either batch OR streaming mode.
In batch mode, the output is several files: lda.docToTop.txt lda.topToWor.txt
lda.worToTop.txt
lda.docToTop.txt is the one I want: document-topic assignments.
e.g. www.sauritchsurfboards.com/ recreation/sports/aquatic_sports (65,0.138889)
(54,0.111111) (9,0.0833333) (21,0.0833333) (27,0.0833333) (87,0.0833333)
(29,0.0555556) (52,0.0555556) (56,0.0555556) (72,0.0555556)
However, streaming mode seems to return per-document word-topic
assignments, similar to batch mode's lda.worToTop.txt.
e.g. www.sauritchsurfboards.com/ recreation/sports/aquatic_sports (watch,87)
(past,87) (months,72) (noticed,21) (guy,52) (surf,27) (magazine,87)
(published,10) (finally,21) (run,21) (copyright,54) (surfboards,27) (rights,54)
(reserved,54) (june,72) (launches,73) (improved,9) (site,54) (order,73)
(custom,56) (surfboards,27) (online,52) (improvements,9) (top,9) (selling,6)
(models,29) (middot,65) (rocket,44) (fish,56) (middot,65) (speed,65) (egg,95)
(middot,65) (classic,29) (middot,65) (squash,55)
Can I make streaming mode return doc - topic assignments?
If not, can I compute the doc-topic assignments easily from the doc word - topic
assignment output?
I would like to call the streaming mode from a Java process.
Please help. :)
Thanks!
-John
I found the logic in the batch mode that reports doc-topic:
void Unigram_Model_Training_Builder::create_output()
Basically, doc-topic assignments are computed from the word-topic assignments: each topic's score is the number of words in the document assigned to that topic, divided by the total number of words in the document:
topicCount / totalNumWordsInDoc
The logic responsible for returning results in the stream mode is
void Unigram_Model_Streamer::write(void* token)
I added the logic from create_output() to the streamer::write() method and now it returns [doc-topic,score] [doc-topic,score] ... || (word,topic) (word,topic) ...
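For anyone who would rather not patch the C++ streamer, the same ratio can probably be computed on the Java side from the unmodified streaming output. Below is a minimal sketch, assuming each streamed line looks like the example in the question (document id, category, then a run of (word,topic) pairs). The class and method names are made up for illustration and are not part of Yahoo_LDA. As a sanity check on the formula: the streamed example above has 36 pairs, 5 of them assigned to topic 65, and 5/36 ≈ 0.138889 matches the batch-mode score for topic 65.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Yahoo_LDA: derives doc-topic scores
// from one line of streaming-mode output.
class DocTopicFromStream {
    // Matches a single "(word,topic)" pair; the topic is a non-negative integer.
    private static final Pattern PAIR = Pattern.compile("\\(([^(),]+),(\\d+)\\)");

    // Returns topic -> topicCount / totalNumWordsInDoc for one streamed line.
    // A line with no (word,topic) pairs yields an empty map.
    static Map<Integer, Double> docTopics(String line) {
        Map<Integer, Integer> counts = new HashMap<>();
        int total = 0;
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            counts.merge(Integer.parseInt(m.group(2)), 1, Integer::sum);
            total++;
        }
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            scores.put(e.getKey(), e.getValue() / (double) total);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Shortened, made-up input in the same shape as the streaming output.
        String line = "www.example.com/ some/category (surf,27) (site,54) (surfboards,27) (speed,65)";
        List<Map.Entry<Integer, Double>> sorted = new ArrayList<>(docTopics(line).entrySet());
        // Highest-scoring topic first, as in lda.docToTop.txt.
        sorted.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        for (Map.Entry<Integer, Double> e : sorted) {
            System.out.printf("(%d,%.6f) ", e.getKey(), e.getValue());
        }
        System.out.println();
    }
}
```

This only post-processes the text, so it depends on the output format staying as shown. Words containing commas or parentheses would not be matched by the pattern.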