
tweets.2016-09-01 dataset #26

Open
rezwanh001 opened this issue Dec 1, 2019 · 5 comments

Comments

@rezwanh001

""" Creates a vocabulary from a tsv file.
"""

import codecs
import example_helper
from torchmoji.create_vocab import VocabBuilder
from torchmoji.word_generator import TweetWordGenerator

with codecs.open('../../twitterdata/tweets.2016-09-01', 'rU', 'utf-8') as stream:
    wg = TweetWordGenerator(stream)
    vb = VocabBuilder(wg)
    vb.count_all_words()
    vb.save_vocab()

In this code, in order to create a vocabulary, you used the '../../twitterdata/tweets.2016-09-01'
dataset. But where can I find this dataset? Please let me know.
If possible, please share the dataset with me at [email protected].

@KingS770234358

Hello, have you solved this problem?

@rezwanh001
Author

@KingS770234358, this issue is not solved yet.

@KingS770234358

@rezwanh001 As huggingface mentioned in the README file, the code in the 'scripts' folder is used to process the raw data in the 'data' folder. I think 'tweets.2016-09-01' may be the result of that processing.

@KingS770234358

Maybe you should run the script 'convert_all_datasets.py' in the 'scripts' folder.

@anuragvij264

@KingS770234358 I tried running that script and ran into this error:

Converting Olympic
-- Generating ../data/Olympic/own_vocab.pickle 
     done. Coverage: 0.030899113550021062
-- Generating ../data/Olympic/twitter_vocab.pickle 
     done. Coverage: 0.8874630645842128
-- Generating ../data/Olympic/combined_vocab.pickle 
Traceback (most recent call last):
  File "/Users/avij1/Desktop/imp_shit/torchMoji/scripts/convert_all_datasets.py", line 88, in <module>
    data = pickle.load(dataset, fix_imports=True,encoding='utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 6: invalid continuation byte
     done. Coverage: 0.8874630645842128
Converting PsychExp
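The traceback points at pickle.load(..., encoding='utf-8') failing on byte 0xf0, which is the classic symptom of loading a pickle written under Python 2: Python 2 str objects hold raw bytes, and when those bytes are not valid UTF-8, Python 3's unpickler cannot decode them. This is a sketch of a common workaround (not a verified fix for convert_all_datasets.py itself): load with encoding='latin1', which maps every byte one-to-one, or encoding='bytes', which leaves the fields as raw bytes to decode yourself. The hand-built pickle below is a hypothetical stand-in for the project's data file, crafted to reproduce the same "invalid continuation byte" error.

```python
import pickle

# A minimal pickle equivalent to a Python 2 `str` holding the raw bytes
# b'\xf0ab': 'U' is the SHORT_BINSTRING opcode, \x03 is the length, and
# '.' is STOP. The byte 0xf0 starts a 4-byte UTF-8 sequence, but 'a' is
# not a valid continuation byte, so decoding as UTF-8 fails just like the
# traceback above.
py2_pickle = b'U\x03\xf0ab.'

try:
    pickle.loads(py2_pickle, encoding='utf-8')
except UnicodeDecodeError as exc:
    print('utf-8 load failed:', exc)

# Workaround 1: latin1 defines a character for every byte value, so the
# load always succeeds; re-encode individual fields later if needed.
text = pickle.loads(py2_pickle, encoding='latin1')
print(repr(text))   # a 3-character str, one char per original byte

# Workaround 2: encoding='bytes' keeps Python 2 str objects as raw bytes.
data = pickle.loads(py2_pickle, encoding='bytes')
print(repr(data))   # b'\xf0ab'
```

If this matches the failure in convert_all_datasets.py, changing the encoding argument at the pickle.load call shown in the traceback (line 88) would be the place to try it.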
