Project

Crawl the content and comments of tweets based on tweet API.

Environment

Mac OS/Linux
Chrome (others may be also fine)
Chromedriver (may need to be redownloaded from here depending on the version of your browser)

Setup

Please fill in your api key of tweet in pipeline.py:Line7-10.
If you are not using chrome, please change the path of your browser driver in twitter_conversation_crawler/spiders/*.py.

Crawl Images from Single Tweet

scrapy crawl twitter_single -a tweet_id=1273569944081510400

You can modify get_single_tweet function in ./twitter_conversation_crawler/spiders/twitter_single.py to change the save path.

Crawl All Content and Comments from Single Tweet Recursively

It crawls contens from a tweet recursively with its comments. The content includes text and image if the post or the comment has. You can add more ids in twitter_ids.txt. The results are in the result folder.

python pipeline.py

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
twitter_conversation_crawler		twitter_conversation_crawler
.gitignore		.gitignore
README.md		README.md
chromedriver		chromedriver
pipeline.py		pipeline.py
scrapy.cfg		scrapy.cfg
twitter_ids.txt		twitter_ids.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project

Environment

Setup

Crawl Images from Single Tweet

Crawl All Content and Comments from Single Tweet Recursively

About

Releases

Packages

Languages

arxrean/TwitterCrawlWithComments

Folders and files

Latest commit

History

Repository files navigation

Project

Environment

Setup

Crawl Images from Single Tweet

Crawl All Content and Comments from Single Tweet Recursively

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages