tScrape

A simple Python scraper and parser for Twitter pages, built on Beautiful Soup 4 and Selenium WebDriver. The parser writes its output as JSON; see the Output section below for details.

Install requirements

pip install -r requirements.txt

Configure scraper

In run.py:

Define a list of Twitter handles
Set the earliest date the scraper should go back to
Define verbosity and output paths
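
As a rough illustration, the settings in run.py might look like the sketch below. The variable names (`handles`, `earliest_date`, `verbose`, `output_path`) and the example handles are hypothetical placeholders, not taken from the project; check run.py for the actual names and formats.

```python
from datetime import date

# Hypothetical settings; the real variable names in run.py may differ.
handles = ["example_handle_one", "example_handle_two"]  # Twitter handles to scrape
earliest_date = date(2015, 1, 1)  # how far back in time the scraper should go
verbose = True                    # print progress while scraping
output_path = "output/"           # directory for the resulting JSON files
```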

Run the scraper

python run.py

Output

For each Twitter handle, the parser writes two JSON files:

handle-stats.json with the keys:

name
url
bioText
following
frequency: tweets per day
followers
location
favorites: average received favorites per tweet
tweets: total number of tweets
joinDate
retweets: average received retweets per tweet

handle-tweets.json with an entry for each tweet containing the keys:

origin: whether the entry is a tweet (T) or a retweet (RT)
text: full text including hashtags, mentions and links
hashtags: a list of used hashtags
retweets: number of received retweets
favorites: number of received favorites
time: time of tweet
mentions: list of @ mentions
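
The per-tweet keys above lend themselves to simple aggregation, for example the averages reported in the stats file. The sketch below is only an illustration: it assumes the entries are available as a list of dictionaries, that hashtags are stored with a leading `#`, and that `time` is a plain string. The sample entries are made up, and `summarize` is a hypothetical helper, not part of tScrape.

```python
from collections import Counter

# Made-up sample entries using the keys listed above; not real scraper output.
tweets = [
    {"origin": "T", "text": "Hello #python", "hashtags": ["#python"],
     "retweets": 2, "favorites": 4, "time": "2015-01-01", "mentions": []},
    {"origin": "RT", "text": "#python and #data via @someone",
     "hashtags": ["#python", "#data"],
     "retweets": 0, "favorites": 2, "time": "2015-01-02",
     "mentions": ["@someone"]},
]

def summarize(entries):
    """Return average favorites and retweets per tweet plus hashtag counts."""
    count = len(entries)
    if count == 0:
        # Avoid dividing by zero when there are no tweets.
        return {"favorites": 0.0, "retweets": 0.0, "hashtags": Counter()}
    return {
        "favorites": sum(e["favorites"] for e in entries) / float(count),
        "retweets": sum(e["retweets"] for e in entries) / float(count),
        "hashtags": Counter(tag for e in entries for tag in e["hashtags"]),
    }

summary = summarize(tweets)
print(summary["favorites"], summary["retweets"])
print(summary["hashtags"].most_common(1))
```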

Importing results

To import the results as a Python dictionary, use the jsonLoad function:

from tScrape import jsonLoad
results = jsonLoad(handle, path)

Use the standard dictionary methods to inspect the content:

results.keys()
results.values()

for key in results.keys():
    print(key, results[key])
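
Since the output is ordinary JSON, the files can presumably also be read with Python's standard json module. The sketch below assumes the files are named `<handle>-stats.json` and `<handle>-tweets.json`, as described in the Output section, and sit directly in the output directory. `load_results` is a hypothetical helper written for this example, not part of tScrape.

```python
import json
import os

def load_results(handle, path, kind="stats"):
    """Read <handle>-<kind>.json from the directory `path`.

    `kind` is either "stats" or "tweets" (assumed file naming).
    """
    filename = os.path.join(path, "{}-{}.json".format(handle, kind))
    with open(filename) as f:
        return json.load(f)
```

For example, `load_results("some_handle", "output/", kind="tweets")` would return the parsed contents of output/some_handle-tweets.json, if that file exists.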
