Reddit Conversation Data Scraping using BeautifulSoup & Colaboratory

Simple tools for scraping Reddit while preserving contextual data such as the conversational tree, vote count, subreddit, and username. The tools use Colaboratory and Google Drive to save local computational resources and storage, and the output is easy to integrate into further work such as machine learning or data analysis.
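To illustrate what "preserving the conversational tree" can mean in practice, here is a minimal sketch of one possible record layout. The `Comment` class, its field names, and the `flatten` helper are hypothetical and are not taken from the notebook; they only show how replies can stay linked to their parents alongside vote count, subreddit, and username.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Comment:
    """One scraped comment plus its context (hypothetical layout)."""
    comment_id: str
    username: str
    subreddit: str
    votes: int
    text: str
    parent_id: Optional[str] = None          # None for a top-level comment
    replies: List["Comment"] = field(default_factory=list)


def flatten(comment: Comment, depth: int = 0) -> Iterator[Tuple[int, Comment]]:
    """Yield (depth, comment) pairs in depth-first order."""
    yield depth, comment
    for reply in comment.replies:
        yield from flatten(reply, depth + 1)


# Made-up sample data, for illustration only.
root = Comment("c1", "alice", "python", 12, "How do I parse HTML?")
reply = Comment("c2", "bob", "python", 5, "Try BeautifulSoup.", parent_id="c1")
root.replies.append(reply)

# Flat rows that still record each comment's position in the tree.
rows = [(depth, c.comment_id, c.parent_id) for depth, c in flatten(root)]
```

Keeping a `parent_id` on every record means the tree can be rebuilt later even if the comments are stored as flat rows, for example in a CSV file on Google Drive.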

A complete explanation is provided in the notebook (Reddit_Scrape.ipynb); a more detailed walkthrough is available on the LearningDollars blog.

Setup

No special setup is needed, as the whole process runs in Colaboratory. For local use, clone the repository and run it as you would any other Python notebook.

Options

The main options are set in the notebook's hyperparameters block.

PAGES - Number of pages from each subreddit to be scraped.
SUBREDDIT - List of subreddits to scrape.
SORT - Sorting type, set to hot, new, rising, controversial, top, or gilded.
TIME - Sorting timespan, set to hour, week, month, year, or all.
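As a rough sketch of how these options might be used, the block below sets the four hyperparameters and turns them into listing URLs. The sample values, the `build_listing_url` helper, and the use of the `old.reddit.com` URL layout are assumptions made for illustration; the notebook may build its requests differently. The sketch also assumes that the `t` query parameter only matters for the `top` and `controversial` sorts.

```python
from urllib.parse import urlencode

# Hyperparameters (sample values; names follow the options listed above).
PAGES = 2                                  # pages to scrape per subreddit
SUBREDDIT = ["python", "learnprogramming"]
SORT = "top"
TIME = "week"

VALID_SORT = {"hot", "new", "rising", "controversial", "top", "gilded"}
VALID_TIME = {"hour", "week", "month", "year", "all"}


def build_listing_url(subreddit: str, sort: str, time: str) -> str:
    """Build a listing URL for one subreddit (hypothetical helper)."""
    if sort not in VALID_SORT:
        raise ValueError(f"unsupported SORT value: {sort!r}")
    if time not in VALID_TIME:
        raise ValueError(f"unsupported TIME value: {time!r}")
    url = f"https://old.reddit.com/r/{subreddit}/{sort}/"
    # Assumption: the timespan is only meaningful for these two sorts.
    if sort in {"top", "controversial"}:
        url += "?" + urlencode({"t": time})
    return url


urls = [build_listing_url(name, SORT, TIME) for name in SUBREDDIT]
```

Validating `SORT` and `TIME` up front makes a typo fail immediately instead of producing an empty scrape several pages in.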

ldtalent/reddit-twitter-convo-data
