ldtalent/reddit-twitter-convo-data

Reddit Conversation Data Scraping using BeautifulSoup & Colaboratory

Simple tools for scraping Reddit while preserving its contextual data, such as the conversational tree, vote count, subreddit, and username. The tools use Colaboratory and Google Drive to save local computational resources and storage space, and the output is easy to integrate into further work such as machine learning or data analysis.

A complete explanation is given in the notebook, and a more detailed write-up is available on the LearningDollars blog.
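To illustrate what preserving the conversational tree can look like in practice, here is a minimal sketch. It is not taken from the notebook: the `Comment` class, its field names, and the `flatten` helper are hypothetical, chosen only to show how nested replies can be stored and then flattened into rows without losing the parent link.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Comment:
    """One node of a conversation tree (hypothetical schema, not the notebook's)."""
    comment_id: str
    username: str
    subreddit: str
    votes: int
    text: str
    replies: List["Comment"] = field(default_factory=list)


def flatten(comment: Comment,
            parent_id: Optional[str] = None,
            depth: int = 0) -> Iterator[Dict[str, object]]:
    """Yield one flat row per comment, depth-first, keeping parent and depth."""
    yield {
        "id": comment.comment_id,
        "parent_id": parent_id,  # None marks a top-level comment
        "depth": depth,
        "username": comment.username,
        "subreddit": comment.subreddit,
        "votes": comment.votes,
        "text": comment.text,
    }
    for reply in comment.replies:
        yield from flatten(reply, comment.comment_id, depth + 1)
```

Rows in this shape can be written to a CSV file on Google Drive and later regrouped by `parent_id` to rebuild each thread.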

Setup

No special setup is needed, since the whole process runs in Colaboratory. For local usage, clone the repository and run the notebook as you would any other Python notebook.

Options

The main options are set in the notebook's hyperparameters block.

  • PAGES - Number of pages to scrape from each subreddit.
  • SUBREDDIT - List of subreddits to scrape.
  • SORT - Sorting type: hot, new, rising, controversial, top, or gilded.
  • TIME - Sorting timespan: hour, week, month, year, or all.
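As a rough sketch of how these options might be set and checked, the block below uses the four option names from the list above. The example values, the `build_listing_url` helper, and the `old.reddit.com` URL pattern are assumptions made for illustration; the notebook may build its requests differently.

```python
from urllib.parse import urlencode

# Hyperparameters (names from the list above; values are placeholders).
PAGES = 2
SUBREDDIT = ["learnpython", "datascience"]
SORT = "top"
TIME = "week"

# Allowed values as listed in this README.
VALID_SORT = {"hot", "new", "rising", "controversial", "top", "gilded"}
VALID_TIME = {"hour", "week", "month", "year", "all"}


def build_listing_url(subreddit: str, sort: str, time: str) -> str:
    """Build a listing URL for one subreddit (hypothetical helper).

    Assumes the pattern https://old.reddit.com/r/<subreddit>/<sort>/?t=<time>.
    """
    if sort not in VALID_SORT:
        raise ValueError(f"Unsupported SORT value: {sort!r}")
    if time not in VALID_TIME:
        raise ValueError(f"Unsupported TIME value: {time!r}")
    query = urlencode({"t": time})
    return f"https://old.reddit.com/r/{subreddit}/{sort}/?{query}"


# One starting URL per subreddit; PAGES would bound how far each is paginated.
start_urls = [build_listing_url(name, SORT, TIME) for name in SUBREDDIT]
```

Validating SORT and TIME up front means a mistyped option fails immediately instead of producing an empty or misleading scrape.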
