Scraping Tested Comic Episodes on Webtoons using Python and Selenium

The script's output contains the following columns:

  • Episode Name
  • Date
  • Loves
  • Episode Number
  • Comments Count
  • Comment Username
  • Comment Description
  • Comment Likes
  • Comment Dislikes
  • Reply Username
  • Reply Description
  • Reply Likes
  • Reply Dislikes
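
As a quick sanity check on an output file, the column list above can be written down once and compared against a header row. This is a minimal sketch: the names are copied from this README, but the helper `missing_columns` is hypothetical and not part of the repository, and the exact spelling and order of the headers in the real output files is an assumption.

```python
# Column names as listed in this README. Their order and exact spelling
# in the real output files is an assumption.
EXPECTED_COLUMNS = [
    "Episode Name",
    "Date",
    "Loves",
    "Episode Number",
    "Comments Count",
    "Comment Username",
    "Comment Description",
    "Comment Likes",
    "Comment Dislikes",
    "Reply Username",
    "Reply Description",
    "Reply Likes",
    "Reply Dislikes",
]


def missing_columns(header):
    """Return the expected column names that are absent from a header row."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

Passing the first row of the CSV file (for example, as read by `csv.reader`) to `missing_columns` returns an empty list when every column is present.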

How to Run the Script on Windows

Download the repository to your system as a ZIP file

Screenshot 2021-04-05 133004

Click the arrow on the folder and click "Show in folder"

Screenshot 2021-04-05 132235

Right click the ZIP file and click "Extract All..."

Screenshot 2021-04-05 133642

Input your desired directory to save the folder (you will need this later)

Click "Extract"

Screenshot 2021-04-05 134152

Install Python

Click here to download Python 3.8 (the script requires Python 3.8 or lower)

Click to open the installer

Screenshot 2021-04-05 131101

Check the "Add Python 3.8 to PATH" box then click "Install Now"

Screenshot 2021-04-05 131239

Once the installation is complete, press the "Windows" key and search for Command Prompt by typing "CMD"

Click "Open" to open the Command Prompt

Screenshot 2021-04-05 135248

Type "python --version" and press Enter to verify that Python is installed and in PATH

Screenshot 2021-04-05 135515
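
The same check can be done from inside Python. The sketch below is only an illustration of the "Python 3.8 or lower" requirement stated above. The helper `is_supported` is hypothetical, and it assumes that "or lower" means a Python 3 release.

```python
import sys

# Upper bound taken from this README: "requires Python3.8 or lower".
MAX_MINOR = 8


def is_supported(version_info=None):
    """Return True if the interpreter is Python 3 with minor version <= 8."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    # Assumption: "or lower" refers to Python 3 releases only.
    return major == 3 and minor <= MAX_MINOR


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", is_supported())
```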

Navigate to the extracted folder using the "cd" command: Type "cd C:\YOUR\DIRECTORY\HERE\webtoons-comments-in-python-main" and press Enter

Use the "dir" command to verify you're in the correct folder

Screenshot 2021-04-05 141214

Run the command "py -m pip install -r requirements.txt" to install all of the required dependencies

Screenshot 2021-04-05 141640

Wait for the installations to complete, then run the command "python webtoons_scraping.py" to execute the script

Screenshot 2021-04-05 142001

The script will display data being actively scraped until the eventual message "EXECUTION COMPLETE"

After execution, two output files will appear in the directory: one in CSV format and one in XLSX format
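
To confirm that both files were written, the folder can be listed by extension. This is a minimal sketch: this README does not state the output file names, so the hypothetical helper `find_outputs` simply looks for any `.csv` and `.xlsx` files in the given directory.

```python
from pathlib import Path


def find_outputs(directory="."):
    """List the CSV and XLSX file names found directly in a directory."""
    folder = Path(directory)
    return {
        "csv": sorted(p.name for p in folder.glob("*.csv")),
        "xlsx": sorted(p.name for p in folder.glob("*.xlsx")),
    }


if __name__ == "__main__":
    # Run from inside the extracted project folder.
    print(find_outputs("."))
```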

Disclaimer

We checked the robots.txt file for the URL https://www.webtoons.com/en/challenge/tested/list?title_no=231173&page=1 and found that scraping comic data is allowed.
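
A check of this kind can be reproduced with Python's standard `urllib.robotparser` module. The sketch below is not necessarily how the check was originally done, and the rules in `SAMPLE_RULES` are made up for illustration. They are not the contents of the real Webtoons robots.txt file, which may differ and may change over time.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, NOT the real robots.txt.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    target = "https://www.webtoons.com/en/challenge/tested/list?title_no=231173&page=1"
    print(is_allowed(SAMPLE_RULES, target))
```

To check the live file instead of sample lines, `RobotFileParser` also provides `set_url()` and `read()`, which download and parse the robots.txt file from the site.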