An Audible.in web scraper to gather audiobook data.
The scraping code and the Notebooks mentioned below are well documented and explain all the steps that were taken. If you are new to data wrangling and data cleaning, you can use them to get a better understanding of Pandas and Matplotlib. As for the scraping, I'm currently writing a blog post on how I organized it, which will be out soon.
- Regex
- Beautiful Soup
- Selenium
- csv
The code uses Selenium and Beautiful Soup in unison to gather the data. NOTE: It's recommended to have Firefox installed before you run the code, as the Selenium driver used is Mozilla-based. Alternatively, you can write your own Chrome driver by following this link.
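As a rough illustration of how the two libraries work together, here is a minimal sketch (not the exact code from this repo): Selenium drives Firefox and renders the page, and Beautiful Soup parses the resulting HTML. The URL and selector below are placeholders, and it assumes Firefox plus geckodriver are available.

```python
# Minimal Selenium + Beautiful Soup sketch (illustrative, not the repo's exact code).
# Assumes Firefox and geckodriver are installed; URL and selector are placeholders.
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()  # Mozilla-based driver, as noted above
try:
    driver.get("https://www.audible.in/search")  # placeholder listing page
    # Hand the rendered page source to Beautiful Soup for parsing
    soup = BeautifulSoup(driver.page_source, "html.parser")
    # Illustrative selector: grab heading text from the listing page
    titles = [h3.get_text(strip=True) for h3 in soup.select("h3")]
    print(titles[:5])
finally:
    driver.quit()
```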
If you are going through the code, you may notice a bit of handwritten customization inside Loop: 2. That is because Audible redirects incorrectly for a few sub-categories: for example, whenever LGBTQ+ is present under a sub-category, Audible redirects to the LGBTQ+ category and the category it was originally under is ignored.
It is recommended to check these sub-categories before running the code, as Audible.in is updated every day. Also, while running Loop: 1 it's recommended to scrape 3-5 pages at a time instead of all of them, as this keeps the program running smoothly.
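The sketch below shows one way those two precautions could look in practice; the variable names, URLs, and loop structure are illustrative assumptions, not the actual Loop: 1 / Loop: 2 code from the script.

```python
# Hedged sketch of the two precautions above (illustrative names and URLs).
from selenium import webdriver

PAGES_PER_RUN = 5  # Loop 1: scrape a handful of pages at a time, not all at once

driver = webdriver.Firefox()
try:
    sub_category_url = "https://www.audible.in/cat/placeholder-sub-category"  # placeholder
    for page in range(1, PAGES_PER_RUN + 1):
        driver.get(f"{sub_category_url}?page={page}")
        # Loop 2 idea: if Audible silently redirected us (e.g. to the LGBTQ+ category),
        # the landed URL no longer matches the sub-category we requested, so skip it.
        if not driver.current_url.startswith(sub_category_url):
            print(f"Redirected away from {sub_category_url}, skipping page {page}")
            continue
        # ... parse driver.page_source with Beautiful Soup here ...
finally:
    driver.quit()
```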
You can find the dataset here: Audiobooks Dataset
The description of the dataset is mentioned in the description section.
The Audiobooks dataset has two .csv files; a small loading example follows the list.
- audible_uncleaned.csv
- audible_cleaned.csv
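A minimal sketch of loading both files with Pandas is shown below; it assumes the CSVs sit in the working directory, so adjust the paths to wherever you downloaded them.

```python
# Load both files with pandas (paths assume the CSVs are in the working directory).
import pandas as pd

raw = pd.read_csv("audible_uncleaned.csv")
clean = pd.read_csv("audible_cleaned.csv")

print(raw.shape, clean.shape)
print(clean.head())
```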
If you are looking to use the uncleaned one to practise data wrangling or reshape the data to your needs, here's a notebook that might help you get started. It covers beginner to intermediate level.
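If you just want a feel for the kind of wrangling involved before opening the notebook, here is a small, hedged example. The column names "author" and "price" and the "Writtenby:" prefix are assumptions for illustration; the notebook covers the actual columns and cleaning steps.

```python
# Illustrative wrangling on the uncleaned file (column names are assumptions).
import pandas as pd

df = pd.read_csv("audible_uncleaned.csv")

# Example: strip a textual prefix from a string column (hypothetical column name)
if "author" in df.columns:
    df["author"] = df["author"].str.replace("Writtenby:", "", regex=False).str.strip()

# Example: coerce a price-like column to numeric, turning bad values into NaN
if "price" in df.columns:
    df["price"] = pd.to_numeric(df["price"], errors="coerce")

print(df.dtypes)
```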
If you would rather dive directly into the data and create exciting data visualisations, I have a notebook here.
You can start exploring the data right away; just make sure you use audible_cleaned.csv.
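As a quick starting point with the cleaned file, here is a small, hedged Matplotlib example; the column name "language" is an assumption used purely for illustration.

```python
# Quick-start visualisation sketch on the cleaned file ("language" column is assumed).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("audible_cleaned.csv")

# Illustrative plot: number of audiobooks per language (top 10)
if "language" in df.columns:
    df["language"].value_counts().head(10).plot(kind="bar")
    plt.title("Audiobooks per language (top 10)")
    plt.tight_layout()
    plt.show()
```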