I wanted to see whether I could figure out what makes a sentence get quoted or highlighted a lot.
Collect quotes from Goodreads (including relevant attributes).
Three crawlers:
goodreads.quotes.popular
Starts from https://www.goodreads.com/quotes?page=1
Collects the first-level Popular quotes and follows pagination.
goodreads.quotes.by_category
Starts from https://www.goodreads.com/quotes?page=1
Traverses the categories, collects the Popular quotes for each category, and follows pagination.
goodreads.quotes.book
Not implemented yet. The idea is to collect all the quotes for a particular book. TODO: accept the book URL as a spider argument.
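
How the crawlers follow pagination is not spelled out above; they may simply follow the "next" link in the page markup. As a rough sketch only, assuming pages are addressed purely through the `page` query parameter seen in the start URL, the next page could be derived like this (`next_page_url` is a hypothetical helper, not part of this project):

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url: str) -> str:
    """Return `url` with its `page` query parameter incremented by one.

    Assumption: a URL without a `page` parameter is page 1,
    so the returned URL points at page 2.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # parse_qs maps each key to a list of values; take the first `page` value.
    current = int(query.get("page", ["1"])[0])
    query["page"] = [str(current + 1)]

    # Rebuild the URL, leaving scheme, host, path and fragment untouched.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    print(next_page_url("https://www.goodreads.com/quotes?page=1"))
```

A crawler using this approach would still need its own stop condition, for example a page that yields no quotes.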