This repository has been archived by the owner on Aug 16, 2018. It is now read-only.
identifier, which is the unique ID for the Disqus thread (unique only within a forum)
Luckily, for the identifier, it seems npr.org manually sets the Disqus ID to the Seamus ID, which we can already look up. The shortname, however, is set per blog (nprtwoway, nprparallels, npred, etc.).
Seems like there are two ways we can do things:
1. Scrape the Seamus page for the `#disqus-npr` div, which has `data-shortname` and `data-identifier` attributes.
2. Retrieve the Seamus ID via the API or however we do that and keep a list of shortnames for each blog. Make sure we tag each post with the blog it gets posted to (probably a good idea anyway), and match on those tags.
Both are hard to maintain, in different ways. With option one, the page markup could change and break the scraper, as happens with all scrapers. With option two, we could add a new blog and forget to add its shortname to the list. Option two also relies on remembering to tag the Carebot project with the blog.
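For reference, a minimal sketch of option one (scraping the div) using only the Python standard library. The class and function names are hypothetical, and the sample markup and ID are made up for illustration; the real Seamus page may nest or name things differently.

```python
from html.parser import HTMLParser


class DisqusDivParser(HTMLParser):
    """Collects data-shortname and data-identifier from the #disqus-npr div."""

    def __init__(self):
        super().__init__()
        self.shortname = None
        self.identifier = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attr_map = dict(attrs)
        if tag == "div" and attr_map.get("id") == "disqus-npr":
            self.shortname = attr_map.get("data-shortname")
            self.identifier = attr_map.get("data-identifier")


def extract_disqus_ids(html):
    """Return (shortname, identifier), or (None, None) if the div is missing."""
    parser = DisqusDivParser()
    parser.feed(html)
    return parser.shortname, parser.identifier


# Made-up sample markup; the identifier value is not a real Seamus ID.
sample = (
    '<html><body>'
    '<div id="disqus-npr" data-shortname="nprtwoway" data-identifier="123456"></div>'
    '</body></html>'
)
print(extract_disqus_ids(sample))
```

Returning `(None, None)` when the div is absent gives us a cheap way to notice that the page changed and the scraper needs attention.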
Once we have those two pieces of data, we can get total comment counts from the Disqus API very easily. To get the number of unique commenters, we would have to fetch the entire thread and deduplicate on usernames.
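A rough sketch of both lookups, under some assumptions: that the Disqus API's `threads/details` endpoint accepts `forum` and `thread:ident` parameters, and that each post returned by `threads/listPosts` carries an `author` object with a `username` (anonymous authors may not have one). The helper names, API key and sample data below are hypothetical; actually fetching and paginating the thread is left out.

```python
from urllib.parse import urlencode

API_ROOT = "https://disqus.com/api/3.0"


def thread_details_url(api_key, shortname, identifier):
    """Build the URL for a thread's details, which should include its post count."""
    query = urlencode({
        "api_key": api_key,
        "forum": shortname,          # per-blog shortname, e.g. nprtwoway
        "thread:ident": identifier,  # the Seamus ID
    })
    return f"{API_ROOT}/threads/details.json?{query}"


def count_unique_commenters(posts):
    """Count distinct usernames across all posts of a thread.

    `posts` is assumed to be the combined list of post dicts from every
    page of the thread. Posts without a username are skipped.
    """
    usernames = set()
    for post in posts:
        author = post.get("author") or {}
        username = author.get("username")
        if username:
            usernames.add(username)
    return len(usernames)


# Made-up posts in the assumed response shape.
sample_posts = [
    {"author": {"username": "alice"}},
    {"author": {"username": "bob"}},
    {"author": {"username": "alice"}},
    {"author": {"name": "Guest"}},  # anonymous: no username
]
print(thread_details_url("OUR_API_KEY", "nprtwoway", "123456"))
print(count_unique_commenters(sample_posts))
```

Skipping posts without a username means anonymous commenters are not counted at all; whether that is what we want is an open question.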