
"C-SPAN US Senate Schedule Scraper" seems to be url-encoding the 'source_url' field. #92

Open
onyxfish opened this issue Sep 27, 2009 · 0 comments


This doesn't seem to affect every entry, but here is an example of one where 'source_url' was stored percent-encoded:

{
   "_id": "2007-04-04T00:00:00Z -  C-SPAN US Senate Schedule Scraper",
   "_rev": "1-32ca7948af083f2df06d7cd275bf38f7",
   "datetime": "2007-04-04T00:00:00Z",
   "end_datetime": null,
   "title": "The House adjourned pursuant to H. Con. Res. 103. The next meeting is scheduled for 2:00 p.m. on April 16, 2007.",
   "description": null,
   "branch": "Legislative",
   "entity": "Senate",
   "source_url": "http%3A%2F%2Fwww3.capwiz.com%2Fc-span%2Fdbq%2Fofficials%2Fschedule.dbq%3Fcommittee%3Dus_house%26amp%3Bcommand%3Dcommittee_schedules%26amp%3Bchambername%3DSenate%26amp%3Bchamber%3DS%26amp%3Bperiod%3D",
   "source_text": "APRIL     04, 2007
\u000a\u0009The House adjourned pursuant to H. Con. Res. 103. The next meeting is scheduled for 2:00 p.m. on April 16, 2007.
",
   "access_datetime": "2009-09-27T17:16:09Z",
   "parser_name": "C-SPAN US Senate Schedule Scraper",
   "parser_version": "0.1"
}

The CouchDB validation report does flag this entry as an error, but it labels it "not an absolute url" rather than pointing at the url-encoding.
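To illustrate what seems to be happening, here is a minimal Python sketch that reverses the encoding on the value above. The helper names `is_absolute_url` and `normalize_source_url` are hypothetical and are not taken from the scraper. The sketch assumes the stored value was percent-encoded exactly once, and that the `%26amp%3B` sequences mean the ampersands were HTML-escaped to `&amp;` before the encoding was applied. It is a way to check or repair affected entries, not a fix for the scraper itself.

```python
from html import unescape
from urllib.parse import unquote, urlparse


def is_absolute_url(url):
    """Return True if the URL has both a scheme and a host."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


def normalize_source_url(url):
    """Hypothetical repair helper: undo one layer of percent-encoding
    and HTML escaping, but only for values that are not already
    absolute URLs."""
    if is_absolute_url(url):
        return url  # leave healthy entries untouched
    decoded = unescape(unquote(url))
    # Fall back to the original value if decoding did not help.
    return decoded if is_absolute_url(decoded) else url
```

Because the percent-encoded value contains no literal `:` or `//`, `urlparse` finds no scheme or host in it, which is consistent with the "not an absolute url" label in the validation report.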
