All scrapers should set document IDs in the form [datetime] - [parser_name] - [unique key] #58

onyxfish · 2009-08-22T03:52:10Z

Here the unique key is whatever is appropriate to a given scraper. For Roll Call Vote scrapers this would be the Roll #. For some scrapers it may be the title, or whatever else makes a given event unique.
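As a rough illustration of the proposed format, here is a minimal Python sketch. The function name `build_document_id`, the ISO 8601 timestamp rendering, and the example parser name are assumptions made for illustration; the ticket only specifies the three bracketed parts and their order.

```python
from datetime import datetime


def build_document_id(event_datetime, parser_name, unique_key):
    """Join the three parts as "[datetime] - [parser_name] - [unique key]".

    Assumption: the datetime is rendered in ISO 8601 form; the ticket
    does not say which timestamp format the scrapers use.
    """
    # The unique key may be a roll number (int) or a title (str).
    key = str(unique_key).strip()
    return " - ".join([event_datetime.isoformat(), parser_name, key])


# Hypothetical roll call vote scraper using the roll number as the key.
roll_call_id = build_document_id(
    datetime(2009, 8, 22, 3, 52, 10), "house_roll_call", 123
)
print(roll_call_id)
```

A scraper with no natural number would pass the event title as `unique_key` instead.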

onyxfish · 2009-08-22T15:19:40Z

This has now been documented in the Database Planning section of the wiki:
http://wiki.github.com/bouvard/votersdaily/database-planning

onyxfish · 2009-08-29T07:09:33Z

Fixed for Python scrapers. This is definitely a much better way of identifying each document.

chaunceyt · 2009-08-30T20:26:13Z

Fixed, closing.

onyxfish · 2009-08-31T03:44:37Z

It looks like the scrapers are still pulling in branch and entity names in the format [datetime] - [parser_name] - [branch] - [entity] - [unique key]. Now that we are including the parser name, I think we should remove [branch] and [entity]. They really only make the IDs longer, and I'm already a bit concerned that some of our URLs are going to be overly lengthy.

Also, for the Roll Call Votes scrapers where there is a unique Vote Number, I really think we want to use that as the [unique key] portion rather than the title.
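To make the difference between the two formats concrete, here is a hedged Python sketch that drops [branch] and [entity] from an ID in the longer form. The helper name `shorten_legacy_id`, the sample values, and the assumption that the parts are always joined by a literal " - " are mine, not something the scrapers are known to guarantee.

```python
def shorten_legacy_id(legacy_id):
    """Convert "[datetime] - [parser_name] - [branch] - [entity] - [unique key]"
    into "[datetime] - [parser_name] - [unique key]".

    Assumption: none of the first four parts contains " - ". The unique
    key may contain it, since the split stops after four separators.
    """
    parts = legacy_id.split(" - ", 4)
    if len(parts) != 5:
        raise ValueError("not a five-part document id: %r" % legacy_id)
    event_datetime, parser_name, _branch, _entity, unique_key = parts
    return " - ".join([event_datetime, parser_name, unique_key])


# Hypothetical ID from a roll call scraper, keyed on the vote number.
long_id = "2009-08-31T03:44:37 - senate_votes - Legislative - Senate - 270"
short_id = shorten_legacy_id(long_id)
print(short_id)
```

Keying on the vote number rather than the title also keeps the final part short, which helps with the URL length concern.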

Going to reopen this ticket, pending discussion.

chaunceyt · 2009-08-31T04:28:01Z

Will work on this this week.
