Matthew Taylor edited this page Jul 8, 2015 · 21 revisions

☠ don't do this it's not ready ☠

Step One: Find a public data source on the internet

Find a URL that serves useful or interesting data. It must be public (no authentication), and the data should change over time. River View sends an HTTP request to each URL and passes the response body to your parser.

Step Two: Name it

Create a directory inside /rivers, which is where all the rivers live, and give it a unique name.

Step Three: Write a parser

Write a JavaScript module called parse.js that exports a function. The function parses the response body and extracts a stream of data over time. See an example parser for NYC Traffic data. The function looks like this:

module.exports = function(config, body, url, fieldCallback, propertyCallback) {
    // 1. parse the body
    // 2. call the callbacks with data
};

We will get to the callbacks in a minute...

Step Four: Write a config

Put your configuration in config.yml. Each URL in sources is requested at the interval specified, and the response body text is sent to your parser. You must provide a list of fields and a list of properties.

Fields

Fields are the keys to values within your data that change over time. For example, the fields for traffic paths might be Speed and TravelTime. These are temporal data labels, and your parser is expected to provide values for these fields every time it is called. Example fields from the nyc-traffic river config:

fields:
  - Speed
  - TravelTime
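
Because a value is expected for every field on every call, a parser may want to check each record before sending it. The following is only a sketch: the helper name missingFields and the sample record are hypothetical and not part of River View.

```javascript
// Hypothetical helper, not part of River View.
// Returns the names of configured fields that have no value in a parsed record.
function missingFields(fields, record) {
    return fields.filter(function(name) {
        return record[name] === undefined || record[name] === null;
    });
}

var configuredFields = ['Speed', 'TravelTime'];   // mirrors the fields list above
var incompleteRecord = { Speed: 23.1 };           // made-up record with no TravelTime
var missing = missingFields(configuredFields, incompleteRecord);

if (missing.length > 0) {
    // A parser could skip the record or report an error at this point.
    console.log('Record is missing fields: ' + missing.join(', '));
}
```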

Properties

Properties are like metadata. They are treated as static, but they may change over time. You can update them every time your parser is called if you want. Example properties from the nyc-traffic river config:

properties:
  - Borough
  - linkName
  - linkId
  - linkPoints
  - Owner
  - Transcom_id

Step Five: Send Data

You push data into River View by calling the fieldCallback and propertyCallback callbacks in your parser.

Sending Temporal Data (field data)

fieldCallback(error, id, timestamp, fieldValues);

Where:

  • error (Error) is any error that occurred while parsing the data that prevented completion (when specified, this should be the only argument given)
  • id (string) is the unique identifier for the data object being updated with new data (example: an id to a traffic route)
  • timestamp (integer) is the UNIX timestamp in seconds (NOT milliseconds!) for the data (MUST match the timezone string in the config)
  • fieldValues (array) is the actual scalar data values corresponding to the fields defined in the config (example: for a traffic config containing the fields [Speed, TravelTime], the data should look something like [23.1, 100])
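
A minimal sketch of preparing the timestamp and fieldValues arguments. It assumes the source reports values as strings and that each parsed record is a plain object keyed by field name. The helper names toUnixSeconds and toFieldValues, the id 'route-1', and the sample record are hypothetical.

```javascript
// Hypothetical helpers, not part of River View.

// JavaScript Dates count milliseconds, so divide by 1000 to get whole seconds.
function toUnixSeconds(date) {
    return Math.floor(date.getTime() / 1000);
}

// Build the values array in the same order as the config's fields list.
function toFieldValues(fields, record) {
    return fields.map(function(name) {
        return Number(record[name]);
    });
}

var fields = ['Speed', 'TravelTime'];                // mirrors the fields list in the config
var record = { TravelTime: '100', Speed: '23.1' };   // made-up sample record
var timestamp = toUnixSeconds(new Date('2015-07-08T12:00:00Z'));
var fieldValues = toFieldValues(fields, record);

// Inside a parser, these would then be passed along:
// fieldCallback(null, 'route-1', timestamp, fieldValues);
```

Mapping over the config's field list, rather than over the record's keys, keeps the values in the order the config declares them, whatever order the source uses.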

Sending Static Data (properties)

propertyCallback(error, id, dataProperties);

Where:

  • error (Error) is any error that occurred while parsing the data that prevented completion (when specified, this should be the only argument given)
  • id (string) is the unique identifier for the data object being updated with new data (example: an id to a traffic route)
  • dataProperties (object) is a key/value object with keys matching the properties defined in the config
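
Putting the pieces together, here is a sketch of a complete parser with a local dry run. It rests on several assumptions that the real nyc-traffic river may not share: the response body is a JSON array of records, each record has an id and an ISO 8601 updated time (both key names are hypothetical), and the config argument exposes the fields and properties lists as arrays. On a parse failure this sketch reports the error through fieldCallback only. Whether River View expects one or both callbacks in that case is not stated here.

```javascript
// Sketch of a parser under an assumed JSON response shape (see assumptions above).
function parse(config, body, url, fieldCallback, propertyCallback) {
    var records;
    try {
        records = JSON.parse(body);
    } catch (err) {
        // When an error is given, it is the only argument.
        fieldCallback(err);
        return;
    }
    records.forEach(function(record) {
        var id = String(record.id);
        // UNIX timestamp in seconds, not milliseconds.
        var timestamp = Math.floor(new Date(record.updated).getTime() / 1000);
        var fieldValues = config.fields.map(function(name) {
            return record[name];
        });
        var dataProperties = {};
        config.properties.forEach(function(name) {
            dataProperties[name] = record[name];
        });
        fieldCallback(null, id, timestamp, fieldValues);
        propertyCallback(null, id, dataProperties);
    });
}
// In parse.js, this function would be assigned to module.exports.

// Local dry run with a canned body and stub callbacks (no River View needed).
var sampleConfig = { fields: ['Speed', 'TravelTime'], properties: ['Borough', 'linkName'] };
var sampleBody = JSON.stringify([
    { id: 1, updated: '2015-07-08T12:00:00Z', Speed: 23.1, TravelTime: 100,
      Borough: 'Queens', linkName: 'Example link' }
]);
var fieldCalls = [];
var propertyCalls = [];
parse(sampleConfig, sampleBody, 'http://example.com/data',
    function(err, id, timestamp, values) { fieldCalls.push([err, id, timestamp, values]); },
    function(err, id, props) { propertyCalls.push([err, id, props]); });
```

Recording the callback arguments in arrays makes it easy to inspect what the parser would send before wiring it into a river.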