Creating a River

Step One: Find a public data source on the internet

Find one or more URLs that serve useful or interesting data. Each URL must be public (no authentication), and the data should change over time. River View sends an HTTP request to each URL and passes the response body to your parser.

Step Two: Name it

Create a directory inside /rivers and give it a unique name. The /rivers directory is where all the rivers live.

Step Three: Write a parser

Write a JavaScript function, in a file called parser.js, that parses the response body and extracts a stream of data over time. See an example parser for NYC Traffic data. The function looks like this:

module.exports = function(config, body, url, temporalDataCallback, metaDataCallback) {
    // 1. parse the body
    // 2. call the callbacks with data
};

We will get to the callbacks in a minute...

Step Four: Write a config

Put it in config.yml. Each URL listed under sources is requested at the interval specified, and the response body text is sent to your parser. You must provide a list of fields and a list of metadata.

Fields

Fields are the keys to values within your data that change over time. For example, the fields for traffic paths might be Speed and TravelTime. These are temporal data labels, and your parser is expected to provide a value for each field every time it is called. Example fields from the nyc-traffic river config:

fields:
  - Speed
  - TravelTime

Metadata

Metadata is static information about each data object, as opposed to the values that change over time. You can update it every time your parser is called if you want. Example metadata from the nyc-traffic river config:

metadata:
  - Borough
  - linkName
  - linkId
  - linkPoints
  - Owner
  - Transcom_id

Step Five: Send Data

You push data into River View by calling the two callbacks passed to your parser as its fourth and fifth arguments, referred to here as temporalDataCallback and metaDataCallback.

Sending Temporal Data (field data)

temporalDataCallback(error, id, timestamp, fieldValues);

Where:

error (Error) is any error that occurred while parsing the data that prevented completion (when specified, this should be the only argument given)
id (string) is the unique identifier for the data object being updated with new data (example: the id of a traffic route)
timestamp (integer) is the UNIX timestamp in seconds (NOT milliseconds) for the data (MUST match the timezone string in the config)
fieldValues (array) is the actual scalar data values corresponding to the fields defined in the config, in the same order (example: for a traffic config containing the fields [Speed, TravelTime], the data should look something like [23.1, 100])
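The argument rules above can be sketched as a few small helpers. This is only an illustration: toUnixSeconds, toFieldValues, and sendReading are hypothetical names, not part of River View, and the sketch assumes each reading is an object keyed by field name. Passing null as the error on success follows the usual Node.js convention and is an assumption here.

```javascript
// Hypothetical helpers, not part of River View itself.

// Convert a JavaScript Date (millisecond precision) to whole UNIX seconds.
function toUnixSeconds(date) {
    return Math.floor(date.getTime() / 1000);
}

// Build the fieldValues array in the same order as the config's fields list.
// Assumes `reading` is an object keyed by field name.
function toFieldValues(fields, reading) {
    return fields.map(function(name) {
        return Number(reading[name]);
    });
}

// Report one reading for one data object (for example, a traffic route).
// Assumption: null is passed as the error when nothing went wrong.
function sendReading(temporalDataCallback, fields, id, date, reading) {
    temporalDataCallback(
        null,
        String(id),
        toUnixSeconds(date),
        toFieldValues(fields, reading)
    );
}
```

Deriving the array from the config's field list, rather than writing the values out by hand, keeps the order of fieldValues aligned with the config if fields are added or reordered later.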

Sending Static Data (metadata)

metaDataCallback(error, id, metadata);

Where:

error (Error) is any error that occurred while parsing the data that prevented completion (when specified, this should be the only argument given)
id (string) is the unique identifier for the data object being updated with new data (example: the id of a traffic route)
metadata (object) is a key/value object with keys matching the metadata defined in the config
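Putting both callbacks together, a complete parser might look roughly like the sketch below. The feed shape is hypothetical (a JSON array of flat objects with id and updated keys), and the sketch assumes the config argument exposes the fields and metadata lists from config.yml as arrays. How timestamps should be interpreted depends on the feed and on the timezone set in the config, so the UTC parsing here is only a placeholder.

```javascript
// A sketch of a complete parser under the assumptions stated above.
// Hypothetical feed row:
//   { "id": 42, "updated": "2015-06-01T12:00:00Z", "Speed": "23.1",
//     "TravelTime": "100", "Borough": "Queens", "linkName": "Example link" }
function parse(config, body, url, temporalDataCallback, metaDataCallback) {
    var rows;
    try {
        rows = JSON.parse(body);
    } catch (err) {
        // On failure, the error is the only argument given.
        return temporalDataCallback(err);
    }
    if (!Array.isArray(rows)) {
        return temporalDataCallback(new Error('Expected a JSON array from ' + url));
    }

    rows.forEach(function(row) {
        var id = String(row.id);

        // UNIX timestamp in seconds, not milliseconds.
        var timestamp = Math.floor(Date.parse(row.updated) / 1000);

        // Values in the same order as the fields list in the config.
        var fieldValues = config.fields.map(function(name) {
            return Number(row[name]);
        });

        // Key/value object with keys matching the metadata list in the config.
        var metadata = {};
        config.metadata.forEach(function(name) {
            metadata[name] = row[name];
        });

        temporalDataCallback(null, id, timestamp, fieldValues);
        metaDataCallback(null, id, metadata);
    });
}
```

In a river's parser file, this function would be assigned to module.exports, as in the skeleton in step three.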

Step Six: Create a Pull Request

After testing your river locally against a local Redis instance, create a pull request against this repository containing your parser.js and config.yml in the directory you created in step two. Your PR will be reviewed and tested before being merged. Once merged, your new river will go live on the next deployment.
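Before running the full system, it can help to exercise a parser on its own with stub callbacks and a saved response body. The sketch below is a hypothetical stand-alone check, not River View's own test tooling: runParser and checkTemporal are illustrative names, stubParse stands in for a real parser, and the sketch assumes the parser calls its callbacks synchronously.

```javascript
// Hypothetical stand-alone check; it does not touch Redis or River View.

// Run a parser with recording callbacks and collect what it reports.
function runParser(parse, config, body, url) {
    var result = { errors: [], temporal: [], metadata: [] };
    function record(list) {
        return function(error) {
            if (error) {
                result.errors.push(error);
            } else {
                // Keep everything after the error argument.
                list.push(Array.prototype.slice.call(arguments, 1));
            }
        };
    }
    parse(config, body, url, record(result.temporal), record(result.metadata));
    return result;
}

// Check that every temporal call has an integer timestamp and one value
// per field listed in the config.
function checkTemporal(config, result) {
    return result.temporal.every(function(call) {
        var timestamp = call[1];
        var fieldValues = call[2];
        return Number.isInteger(timestamp) &&
            Array.isArray(fieldValues) &&
            fieldValues.length === config.fields.length;
    });
}

// Usage with a stub parser standing in for a real parser.js.
function stubParse(config, body, url, temporalDataCallback, metaDataCallback) {
    temporalDataCallback(null, 'route-1', 1433160000, [23.1, 100]);
    metaDataCallback(null, 'route-1', { Borough: 'Queens' });
}

var stubConfig = { fields: ['Speed', 'TravelTime'] };
var stubResult = runParser(stubParse, stubConfig, '', 'http://example.com/feed');
var stubOk = stubResult.errors.length === 0 && checkTemporal(stubConfig, stubResult);
```

A check like this cannot tell a seconds timestamp from a milliseconds one, so it complements testing against a local Redis instance rather than replacing it.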
