Matthew Taylor edited this page Sep 17, 2015 · 21 revisions

How to Build a River

Step One: Find a public data source on the internet

Find a URL that serves useful or interesting data. It must be public (no authentication required), and the data should change over time. River View sends an HTTP request to each URL and passes the response body to your parser.

Step Two: Name it

Create a directory in /rivers, which is where all the rivers go, and give it a unique name.

Step Three: Write a parser

Write a JavaScript module called parser.js that exports a function. The function parses the response body and extracts a stream of data over time. See an example parser for NYC Traffic data. The function looks like this:

module.exports = function(body, options, temporalDataCallback, metaDataCallback) {
    // 1. parse the body
    // 2. call the callbacks with data
    // options contains "config" and "url"
};

We will get to the callbacks in a minute...

NOTE: You may also create an initialize function if you need to build your own source URL list, or run some process on River View startup.

Step Four: Write a config

Put your configuration in config.yml. Each URL in sources is requested at the specified interval, and the response body text is sent to your parser. You must provide a list of fields and a list of metadata properties.

Fields

Fields are the keys to values within your data that change over time. For example, the fields for traffic paths might be Speed and TravelTime. These are temporal data labels, and your parser is expected to provide values for these fields every time it is called. Example fields from the nyc-traffic river config:

fields:
  - Speed
  - TravelTime

Metadata

Metadata properties hold information about the data in a stream. You may update them every time your parser is called. Example metadata from the nyc-traffic river config:

metadata:
  - Borough
  - linkName
  - linkId
  - linkPoints
  - Owner
  - Transcom_id
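
The split between fields and metadata can be sketched in JavaScript as follows. This is a minimal illustration, not part of River View: the helper name splitRecord and the shape of the raw record are assumptions, and the field and metadata names are a shortened copy of the nyc-traffic examples above.

```javascript
// Names as they might appear in config.yml (shortened from the examples above).
var fields = ['Speed', 'TravelTime'];
var metadataKeys = ['Borough', 'linkName'];

// Hypothetical helper: split one raw record into ordered field values
// (temporal data) and a metadata object (static data).
function splitRecord(record, fields, metadataKeys) {
    // Field values are returned in the same order as the configured fields.
    var fieldValues = fields.map(function(name) {
        return Number(record[name]);
    });
    // Only keys listed in the config are copied into the metadata object.
    var metadata = {};
    metadataKeys.forEach(function(key) {
        if (Object.prototype.hasOwnProperty.call(record, key)) {
            metadata[key] = record[key];
        }
    });
    return { fieldValues: fieldValues, metadata: metadata };
}

var parts = splitRecord(
    { Speed: '23.1', TravelTime: '100', Borough: 'Queens', linkName: 'Example link' },
    fields, metadataKeys);
// parts.fieldValues -> [23.1, 100]
// parts.metadata    -> { Borough: 'Queens', linkName: 'Example link' }
```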

Step Five: Send Data

You push data into River View by calling the temporalDataCallback and metaDataCallback callbacks in your parser with data for one Stream in the River.

Sending Temporal Data (field data)

temporalDataCallback(streamId, timestamp, fieldValues);

Where:

  • streamId (string) is the unique identifier for the Stream being updated with new data (example: an id to a traffic route)
  • timestamp (integer) is the UNIX timestamp in seconds (NOT milliseconds!) for the data (MUST match the timezone string in the config)
  • fieldValues (array) is the actual scalar data values corresponding to the fields defined in the config (example: for a traffic config containing the fields [Speed, TravelTime], the data should look something like [23.1, 100])
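
Because JavaScript dates work in milliseconds, the timestamp usually needs converting before it is passed to temporalDataCallback. The sketch below shows one way to do that with built-in Date functions only. The helper names toUnixSeconds and looksLikeMilliseconds are hypothetical, and the sketch assumes the source data carries an unambiguous date string (for example ISO 8601 with an explicit offset); it does not cover timezone handling for the config's timezone string.

```javascript
// Hypothetical helper: convert a Date or date string into whole UNIX seconds.
function toUnixSeconds(dateLike) {
    var ms = (dateLike instanceof Date) ? dateLike.getTime() : Date.parse(dateLike);
    if (isNaN(ms)) {
        throw new Error('Unparseable date: ' + dateLike);
    }
    return Math.floor(ms / 1000);
}

// Hypothetical guard: a rough heuristic that flags values far too large to be
// present-day UNIX seconds, which usually means milliseconds slipped through.
function looksLikeMilliseconds(timestamp) {
    return timestamp > 1e11;
}

var timestamp = toUnixSeconds('2015-09-17T12:00:00Z');
if (looksLikeMilliseconds(timestamp)) {
    throw new Error('Timestamp appears to be in milliseconds: ' + timestamp);
}
// temporalDataCallback('route-1', timestamp, [23.1, 100]);
```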

Sending Static Data (metadata)

metaDataCallback(streamId, metadata);

Where:

  • streamId (string) is the unique identifier for the Stream being updated with new data (example: an id to a traffic route)
  • metadata (object) is a key/value object with keys matching the metadata defined in the config

Step Six: Test It

There is a test you can run that will exercise your new River:

node test-river.js <river-name>

... where <river-name> is the same as the directory name of your river. You should see output like this:

∙ node test-river.js dummy
Testing river dummy


  river directory
    ✓ exists
    ✓ has a config.yml
    ✓ has a parser.js

  river config
    ✓ is valid YAML
    ✓ has a description
    ✓ has an author
    ✓ has an email
    ✓ has a valid timezone
    ✓ has at least one source
    ✓ sources all resolve to working URLs (2301ms)
    ✓ has at least one field

  river parser
    ✓ parse script exports a function
    when passed a live response body
      ✓ calls the temporalDataCallback with data matching config (1005ms)
      ✓ calls the metadataCallback with JSON-parseable data (1004ms)


  15 passing (4s)

Step Seven: Create a Pull Request

After testing your river out locally using a local Redis instance, you should create a new pull request against this repository containing your parser.js and config.yml in the new directory you created in step two. Your PR will be reviewed and tested before being merged. Once merged, your new river will go live on the next deployment.

Gotchas

  • interval must be more than 1 minute
  • expires must be less than 6 months