Creating a River
Find a URL that contains useful or interesting data. It must be public (no authentication). This data should change over time. An HTTP request is sent to each URL, and the response body is passed to your parser.
Create a directory in /rivers and name it something unique. This is where all the rivers go.
Write a JavaScript module called parser.js that parses the response body and extracts a stream of data over time. See an example parser for NYC Traffic data. The function it exports looks like this:
module.exports = function(body, options, temporalDataCallback, metaDataCallback) {
  // 1. parse the body
  // 2. call the callbacks with data
  // options contains "config" and "url"
};
We will get to the callbacks in a minute...
NOTE: You may also create an initialize function if you need to build your own source URL list, or run some process on River View startup.
Put it in config.yml like this. Each URL in sources is requested at the interval specified, and the response body text is sent to your parser. You must provide a list of fields and properties.
Fields are the keys to values within your data that change over time. For example, the fields for traffic paths might be speed and travelTime. These are temporal data labels, and it is expected that your parser will provide values for these fields every time it is called. Example fields from the nyc-traffic river config:
fields:
- Speed
- TravelTime
Properties are information about the data, listed under the metadata key in the config. You may update them every time your parser is called. Example properties from the nyc-traffic river config:
metadata:
- Borough
- linkName
- linkId
- linkPoints
- Owner
- Transcom_id
You push data into River View by calling the temporalDataCallback and metaDataCallback callbacks in your parser with data for one Stream in the River.
temporalDataCallback(streamId, timestamp, fieldValues);
Where:
- streamId (string) is the unique identifier for the Stream being updated with new data (example: an id to a traffic route)
- timestamp (integer) is the UNIX timestamp in seconds (NOT milliseconds!) for the data (MUST match the timezone string in the config)
- fieldValues (array) is the actual scalar data values corresponding to the fields defined in the config (example: for a traffic config containing the fields [Speed, TravelTime], the data should look something like [23.1, 100])
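The seconds-based timestamp and the ordering of fieldValues are the two easiest requirements to get wrong. Here is a minimal sketch, assuming each source record is a plain object keyed by field name. The helper names toUnixSeconds and toFieldValues are hypothetical and not part of River View:

```javascript
// Hypothetical helpers for illustration; not part of River View.

// Convert a JavaScript Date to a UNIX timestamp in whole seconds.
// Date#getTime() returns milliseconds, so divide by 1000 and round down.
function toUnixSeconds(date) {
  return Math.floor(date.getTime() / 1000);
}

// Build the fieldValues array in the same order as the config's fields list.
// Assumes `record` is a plain object keyed by field name; a missing key
// produces NaN, which makes the gap easy to spot.
function toFieldValues(fields, record) {
  return fields.map(function (name) {
    return Number(record[name]);
  });
}

var values = toFieldValues(
  ['Speed', 'TravelTime'],
  { TravelTime: '100', Speed: '23.1' }
);
console.log(toUnixSeconds(new Date()), values);
```

Mapping over the configured field names, rather than over the record's own keys, keeps the values aligned with the config even if the source reorders its keys.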
metaDataCallback(streamId, metadata);
Where:
- streamId (string) is the unique identifier for the Stream being updated with new data (example: an id to a traffic route)
- metadata (object) is a key/value object with keys matching the metadata defined in the config
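Putting the two callbacks together, a parser might look like the sketch below. It assumes a hypothetical JSON feed: an array of objects with id, updated (an ISO 8601 string), speed, travelTime, and borough keys. It also assumes a config whose fields are [Speed, TravelTime] and whose metadata includes Borough. None of these names come from a real River.

```javascript
// Sketch of a parser.js for a hypothetical JSON feed. The feed shape
// (id, updated, speed, travelTime, borough) is an assumption for illustration.
function parse(body, options, temporalDataCallback, metaDataCallback) {
  // 1. parse the body
  var records = JSON.parse(body);

  records.forEach(function (record) {
    var streamId = String(record.id);
    // UNIX timestamp in seconds, not milliseconds
    var timestamp = Math.floor(new Date(record.updated).getTime() / 1000);
    // Values in the same order as the config fields: [Speed, TravelTime]
    var fieldValues = [Number(record.speed), Number(record.travelTime)];

    // 2. call the callbacks with data for this Stream
    temporalDataCallback(streamId, timestamp, fieldValues);
    metaDataCallback(streamId, { Borough: record.borough });
  });
}

module.exports = parse;
```

Each record produces one call to each callback, so a feed describing many Streams results in many callback calls per request.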
There is a test you can run that will exercise your new River:
node test-river.js <river-name>
... where <river-name> is the same as the directory name of your river. You should see output like this:
∙ node test-river.js dummy
Testing river dummy
river directory
✓ exists
✓ has a config.yml
✓ has a parser.js
river config
✓ is valid YAML
✓ has a description
✓ has an author
✓ has an email
✓ has a valid timezone
✓ has at least one source
✓ sources all resolve to working URLs (2301ms)
✓ has at least one field
river parser
✓ parse script exports a function
when passed a live response body
✓ calls the temporalDataCallback with data matching config (1005ms)
✓ calls the metadataCallback with JSON-parseable data (1004ms)
15 passing (4s)
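For a quicker check while developing, a parser can also be called by hand with recording callbacks. The sketch below is a hypothetical helper, not how test-river.js is implemented. It assumes the parser calls its callbacks synchronously, and it uses a rough size heuristic to catch timestamps given in milliseconds. The name checkParser and the example URL are made up for illustration.

```javascript
// Hypothetical local check; this is NOT how test-river.js is implemented.
// Runs a parser once against a response body and reports obvious problems.
// Assumes the parser calls its callbacks synchronously.
function checkParser(parse, body, config) {
  var problems = [];

  function temporalDataCallback(streamId, timestamp, fieldValues) {
    if (typeof streamId !== 'string') {
      problems.push('streamId should be a string');
    }
    // Heuristic: values this large are almost certainly milliseconds.
    if (!Number.isInteger(timestamp) || timestamp > 1e11) {
      problems.push('timestamp should be an integer number of seconds');
    }
    if (!Array.isArray(fieldValues) ||
        fieldValues.length !== config.fields.length) {
      problems.push('fieldValues should have one value per configured field');
    }
  }

  function metaDataCallback(streamId, metadata) {
    try {
      JSON.stringify(metadata);
    } catch (err) {
      problems.push('metadata should be JSON-serializable');
    }
  }

  // options contains "config" and "url"; the URL here is a placeholder.
  var options = { config: config, url: 'http://example.com/feed' };
  parse(body, options, temporalDataCallback, metaDataCallback);
  return problems;
}

// Usage with a deliberately wrong parser that reports milliseconds.
var badParser = function (body, options, temporalDataCallback, metaDataCallback) {
  temporalDataCallback('route-1', Date.now(), [23.1, 100]);
  metaDataCallback('route-1', { Borough: 'Queens' });
};
var found = checkParser(badParser, '', { fields: ['Speed', 'TravelTime'] });
console.log(found);
```

An empty array means none of these particular checks failed; it does not replace running the real test.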
After testing your river out locally using a local Redis instance, you should create a new pull request against this repository containing your parser.js and config.yml in the new directory you created in step two. Your PR will be reviewed and tested before being merged. Once merged, your new river will go live on the next deployment.
- interval must be more than 1 minute
- expires must be less than 6 months