Data Highway is a low friction, schema-on-write, data streaming platform

kyrsideris/data-highway

Data Highway

Overview

What is Data Highway?

The Data Highway is a service that allows data to be easily produced and consumed via JSON messages over HTTPS/WSS. Data is first defined using a schema and a "road" is created which will accept messages that conform to this schema. Producers of data sets thus only need to define the structure of their data and are then able to send their data to a REST endpoint and not be concerned with what happens next. Data Highway will ensure that this data is made available for streaming consumption and also stored reliably in a "data lake" in the cloud for access by end users.

Architecture

[Figure: Data Highway architecture diagram]

Paver

Paver is Data Highway's administration endpoint. It provides the following features:

  • Road (synonymous with Kafka topic) creation.
  • Schema registration and (soft) deletion.
  • Data-at-rest to Hive/S3 configuration.
  • Road-level producer and consumer authorisation.

Onramp

Onramp is Data Highway's producer endpoint. It allows users to submit messages to roads in JSON format over HTTPS.

Offramp

Offramp is Data Highway's consumer endpoint. It allows users to consume messages from roads in JSON format over WSS.

Tollbooth

Tollbooth is the core of Data Highway. It provides the mechanism by which mutations to a road's model are persisted. Mutations can come from users (via Paver) or from internal agents. Anything wishing to make a mutation submits a JSON Patch to a deltas Kafka topic. Tollbooth consumes this topic, continuously applying patches to models and persisting the results back onto the main model (compacted) topic.
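The patch-and-persist loop described above can be sketched in a few lines. This is a minimal illustration, not Tollbooth's implementation: it supports only the add, replace and remove operations of JSON Patch on object members, and the function names, the in-memory `models` dictionary (standing in for the compacted model topic) and the model fields are all assumptions made for the example.

```python
import copy


def _resolve_parent(doc, pointer):
    """Walk a JSON Pointer and return (parent object, final key)."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported pointer: {pointer!r}")
    # JSON Pointer escaping: "~1" means "/", "~0" means "~" (in that order).
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[token]
    return parent, tokens[-1]


def apply_patch(model, patch):
    """Return a patched copy of `model`; only object members are handled."""
    result = copy.deepcopy(model)
    for op in patch:
        parent, key = _resolve_parent(result, op["path"])
        if op["op"] == "add":
            parent[key] = op["value"]
        elif op["op"] == "replace":
            if key not in parent:
                raise KeyError(f"cannot replace missing member: {op['path']}")
            parent[key] = op["value"]
        elif op["op"] == "remove":
            del parent[key]
        else:
            raise ValueError(f"operation not covered by this sketch: {op['op']}")
    return result


def fold_deltas(models, deltas):
    """Apply (road_name, patch) pairs in order, keeping the latest model per road."""
    for road_name, patch in deltas:
        models[road_name] = apply_patch(models.get(road_name, {}), patch)
    return models


# Hypothetical deltas: one creates a field, the next mutates the same model.
models = fold_deltas({}, [
    ("my_road", [{"op": "add", "path": "/enabled", "value": False}]),
    ("my_road", [
        {"op": "replace", "path": "/enabled", "value": True},
        {"op": "add", "path": "/description", "value": "My Road"},
    ]),
])
```

Keeping only the latest model per road name mirrors what a compacted topic retains per key, which is why a plain dictionary is enough for the illustration.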

Traffic Control

Traffic Control is the Kafka Agent. It is primarily responsible for managing Kafka topics in response to changes in models.
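One way to picture an agent that reacts to model changes is as a reconciliation step that compares the roads known to the model store with the topics that already exist. The sketch below assumes, for illustration only, that a road's topic carries the same name as the road; the function name and inputs are hypothetical and no Kafka client is involved.

```python
def topics_to_create(models, existing_topics):
    """Return road names that have a model but no topic yet, in sorted order."""
    return sorted(set(models) - set(existing_topics))


# Hypothetical state: two road models, one of which already has a topic.
missing = topics_to_create({"my_road": {}, "other_road": {}}, ["other_road"])
```

A real agent would then ask the Kafka cluster to create each missing topic; that call is deliberately left out here.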

Loading Bay / Truck Park

Loading Bay is responsible for orchestrating the landing of data to S3 on a configured interval and for managing Hive tables: creation, schema mutation, and the addition of partitions.

Try it out

Try Test Drive, an in-memory version of Data Highway that exposes all the public-facing endpoints in a single Spring Boot application or Docker container.

docker run -p 8080:8080 hotelsdotcom/road-test-drive:<tag>

Examples

Using a local instance of Test Drive, try creating a road, registering a schema, and producing and consuming messages using the built-in user account user:pass.

Note: For the examples below, cURL will prompt for a password, which is pass.

Create a road

curl -sk \
  -u user \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
  "name": "my_road", 
  "description": "My Road",
  "teamName": "TEAM", 
  "contactEmail": "[email protected]",
  "partitionPath": "$.foo",
  "enabled": true,
  "authorisation": {
    "onramp": {
      "cidrBlocks": ["0.0.0.0/0"],
      "authorities": ["*"]
    },
    "offramp": {
      "authorities": {
        "*": ["PUBLIC"]
      }
    }
  }
}' https://localhost:8080/paver/v1/roads
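For callers that prefer a script to cURL, the same request can be assembled with Python's standard library. This is a sketch under assumptions: the helper names are made up for the example, the `contactEmail` value is a placeholder, and `send_insecure` disables certificate checks in the same spirit as cURL's `-k`, which is only appropriate against a local Test Drive instance.

```python
import base64
import json
import ssl
import urllib.request


def paver_create_road_request(base_url, user, password, road):
    """Build (but do not send) the POST request that creates a road."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/paver/v1/roads",
        data=json.dumps(road).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


def send_insecure(request):
    """Send the request without TLS verification (local testing only)."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.status, response.read().decode("utf-8")


road = {
    "name": "my_road",
    "description": "My Road",
    "teamName": "TEAM",
    "contactEmail": "team@example.com",  # placeholder address, an assumption
    "partitionPath": "$.foo",
    "enabled": True,
    "authorisation": {
        "onramp": {"cidrBlocks": ["0.0.0.0/0"], "authorities": ["*"]},
        "offramp": {"authorities": {"*": ["PUBLIC"]}},
    },
}
request = paver_create_road_request("https://localhost:8080", "user", "pass", road)
```

Calling `send_insecure(request)` would perform the actual POST; it is not invoked here so the snippet runs without a server.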

Register a schema

curl -sk \
  -u user \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
  "type" : "record",
  "name" : "my_record",
  "fields" : [
    {"name":"foo","type":"string"},
    {"name":"bar","type":"string"}
  ]
}' https://localhost:8080/paver/v1/roads/my_road/schemas
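To make "messages that conform to this schema" concrete, the sketch below checks a message against a record schema like the one registered above. It is a deliberately simplified stand-in, not Data Highway's validation: it covers only a handful of primitive types, ignores unions, defaults and nested records, and treating unexpected fields as errors is a choice made for this example. The function name is hypothetical.

```python
# Python types accepted for a small subset of Avro primitive types.
_PRIMITIVES = {
    "string": (str,),
    "boolean": (bool,),
    "int": (int,),
    "long": (int,),
    "float": (float, int),
    "double": (float, int),
}


def conformance_errors(schema, message):
    """Return a list of human-readable problems; an empty list means it conforms."""
    errors = []
    declared = {field["name"]: field["type"] for field in schema["fields"]}
    for name, avro_type in declared.items():
        if name not in message:
            errors.append(f"missing field: {name}")
            continue
        value = message[name]
        expected = _PRIMITIVES.get(avro_type) if isinstance(avro_type, str) else None
        if expected is None:
            errors.append(f"type not covered by this sketch: {avro_type}")
        elif isinstance(value, bool) and avro_type != "boolean":
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"field {name}: expected {avro_type}")
        elif not isinstance(value, expected):
            errors.append(f"field {name}: expected {avro_type}")
    for name in message:
        if name not in declared:
            errors.append(f"unexpected field: {name}")
    return errors


schema = {
    "type": "record",
    "name": "my_record",
    "fields": [
        {"name": "foo", "type": "string"},
        {"name": "bar", "type": "string"},
    ],
}
ok = conformance_errors(schema, {"foo": "foo1", "bar": "bar1"})
bad = conformance_errors(schema, {"foo": 1})
```

Here `ok` is an empty list, while `bad` reports the wrongly typed `foo` and the missing `bar`.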

Produce messages

curl -sk \
  -u user \
  -H "Content-Type: application/json" \
  -d '[{"foo":"foo1","bar":"bar1"}]' \
  https://localhost:8080/onramp/v1/roads/my_road/messages

Consume messages

echo '{"type":"REQUEST","count":1}' | \
  websocat -nk 'wss://localhost:8080/offramp/v2/roads/my_road/streams/my_stream/messages?defaultOffset=EARLIEST'

See: websocat
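The two pieces a consumer has to get right in the command above are the stream URL and the JSON frame that requests messages. A small sketch of both follows; the helper names are assumptions, and the code only builds strings, leaving the WebSocket connection itself to a client of your choice.

```python
import json
from urllib.parse import quote, urlencode


def offramp_stream_url(host, road, stream, default_offset="EARLIEST"):
    """Build the Offramp v2 WSS URL for a road and stream name."""
    path = (
        f"/offramp/v2/roads/{quote(road, safe='')}"
        f"/streams/{quote(stream, safe='')}/messages"
    )
    return f"wss://{host}{path}?{urlencode({'defaultOffset': default_offset})}"


def request_frame(count):
    """Build the compact JSON text frame asking Offramp for `count` messages."""
    return json.dumps({"type": "REQUEST", "count": count}, separators=(",", ":"))


url = offramp_stream_url("localhost:8080", "my_road", "my_stream")
frame = request_frame(1)
```

Percent-encoding the road and stream names keeps the URL valid if a name ever contains characters that are not URL-safe.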

Building

Build and load docker images to the local docker daemon:

mvn clean package -Djib.goal=dockerBuild

Build without docker images:

mvn clean package -Djib.skip

Build and push docker images to a repo:

mvn clean package -Ddocker.repo=my.docker.repo

Contributors

Special thanks to the following for making data-highway possible!

  • Dave Maughan 💻 🎨 👀 📖
  • James Grant 💻 🎨 👀 📖 📢
  • Elliot West 💻 🎨 👀 📖 📢
  • Adrian Woodhead 💻 🎨 👀 📖
  • Konrad Dowgird 💻 🎨 👀 📖
  • Riccardo Freixo 💻 🎨 👀 📖 🚇
  • Monica Nicoara 🤔 📋
  • Teiva Harsanyi 💻
  • Kryiakos Sideris 💻
  • Sandeep Solanki 💻

This project follows the all-contributors specification.

Legal

This project is available under the Apache 2.0 License.

Copyright 2019 Expedia Inc.
