Sinks
Sinking to BigQuery is rather simple:
- Define a schema in a .json file
- Declare the sink in the config file
- Make sure your pipeline code creates the correct JSON output
Terraform needs to know the BigQuery schema so it can create and manage tables. This same schema is used inside angled-dream to map the JSON output to BigQuery fields.
This is an example orion.json file. Note the nesting that 'fields' makes possible:
[
{"name": "event", "type": "string", "mode": "nullable"},
{"name": "owts", "type": "string", "mode": "nullable"},
{"name": "data", "type": "record", "mode": "nullable",
"fields": [
{"name": "action", "type": "record", "mode": "nullable",
"fields": [ {"name": "name", "type": "string", "mode": "nullable"}]},
{"name": "test_name", "type": "string", "mode": "nullable"}]
}]
Check this file into your pipeline-controller GitHub repo.
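To make the nesting concrete, here is a rough sketch of a record shaped like the orion.json schema above (event, owts, data.action.name, data.test_name). The class name, the sample values, and the tiny toJson helper are all assumptions made for illustration. The helper only exists so the sketch runs without dependencies; a JSON library would normally do the serialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not part of Sossity or angled-dream.
class OrionRecordSketch {

    // Builds a record whose keys mirror the nested orion.json schema.
    // All values are placeholder sample data.
    static Map<String, Object> buildRecord() {
        Map<String, Object> action = new LinkedHashMap<>();
        action.put("name", "click");

        Map<String, Object> data = new LinkedHashMap<>();
        data.put("action", action);
        data.put("test_name", "homepage_banner");

        // LinkedHashMap keeps insertion order, so the output is predictable.
        Map<String, Object> top = new LinkedHashMap<>();
        top.put("event", "webapp");
        top.put("owts", "v1");
        top.put("data", data);
        return top;
    }

    // Minimal stand-in serializer: handles null, nested maps, and strings only.
    // It escapes backslashes and double quotes but not control characters.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(toJson(String.valueOf(e.getKey())))
                  .append(':')
                  .append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        String s = String.valueOf(value);
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // prints {"event":"webapp","owts":"v1","data":{"action":{"name":"click"},"test_name":"homepage_banner"}}
        System.out.println(toJson(buildRecord()));
    }
}
```

Each "record" entry in the schema becomes a nested map, and each "string" entry becomes a plain value under the same field name.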
You need to put the BigQuery resource in the Sossity config file, so Sossity can use Terraform to create the table and dataset. The dataset and data table will not be deleted even if you remove them from the config file.
For example, under the :sinks key:
:sinks {"hxtrialorionbq" {:type "bq"
                          :bigQueryDataset "hx_trial_orion_staging"
                          :bigQueryTable "hx_trial_orion"
                          :bigQuerySchema "/home/ubuntu/pipeline-controller/orion.json"}}
The path prefix for :bigQuerySchema will always be /home/ubuntu/pipeline-controller.
Sossity will read this config file and schema to create the necessary BigQuery table and dataset. angled-dream will also use the bigQuerySchema file to turn JSON strings into BigQuery objects.
To write data to BigQuery, simply make sure a pipeline emits a JSON string with the correct structure and field names. There is no special data structure for BigQuery -- just use the field names you desire, and the sink will figure out the rest.
For example, if the BigQuery table schema looks like:
{"event":<str>,
"owts":<str>,
"data":{"action":<str>}}
then have your pipeline emit:
{"event":"webapp",
"owts":"v1",
"data":{"action":"click"}}
In Java, this would look something like:
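import com.google.gson.Gson;
import java.util.HashMap;
import java.util.Map;

Gson gson = new Gson();
// Top-level record; values are strings or nested maps
Map<String, Object> topMap = new HashMap<>();
// Nested "data" record
Map<String, String> dataMap = new HashMap<>();
topMap.put("event", "webapp");
topMap.put("owts", "v1");
dataMap.put("action", "click");
topMap.put("data", dataMap);
// Serialize to the JSON string the pipeline emits
String output = gson.toJson(topMap);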
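Since the record is built from hand-typed keys, a misspelled field name is easy to miss. One possible safeguard, sketched below, is a small check that could run in a unit test before the record is serialized. The class, the method names, and the way the schema is represented (a nested map of field name to type name) are assumptions for illustration; nothing here is provided by Sossity or angled-dream.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Sossity or angled-dream.
class SchemaFieldCheck {

    // Returns dotted paths of record fields that the schema does not declare.
    // A schema entry is either a type name (String) or a nested Map for a record.
    // Only field names are checked, not value types.
    @SuppressWarnings("unchecked")
    static List<String> unknownFields(Map<String, Object> schema,
                                      Map<String, Object> record,
                                      String prefix) {
        List<String> unknown = new ArrayList<>();
        for (Map.Entry<String, Object> field : record.entrySet()) {
            String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
            Object declared = schema.get(field.getKey());
            if (declared == null) {
                unknown.add(path);
            } else if (declared instanceof Map && field.getValue() instanceof Map) {
                // Both sides are records, so compare their fields recursively.
                unknown.addAll(unknownFields((Map<String, Object>) declared,
                                             (Map<String, Object>) field.getValue(),
                                             path));
            }
        }
        return unknown;
    }

    // Schema shape from the example above: event, owts, data.action.
    static Map<String, Object> exampleSchema() {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("action", "string");
        Map<String, Object> top = new LinkedHashMap<>();
        top.put("event", "string");
        top.put("owts", "string");
        top.put("data", data);
        return top;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("action", "click");
        data.put("acton", "typo"); // misspelled on purpose
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("event", "webapp");
        record.put("owts", "v1");
        record.put("data", data);
        // prints [data.acton]
        System.out.println(unknownFields(exampleSchema(), record, ""));
    }
}
```

An empty result means every key in the record has a counterpart in the schema. Omitted fields are not flagged, which fits the examples here, where every field is nullable.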