The following document describes generic shared deployments for various rules, with one BqTail and one BqDispatch cloud function per project.
It is highly recommended to use a transient project just for BqTail ingestion with the transient dataset option. In this case load/query/copy jobs do not count towards your destination project's quota, and you still have the ability to ingest data into your final project's dataset. If no transformation option is used, after loading data to the transient table, data is appended to the destination project's table with a copy operation, which is free of charge; otherwise regular SQL usage pricing applies.
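For illustration, a minimal ingestion rule using the transient dataset option could look like the sketch below (the file name, prefix, table, dataset and project names are placeholders, and the exact option names should be verified against the rule reference):
@transient_rule.yaml
When:
  Prefix: "/processX/"              # match data files by prefix
  Suffix: ".avro"
Async: true                          # batch asynchronously via BqDispatch
Batch:
  Window:
    DurationInSec: 60                # batching window
Dest:
  Table: myProject:mydataset.tableL
  Transient:
    ProjectID: myTransientProject    # load/query jobs run in the transient project
    Dataset: temp
OnSuccess:
  - Action: delete                   # remove ingested data files on success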
The following Google Storage layout is used to deploy and operate serverless data ingestion with BqTail:
This bucket stores all configuration files:
${configBucket}:
/
| - BqTail
| |- config.json
| |- Rules
| | | - rule1.yaml
| | | - group_folder
| | | |      - rule2.yaml
| | | - ruleN.json
| - BqDispatch
| |- config.json
The configuration bucket stores the BqTail and BqDispatch configurations along with ingestion rules.
Once data arrives in the trigger bucket, BqTail matches the data file with a rule to start the ingestion process.
This bucket stores all transient data and the history journal, including errors:
${opsBucket}:
/
| - BqTail/errors
| - BqDispatch/errors
This bucket stores all data that needs to be ingested into BigQuery:
${triggerBucket}
/
| - processX/YYYY/MM/DD/tableL_0000xxx.avro
| - processZ/YYYY/MM/DD/tableM_0000xxx.avro
This bucket is used by BqDispatch to manage scheduled async batches and running BigQuery jobs.
This bucket stores data exported from BigQuery; it can be a source for the Storage Mirror FaaS cloud function.
${exportBucket}
To maintain low cloud function latency (under 100ms at the 50th percentile), the following buckets need to be deployed in the same location as the BqTail/BqDispatch cloud functions:
It is recommended to deploy the trigger bucket as multi-region, so data flows to BigQuery even if one region goes down. In case of emergency you can easily redeploy BqTail to an unaffected region and resume data ingestion with the multi-region trigger bucket.
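If you manage the buckets manually (the deployment workflow below normally creates them for you), a multi-region US trigger bucket can be created with gsutil; the bucket name here assumes the default ${PROJECT_ID}_bqtail used by the deployment:
gsutil mb -c standard -l US gs://${PROJECT_ID}_bqtail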
Endly min version required: v0.49.2
You can use endly docker container
mkdir -p ~/e2e
docker run --name endly -v /var/run/docker.sock:/var/run/docker.sock -v ~/e2e:/e2e -v ~/e2e/.secret/:/root/.secret/ -p 7722:22 -d endly/endly:latest-ubuntu16.04
ssh [email protected] -p 7722 ## password is dev
## create localhost endly secret with
endly -l=localhost
## type user root, and password dev (you can skip SSH setup)
## check that the ~/.secret/localhost.json SSH secret file with the encrypted password was created
apt-get install vim
Or download the latest endly binary and run it on localhost.
- SSH credentials (you can skip this step if you are using the endly container)
On OSX make sure that you have SSH remote login enabled
sudo systemsetup -setremotelogin on
- Google Secrets for the service account
- Create service account secrets
- Set the roles required by the cloud function/scheduler deployment (example gcloud commands follow below)
- Cloud Functions Admin
- Editor
- Copy the Google secret to ~/.secret/myProjectSecret.json (note that in the endly container the path is /root/.secret/myProjectSecret.json)
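For example, assuming a hypothetical service account bqtail@${PROJECT_ID}.iam.gserviceaccount.com, the roles above could be granted with gcloud (replace the member with the service account you actually created):
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:bqtail@${PROJECT_ID}.iam.gserviceaccount.com --role=roles/cloudfunctions.admin
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:bqtail@${PROJECT_ID}.iam.gserviceaccount.com --role=roles/editor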
- Slack credentials (optional)
The Slack credentials use the following JSON format:
@slack.json
{
"Token": "MY_VALID_OAUTH_SLACK_TOKEN"
}
To encrypt the Slack credentials in Google Storage with KMS you can run the following commands:
git clone https://github.com/viant/bqtail.git
cd bqtail/deployment
endly secure_slack authWith=myProjectSecret slackOAuth=slack.json
To deploy the described infrastructure, use the endly automation runner:
git clone https://github.com/viant/bqtail.git
cd bqtail/deployment
endly run authWith=myProjectSecret region='us-central1'
To redeploy only the BqTail and BqDispatch cloud functions, run the following commands:
git clone https://github.com/viant/bqtail.git
cd bqtail/deployment
endly run -t='build,deploy' authWith=myProjectSecret region='us-central1'
Once deployment is successful you can check the following (example verification commands are shown after this list):
- The following buckets are present
- ${PROJECT_ID}_config (configuration bucket)
- ${PROJECT_ID}_operation (journal bucket)
- ${PROJECT_ID}_bqtail (cloud function trigger bucket)
- ${PROJECT_ID}_bqdispatch (bqdispatch bucket)
- The following cloud functions are present (check logs for errors)
- The following Cloud Scheduler job is present (check for a successful run)
- BqDispatch with a successful run
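A quick way to verify the above, assuming ${PROJECT_ID} is set to your project id:
## check the buckets
gsutil ls gs://${PROJECT_ID}_config gs://${PROJECT_ID}_operation gs://${PROJECT_ID}_bqtail gs://${PROJECT_ID}_bqdispatch
## check the cloud functions
gcloud functions list --project=${PROJECT_ID}
## check the Cloud Scheduler job
gcloud scheduler jobs list --project=${PROJECT_ID}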
All automation testing workflows copy a rule to gs://${configBucket}/BqTail/Rules/, followed by uploading a data file matching the rule to gs://${triggerBucket}/xxxxxx to trigger data ingestion. In the final step the workflow waits and validates that data exists in the destination tables.
When you test a new rule manually, upload the rule to gs://${configBucket}/BqTail/Rules/.
Make sure to remove the gs://${configBucket}/BqTail/.cache_ file if it is present before uploading a data file to the trigger bucket. It will get recreated by the next BqTail execution, triggered by a data file upload to the trigger bucket.
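For example, to test a hypothetical rule file myRule.yaml manually (the rule and data file names are placeholders):
## upload the rule
gsutil cp myRule.yaml gs://${configBucket}/BqTail/Rules/
## remove the rule cache if it is present
gsutil rm gs://${configBucket}/BqTail/.cache_
## upload a data file matching the rule to trigger ingestion
gsutil cp tableL_0000001.avro gs://${triggerBucket}/processX/2020/01/01/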
git clone https://github.com/viant/bqtail.git
cd bqtail/deployment/test/async
endly test authWith=myTestProjectSecrets
Where:
Post run check
In the cloud function logs you should be able to see the following:
- Successful batching events (BqTail log stream) for each file (2 files):
{"Batched":true,"EventID":"1086565206770154","IsDataFile":true,,"Matched":true,"MatchedURL":"gs://xx_bqtail/deployment_test/async/2020-04-04T11:43:30-07:00/dummy_1.json","Retriable":true,"RuleCount":34,"Started":"2020-04-04T18:43:31Z","Status":"ok","TimeTakenMs":5291,"TriggerURL":"gs://xx_bqtail/deployment_test/async/2020-04-04T11:43:30-07:00/dummy_1.json","Window":{"Async":true,"DestTable":"xx:test.dummy","DoneProcessURL":"gs://xx_operation/BqTail/Journal/Done/xx:test.dummy/2020-04-04_18/1086565206770154.run","End":"2020-04-04T18:44:00Z","EventID":"1086565206770154","FailedURL":"gs://xx_operation/BqTail/Journal/failed","ProcessURL":"gs://xx_operation/BqTail/Journal/Running/xx:test.dummy--1086565206770154.run","RuleURL":"gs://xx_config/BqTail/Rules/deployment_async_test.json","Source":{"Status":"pending","Time":"2020-04-04T18:43:30Z","URL":"gs://xx_bqtail/deployment_test/async/2020-04-04T11:43:30-07:00/dummy_1.json"},"Start":"2020-04-04T18:43:30Z","URL":"gs://xx_bqdispatch/BqDispatch/Tasks/xx:test.dummy_1179878484004789046_1586025840.win"}}
{"Batched":true,"BatchingEventID":"1086562538339341","EventID":"1086562538339341","IsDataFile":true,"ListOpCount":34,"Matched":true,"MatchedURL":"gs://xx_bqtail/deployment_test/async/2020-04-04T11:43:30-07:00/dummy_2.json","Retriable":true,"RuleCount":34,"Started":"2020-04-04T18:43:40Z","Status":"ok","TimeTakenMs":269,"TriggerURL":"gs://xx_bqtail/deployment_test/async/2020-04-04T11:43:30-07:00/dummy_2.json","WindowURL":"gs://xx_bqdispatch/BqDispatch/Tasks/xx:test.dummy_1179878484004789046_1586025840.win"} BqTail 1086562538339341
- Successful batch scheduling (BqDispatch log stream)
- Load job submission with batch runner (BqTail log stream)
- BigQuery Load job completion notification (BqDispatch log stream)
- Big Query copy job submission from transient table to dest table (BqTail log stream)
- BigQuery Copy job completion notification (BqDispatch log stream)
- Data should be present in destination table.
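For example, using the destination table from the log example above (xx:test.dummy), the data can be checked with the bq CLI (adjust project/dataset/table to your setup):
bq query --use_legacy_sql=false 'SELECT COUNT(1) AS cnt FROM test.dummy'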
BqDispatch Log example:
{
"BatchCount": 1,
"Batched": {
"gs://xx_bqdispatch/BqDispatch/Tasks/xx:test.dummy_1179878484004789046_1586025840.win": "2020-04-04T22:13:00Z"
},
"Cycles": 17,
"Jobs": {
"Jobs": {
"gs://xx_bqdispatch/BqDispatch/Tasks/proj:xx:US/xx:test.dummy-1179878484004789046_00001_load--dispatch": {
"Project": "",
"Region": "",
"ID": "xx_test_dummy--1179878484004789046_00001_load--dispatch",
"URL": "gs://xx_bqdispatch/BqDispatch/Tasks/proj:xx:US/xx:test.dummy--1179878484004789046_00001_load--dispatch",
"Status": "DONE"
},
"gs://xx_bqdispatch/BqDispatch/Tasks/proj:xx:US/xx:test.dummy--1179878484004789046_00002_copy--dispatch": {
"Project": "",
"Region": "",
"ID": "xx_test_dummy--1179878484004789046_00002_copy--dispatch",
"URL": "gs://xx_bqdispatch/BqDispatch/Tasks/proj:xx:US/xx:test.dummy--1179878484004789046_00002_copy--dispatch",
"Status": "DONE"
}
}
},
"Performance": {
"xx": {
"ProjectID": "xx",
"Running": {
"LoadJobs": 1,
"BatchJobs": 1
},
"Pending": {},
"Dispatched": {
"CopyJobs": 1,
"LoadJobs": 1
},
"Throttled": {}
}
},
"Started": "2020-04-04T22:13:04Z",
"Status": "ok",
"TimeTakenMs": 55000
}
Note that when a data file is not matched with any ingestion rule, the log entry returns "Status":"noMatch".
git clone https://github.com/viant/bqtail.git
cd bqtail/deployment/test/override
endly test authWith=myTestProjectSecrets
Where:
You can find more examples of various configuration settings in the end-to-end test cases.
Deploy monitor with scheduler
git clone https://github.com/viant/bqtail.git
cd bqtail/deployment/monitor
endly deploy authWith=myProjectSecret region=us-central1
To ingest monitoring status:
- Run the following DDL to create the destination table
- Add the bqtail ingestion rule to gs://${opsConfig}/BqTail/Rules/sys/ (a sketch follows below)
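A hypothetical sketch of such a rule; the status file prefix and the destination table name are assumptions, so use the location your monitor actually writes to and the table created by the DDL above:
@monitor_rule.yaml
When:
  Prefix: "/BqTail/status"                 # assumption: where the monitor writes status files
  Suffix: ".json"
Dest:
  Table: myProject:bqtail.monitor_status   # assumption: table created by the DDL above
OnSuccess:
  - Action: delete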