- Build Data Vault powered by dbtVault and Greenplum
- Deploy Infrastructure as Code with Terraform and Yandex.Cloud
- Instant development with GitHub Codespaces
- Assignment checks with GitHub Actions
- Fork this repository
- Configure Developer Environment
- Deploy Infrastructure
- Check database connection
- Populate Data Vault day-by-day
- Build Business Vault on top of Data Vault
- Create and submit PR
You have several options for setting up the environment:
Use devcontainer (locally)
- Install Docker on your local machine.
- Install devcontainer CLI: open the command palette (CMD + SHIFT + P) and type Install devcontainer CLI
- Next, build and open the dev container:
# build dev container
devcontainer build .
# open dev container
devcontainer open .
Verify you are in a development container by running commands:
terraform -v
yc --version
dbt --version
If any of these commands fails to print its version, you are probably running it on your local machine rather than in the dev container!
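The same check can be wrapped in a small helper. This is a minimal sketch: `check_tools` is a hypothetical name, not part of this repository. It only tests that each command is on `PATH`, using the POSIX `command -v` builtin, and does not verify versions.

```shell
# check_tools: report any of the given commands that are not on PATH.
# Hypothetical helper, not part of this repository.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the dev container:
#   check_tools terraform yc dbt && echo "all tools found"
```

A non-zero exit status suggests the shell is not running inside the dev container.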
- Get familiar with Managed Service for Greenplum
- Install and configure yc CLI: Getting started with the command-line interface by Yandex Cloud
yc init
- Populate .env file
.env is used to store secrets as environment variables.
Copy template file .env.template to .env file:
cp .env.template .env
Open file in editor and set your own values.
❗️ Never commit secrets to git
- Set environment variables:
export YC_TOKEN=$(yc iam create-token)
export YC_CLOUD_ID=$(yc config get cloud-id)
export YC_FOLDER_ID=$(yc config get folder-id)
export TF_VAR_folder_id=$(yc config get folder-id)
export $(xargs <.env)
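If a value in `.env` contains spaces, `export $(xargs <.env)` will split it into separate words. One possible alternative is to source the file with auto-export turned on. The sketch below assumes `.env` is valid shell syntax (`KEY=value` lines, with values that contain spaces quoted); `load_env` is a hypothetical helper name.

```shell
# load_env: export every variable assigned in a dotenv-style file.
# Hypothetical helper; assumes the file is valid shell syntax.
load_env() {
    env_file="${1:-.env}"
    # Without a slash, `.` may search PATH instead of the current directory.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    if [ ! -f "$env_file" ]; then
        echo "no such file: $env_file" >&2
        return 1
    fi
    set -a            # auto-export every variable assigned from here on
    . "$env_file"
    set +a            # back to the default behaviour
}

# Usage:
#   load_env .env
```

Because the file is executed as shell code, this should only be used with a `.env` file you wrote yourself.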
- Deploy using Terraform
Configure YC Terraform provider:
cp terraformrc ~/.terraformrc
terraform init
terraform validate
terraform fmt
terraform plan
terraform apply
Store terraform output values as Environment Variables:
export DBT_HOST=$(terraform output -raw greenplum_host_fqdn)
export DBT_USER='greenplum'
export DBT_PASSWORD=${TF_VAR_greenplum_password}
export S3_ACCESSKEY=$(terraform output -raw access_key)
export S3_SECRETKEY=$(terraform output -raw secret_key)
[EN] Reference: Getting started with Terraform by Yandex Cloud
[RU] Reference: Начало работы с Terraform by Yandex Cloud
- Alternatively, deploy using yc CLI
Checklist:
- Egress NAT (required to access S3): https://cloud.yandex.com/en/docs/vpc/operations/create-nat-gateway
- S3 service account keys (required for external tables access): https://cloud.yandex.com/en/docs/iam/operations/sa/create-access-key
- Greenplum: https://cloud.yandex.com/en/docs/cli/cli-ref/managed-services/managed-greenplum/
# create the Greenplum cluster
yc managed-greenplum cluster create gp_datavault \
  --network-name default \
  --zone-id ru-central1-a \
  --environment prestable \
  --master-host-count 2 \
  --segment-host-count 2 \
  --master-config resource-id=s3-c2-m8,disk-size=30,disk-type=network-ssd \
  --segment-config resource-id=s3-c2-m8,disk-size=30,disk-type=network-ssd \
  --segment-in-host 1 \
  --user-name greenplum \
  --user-password $TF_VAR_greenplum_password \
  --greenplum-version 6.22 \
  --assign-public-ip

# create a NAT gateway and route egress traffic through it
yc vpc gateway create --name gp-gateway
yc vpc route-table create --name=gp-route-table --network-name=default --route destination=0.0.0.0/0,gateway-id=<gateway_id>
yc vpc subnet update <subnet_name> --route-table-name=gp-route-table

# list the master hosts of the cluster
yc managed-greenplum hosts list master --cluster-name gp_datavault

export DBT_HOST=$DBT_HOST
export DBT_USER=$DBT_USER
export DBT_PASSWORD=$TF_VAR_greenplum_password
export S3_ACCESSKEY=$S3_ACCESSKEY
export S3_SECRETKEY=$S3_SECRETKEY
Configure JDBC (DBeaver) connection:
Make sure dbt can connect to your target database:
dbt debug
If there are any errors, check that the environment variables are set:
env | grep DBT_
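Note that `env | grep DBT_` also prints the password. As a rough alternative, the sketch below only reports which variables are missing, without echoing their values. `require_env` is a hypothetical helper; the variable names in the usage line are the ones exported in the deployment step.

```shell
# require_env: fail if any of the named variables is unset or empty.
# Hypothetical helper; prints names only, never values.
require_env() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   require_env DBT_HOST DBT_USER DBT_PASSWORD S3_ACCESSKEY S3_SECRETKEY
```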
- Initialize data sources (External tables)
dbt run-operation init_s3_sources
- Install packages:
dbt deps
- Run models step-by-step
Load one day to Data Vault structures:
dbt run -m tag:raw
dbt run -m tag:stage
dbt run -m tag:hub
dbt run -m tag:link
dbt run -m tag:satellite
dbt run -m tag:t_link
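The six commands above can also be run as a loop that stops at the first failing layer. This is a minimal sketch: `run_layers` and the `DBT_BIN` override are hypothetical names introduced here for illustration, and the tag order is copied from the list above.

```shell
# run_layers: run each Data Vault layer in order, stopping at the first failure.
# DBT_BIN is a hypothetical override for the dbt executable (defaults to dbt).
run_layers() {
    for tag in raw stage hub link satellite t_link; do
        "${DBT_BIN:-dbt}" run -m "tag:$tag" || return 1
    done
}

# Usage:
#   run_layers
# Dry run that only prints the arguments dbt would receive:
#   DBT_BIN=echo run_layers
```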
- Load next day
Simulate the next day's load by incrementing the load_date variable:
# dbt_project.yml
vars:
  load_date: '1992-01-02' # increment by one day
Then update the Data Vault:
dbt build
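Instead of editing the `vars` entry by hand for every day, the date can be computed in the shell and passed on the command line with dbt's `--vars` option, which takes precedence over project-level vars. The sketch below assumes GNU `date` (as found on Linux); BSD/macOS `date` uses different flags. `next_day` is a hypothetical helper name.

```shell
# next_day: print the day after a YYYY-MM-DD date.
# Assumes GNU date; -u avoids daylight-saving edge cases.
next_day() {
    date -u -d "$1 + 1 day" +%F
}

# Usage (override load_date for a single run):
#   dbt build --vars "{load_date: '$(next_day 1992-01-01)'}"
```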