- Download and install Python 3.11.x
- Download and install Git
- Download and install VScode
- For windows: Download and install windows terminal
- Open a terminal
- Clone the repository:
git clone https://github.com/artefactory/dbt_intro_da.git
- Change directory in the repository folder:
cd dbt_intro_da
- If you have conda installed you will need to deactivate it:
conda deactivate
- Setup a python environment:
- MacOs:
python3 -m venv venv
- Windows:
python -m venv venv
- MacOs:
- Activate python environment:
- MacOs:
source venv/bin/activate
- Windows:
Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope CurrentUser
. .\venv\Scripts\activate
- MacOs:
- Install python packages
pip install -r requirements.txt
Additionnal setup to work with bigquery (optionnal)
In order to work with bigquery you need to change the target in the profiles to the bigquery target.
You also need to update the dataset used in the profile by changing name
to your name.
dbt_intro_da:
target: dev_bigquery
outputs:
duckdb:
path: dbt.duckdb
type: duckdb
threads: 4
dev_bigquery:
type: bigquery
method: oauth
project: formation-sql-316408
dataset: dbt_intro_da_name
location: EU
threads: 4
- Download and install Gcloud
- Connect to gcloud:
gcloud auth application-default login \
--scopes=https://www.googleapis.com/auth/bigquery,\
https://www.googleapis.com/auth/drive.readonly,\
https://www.googleapis.com/auth/iam.test
dbt debug
should now tell you everything is OKdbt deps
to install dbt packages used in this projectdbt run
The first dbt run
should create a file named dbt.duckdb
.
This file will contains all the tables views we are creating.
VScode extension (optionnal but nice to have)
To help us navigate the dbt project we are using the extension dbt power user:
- Install the extension: https://marketplace.visualstudio.com/items?itemName=innoverio.vscode-dbt-power-user
There is one setting that should be added
To look explore what is inside the database we can use the dbt show
command: https://docs.getdbt.com/reference/commands/show
For example the command:
dbt show -s local_sales
should output something like:
13:58:16 Running with dbt=1.6.6
13:58:17 Registered adapter: duckdb=1.6.1
13:58:17 Found 1 model, 4 sources, 0 exposures, 0 metrics, 351 macros, 0 groups, 0 semantic models
13:58:17
13:58:17 Concurrency: 4 threads (target='dev_duckdb')
13:58:17
13:58:17 Previewing node 'local_sales':
| product_category | pdt_SUB_CATEGORY | barcode_ean13 | site_key | DATE | CA |
| -------------------- | ----------------- | ----------------- | ----------------- | ---------- | ----- |
| APERITIFS | SANS ALCOOL | f_f_0327607249893 | f_f_8429768983288 | 2023-03-08 | 37.57 |
| VQPRD ROUGES/HORS... | VINS DU SUD OUEST | f_f_4371089866174 | f_f_8429768983288 | 2023-10-15 | 11.79 |
| ALCOOLS | PUNCHS | f_f_2239755478262 | f_f_8429768983288 | 2023-08-13 | 31.95 |
| ALCOOLS | GIN-VODKA-TEQUILA | f_f_6023176352753 | f_f_8429768983288 | 2023-06-20 | 22.36 |
| VQPRD ROUGES/HORS... | VINS DU SUD OUEST | f_f_2550597061567 | f_f_8429768983288 | 2023-05-22 | 34.23 |
Which are a couple of rows from the table defined by the dbt model local_sales
We can also run arbitrary queries:
dbt show --inline "select distinct(product_category) from {{ ref('local_sales') }}"
should output something like:
13:58:20 Running with dbt=1.6.6
13:58:20 Registered adapter: duckdb=1.6.1
13:58:20 Found 1 model, 4 sources, 0 exposures, 0 metrics, 351 macros, 0 groups, 0 semantic models
13:58:20
13:58:20 Concurrency: 4 threads (target='dev_duckdb')
13:58:20
13:58:20 Previewing inline node:
| product_category |
| -------------------- |
| ALCOOLS |
| APERITIFS |
| VQPRD BORDEAUX RO... |
| EAUX AROMATISEES |
| VINS DE TABLES/PAYS |