dbt-athena

Supports dbt version 1.3.*
Supports Seeds
Correctly detects views and their columns
Supports incremental models
- Supports two incremental update strategies: insert_overwrite and append
- Does not support the use of unique_key
Only supports Athena engine version 2
- See "Changing Athena Engine Versions" in the AWS documentation
Does not support Python models

Installation

pip install dbt-athena-community
Or pip install git+https://github.com/dbt-athena/dbt-athena.git

Prerequisites

To start, you will need an S3 bucket (for instance, my-staging-bucket) and an Athena database:

CREATE DATABASE IF NOT EXISTS analytics_dev
COMMENT 'Analytics models generated by dbt (development)'
LOCATION 's3://my-staging-bucket/'
WITH DBPROPERTIES ('creator'='Foo Bar', 'email'='[email protected]');

Notes:

Take note of your AWS region code (e.g. us-west-2 or eu-west-2); it is required as region_name in your profile.
You can also use AWS Glue to create and manage Athena databases.

Credentials

This plugin does not accept any credentials directly. Instead, credentials are determined automatically based on AWS CLI/boto3 conventions and stored login info. You can configure the AWS profile name to use via aws_profile_name. See "Configuring your profile" below for details.

Configuring your profile

A dbt profile can be configured to run against AWS Athena using the following configuration:

Option	Description	Required?	Example
s3_staging_dir	S3 location to store Athena query results and metadata	Required	`s3://bucket/dbt/`
region_name	AWS region of your Athena instance	Required	`eu-west-1`
schema	Specify the schema (Athena database) to build models into (lowercase only)	Required	`dbt`
database	Specify the database (Data catalog) to build models into (lowercase only)	Required	`awsdatacatalog`
poll_interval	Interval in seconds to use for polling the status of query results in Athena	Optional	`5`
aws_profile_name	Profile to use from your AWS shared credentials file.	Optional	`my-profile`
work_group	Identifier of Athena workgroup	Optional	`my-custom-workgroup`
num_retries	Number of times to retry a failing query	Optional	`3`

Example profiles.yml entry:

athena:
  target: dev
  outputs:
    dev:
      type: athena
      s3_staging_dir: s3://athena-query-results/dbt/
      region_name: eu-west-1
      schema: dbt
      database: awsdatacatalog
      aws_profile_name: my-profile
      work_group: my-workgroup

Additional information

threads is supported
database and catalog can be used interchangeably
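
As a sketch of how the optional settings and threads fit together, the profile below extends the example above with the optional entries from the table. The bucket, profile, and workgroup names and the numeric values are placeholders, not recommendations.

```yaml
athena:
  target: dev
  outputs:
    dev:
      type: athena
      # Required settings
      s3_staging_dir: s3://athena-query-results/dbt/
      region_name: eu-west-1
      schema: dbt
      database: awsdatacatalog
      # Optional settings (illustrative values)
      aws_profile_name: my-profile
      work_group: my-workgroup
      poll_interval: 5
      num_retries: 3
      threads: 4
```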

Usage notes

Models

Table Configuration

external_location (default=none)
- The location where Athena saves your table in Amazon S3
- If none, it defaults to {s3_staging_dir}/tables
- If you use a static value, the underlying data is cleaned up and overwritten by new data whenever the table or partition is recreated
partitioned_by (default=none)
- A list of columns by which the table will be partitioned
- Currently limited to creating 100 partitions
bucketed_by (default=none)
- A list of columns by which to bucket the data
bucket_count (default=none)
- The number of buckets for bucketing your data
format (default='parquet')
- The data format for the table
- Supports ORC, PARQUET, AVRO, JSON, or TEXTFILE
write_compression (default=none)
- The compression type to use for any storage format that allows compression to be specified. For the available options, see CREATE TABLE AS in the Athena documentation
field_delimiter (default=none)
- Custom field delimiter, for when format is set to TEXTFILE

More information: CREATE TABLE AS
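
One possible way to apply these options is through dbt's standard folder-level configuration in dbt_project.yml, sketched below. The project name (my_project), folder name (marts), and column names are hypothetical, and the SNAPPY value assumes a compression type that Athena accepts for Parquet; check CREATE TABLE AS before relying on it.

```yaml
# dbt_project.yml (fragment); project, folder, and column names are hypothetical
models:
  my_project:
    marts:
      +materialized: table
      +format: parquet
      # Assumed to be a valid compression type for Parquet
      +write_compression: SNAPPY
      +partitioned_by: ['event_date']
      +bucketed_by: ['user_id']
      +bucket_count: 8
```

Leaving external_location unset here means the tables fall back to the default location under {s3_staging_dir}/tables.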

Supported functionality

Support for incremental models:

Supports two incremental update strategies with partitioned tables: insert_overwrite and append
Does not support the use of unique_key
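
A minimal sketch of an incremental configuration, assuming dbt's standard materialized and incremental_strategy settings are set in dbt_project.yml. The project name, folder name, and partition column are hypothetical.

```yaml
# dbt_project.yml (fragment); project, folder, and column names are hypothetical
models:
  my_project:
    events:
      +materialized: incremental
      # Either insert_overwrite or append; unique_key is not supported
      +incremental_strategy: insert_overwrite
      +partitioned_by: ['event_date']
```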

Due to the nature of AWS Athena, not all core dbt functionality is supported. The following features of dbt are not implemented on Athena:

Snapshots

Known issues

Quoting is not currently supported

If you need to quote your sources, escape the quote characters in your source definitions:

version: 2

sources:
  - name: my_source
    tables:
      - name: first_table
        identifier: "first table"       # Not like that
      - name: second_table
        identifier: "\"second table\""  # Like this

Table, schema, and database names must be lowercase
Only supports Athena engine version 2
- See "Changing Athena Engine Versions" in the AWS documentation

Contributing

This connector works with Python from 3.7 to 3.10.

Getting started

To start developing on this adapter, clone the repo and run this make command (see Makefile):

make setup

It will:

Install all dependencies.
Install pre-commit hooks.

Next, configure the environment variables in dev.env to match your Athena development environment.

Running tests

You can run the tests using make:

make run_tests
