- About this document
- Getting the code
- Running `dbt-hive` in development
- Testing
- Submitting a Pull Request
This document is a guide for anyone interested in contributing to the `dbt-hive` repository. It outlines how to create issues and submit pull requests (PRs). It is not intended as a guide for using `dbt-hive` in a project.
We assume users have a Linux or macOS system. You should be familiar with:
- Python `virtualenv`s
- Python modules
- `pip`
- common command-line utilities like `git`
In addition to this guide, we highly encourage you to read the dbt-core contributing guide (`CONTRIBUTING.md`). Almost all of the information there applies here!
`git` is needed to download and modify the `dbt-hive` code. There are several ways to install Git. On macOS, we suggest installing Xcode or the Xcode Command Line Tools.
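If you want to confirm that `git` is available before going further, a minimal sketch using only the Python standard library could look like this. The helper name `git_version` is hypothetical and not part of `dbt-hive`; `shutil.which` locates the executable on your `PATH`, and `git --version` reports the installed version.

```python
import shutil
import subprocess
from typing import Optional


def git_version() -> Optional[str]:
    """Return the output of `git --version`, or None if git is not on PATH.

    Hypothetical helper for illustration; it is not part of dbt-hive.
    """
    git = shutil.which("git")
    if git is None:
        return None
    result = subprocess.run([git, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    version = git_version()
    if version is None:
        print("git not found; on macOS, install Xcode or the Xcode Command Line Tools")
    else:
        print(version)  # output looks like "git version <x.y.z>"
```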
If you are not a member of the Cloudera GitHub organization, you can contribute to `dbt-hive` by forking the `dbt-hive` repository. For more on forking, see the GitHub documentation on forking. In short, you will need to:
- fork the `dbt-hive` repository
- clone your fork locally
- check out a new branch for your proposed changes
- push changes to your fork
- open a pull request from your forked repository against `cloudera/dbt-hive`
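The fork workflow above maps onto a handful of `git` commands. The sketch below is a hypothetical dry-run helper (not part of `dbt-hive`) that lists those commands in order for a GitHub username and branch name you supply; by default it only prints them. Committing your changes happens between checkout and push, and opening the pull request is done on GitHub.

```python
import shlex
import subprocess
from typing import List


def fork_workflow(github_user: str, branch: str) -> List[List[str]]:
    """Return the git commands for the fork-based workflow, in order.

    `github_user` and `branch` are placeholders you supply; this helper is
    illustrative and not part of dbt-hive.
    """
    fork_url = f"https://github.com/{github_user}/dbt-hive.git"
    return [
        ["git", "clone", fork_url],                                # clone your fork locally
        ["git", "-C", "dbt-hive", "checkout", "-b", branch],       # new branch for your change
        # ... edit files and commit your changes here ...
        ["git", "-C", "dbt-hive", "push", "-u", "origin", branch],  # push the branch to your fork
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Prints the commands only; nothing is executed.
    run(fork_workflow("your-github-username", "my-proposed-change"))
```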
If you are a member of the Cloudera GitHub organization, you will have push access to the `dbt-hive` repo. Rather than forking `dbt-hive` to make your changes, clone the repository as usual and check out feature branches.
1. Ensure you have Python 3.8 or higher installed on the machine.
2. Ensure you have the latest version of `pip` installed by running `pip install --upgrade pip` in a terminal.
3. Either configure the virtual environment manually:
   3.1. Configure and activate a `virtualenv` as described in Setting up an environment.
   3.2. Install `dbt-core` in the active `virtualenv`. To confirm you installed dbt correctly, run `dbt --version` and `which dbt`.
   3.3. Install `dbt-hive` and the development dependencies in the active `virtualenv`. Run `pip install -e . -r dev-requirements.txt`.
   3.4. Add the pre-commit hook. Run `pre-commit install`.
4. Or use `make` to run multiple setup or test steps in combination:
   4.1. Run `make dev_setup` to set up the virtual environment and install dependencies.
   4.2. Optionally, specify the venv directory to use by setting the `VENV` variable in the make command, e.g. `make dev_setup VENV=.venv`.
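The checks in steps 1 and 3.2 can also be done from inside the interpreter you plan to use. This is a minimal standard-library sketch with hypothetical helper names: it compares the running version against the 3.8 minimum, reports whether a virtual environment is active (for `venv` and `virtualenv` 20+, `sys.prefix` differs from `sys.base_prefix`), and uses `shutil.which` as a rough equivalent of `which dbt`.

```python
import shutil
import sys
from typing import Optional, Sequence

MIN_PYTHON = (3, 8)  # minimum Python version stated in this guide


def python_ok(version_info: Sequence[int] = sys.version_info) -> bool:
    """True if the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtualenv() -> bool:
    """True inside a venv / virtualenv (20+), where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def dbt_executable() -> Optional[str]:
    """Rough equivalent of `which dbt`: path of the dbt CLI on PATH, or None."""
    return shutil.which("dbt")


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_ok() else "is too old; 3.8+ is required")
    print("virtualenv active:", in_virtualenv())
    print("dbt executable:", dbt_executable() or "not found on PATH")
```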
When `dbt-hive` is installed this way, any changes you make to the `dbt-hive` source code are reflected immediately (i.e., in your next local dbt invocation against a Hive target).
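To check that the editable install is really being picked up, you can ask Python where a module would be imported from. The sketch below is hypothetical and assumes the adapter package is importable as `dbt.adapters.hive`; it uses `importlib.util.find_spec` and returns `True` only when the module's source file lives inside your checkout.

```python
import importlib.util
from pathlib import Path


def loaded_from(module_name: str, checkout_dir: str) -> bool:
    """Return True if `module_name` would be imported from inside `checkout_dir`."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:  # raised when a parent package (e.g. "dbt") is missing
        return False
    if spec is None or spec.origin is None:
        return False
    origin = Path(spec.origin).resolve()
    return Path(checkout_dir).resolve() in origin.parents


if __name__ == "__main__":
    # Assumption: run from the root of your dbt-hive checkout.
    print("editable install in use:", loaded_from("dbt.adapters.hive", "."))
```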
`dbt-hive` contains functional tests. Functional tests require an actual Hive warehouse to test against.
- You can run functional tests "locally" by configuring a `test.env` file with the appropriate environment variables:
cp test.env.example test.env
$EDITOR test.env
WARNING: The parameters in your `test.env` file must point to a valid Hive instance. The `test.env` file you create is git-ignored, but please be extra careful never to check in credentials or other sensitive information when developing.
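Before running the functional tests, it can help to confirm that `test.env` defines every variable you expect. This is a minimal sketch for simple `KEY=VALUE` files (blank lines and `#` comments skipped); the function names are hypothetical, and the required variable names in the usage are placeholders, so copy the real ones from `test.env.example`. It reports only key names, never values, so credentials are not echoed.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def load_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Names from `required` that are absent or empty in `values`."""
    return [k for k in required if not values.get(k)]


if __name__ == "__main__":
    env = load_env_file("test.env")
    # Placeholder names: use the variable names listed in test.env.example.
    gaps = missing_keys(env, ["HIVE_HOST", "HIVE_USER"])
    print("missing:", gaps or "none")
```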
There are a few methods for running tests locally.
`tox` takes care of managing Python virtualenvs and installing dependencies in order to run tests.
To run an individual test:
make test TESTS=tests/functional/adapter/test_basic.py::TestSimpleMaterializationsHive
To run an individual test for a specific Python version:
make test TESTS=tests/functional/adapter/test_basic.py::TestSimpleMaterializationsHive PYTHON_VERSION=py38
To run tests across all Python versions:
make test_all_python_versions TESTS=tests/functional/adapter/test_basic.py::TestSimpleMaterializationsHive
The configuration for these tests is located in `tox.ini`.
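The `make` commands above combine a pytest node ID (file path, then class, then method, joined with `::`) with the `TESTS` and `PYTHON_VERSION` variables. The sketch below builds those commands programmatically; the helper names are hypothetical, and `py38`-style names follow tox's usual `pyXY` environment convention.

```python
import shlex
from typing import List, Optional


def node_id(path: str, *names: str) -> str:
    """Join a test file path with optional class/method names using pytest's "::" separator."""
    return "::".join((path,) + names)


def tox_env(major: int, minor: int) -> str:
    """tox's conventional environment name for a Python version, e.g. (3, 8) -> "py38"."""
    return f"py{major}{minor}"


def make_test_command(tests: str, python_version: Optional[str] = None) -> List[str]:
    """Build a `make test` call using the TESTS and PYTHON_VERSION variables."""
    cmd = ["make", "test", f"TESTS={tests}"]
    if python_version:
        cmd.append(f"PYTHON_VERSION={python_version}")
    return cmd


if __name__ == "__main__":
    target = node_id("tests/functional/adapter/test_basic.py", "TestSimpleMaterializationsHive")
    # Prints the command instead of running it; pass the list to subprocess.run to execute it.
    print(shlex.join(make_test_command(target, tox_env(3, 8))))
```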
NOTE:
- The Python versions you run tests against must be installed on your machine manually.
- To configure the pytest settings, update `pytest.ini`. By default, the logs of each test run are captured in `logs/<test-run>/dbt.log`.
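When a test fails, the most recent `dbt.log` is usually the one you want. A minimal sketch, assuming the default `logs/<test-run>/dbt.log` layout described above (the function name is hypothetical): it globs one directory level below `logs/` and picks the file with the newest modification time.

```python
from pathlib import Path
from typing import Optional


def latest_dbt_log(logs_dir: str = "logs") -> Optional[Path]:
    """Return the most recently modified <logs_dir>/<test-run>/dbt.log, or None if there is none."""
    candidates = list(Path(logs_dir).glob("*/dbt.log"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    log = latest_dbt_log()
    print(log if log else "no dbt.log found under logs/")
```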
You may run a specific test or group of tests using `pytest` directly or `make`. Activate a Python virtualenv with the dev dependencies installed, as explained in the installation steps. Use the appropriate profile, such as `cdh_endpoint` or `dwx_endpoint`. Then run tests like so:
# Note: replace $strings with valid names
# run the full test suite against an environment/endpoint
python -m pytest --profile dwx_endpoint
# use make to run the full test suite against an environment/endpoint
make test PROFILE=dwx_endpoint
# run all hive functional tests in a directory
python -m pytest tests/functional/$test_directory --profile dwx_endpoint
python -m pytest tests/functional/adapter --profile dwx_endpoint
# run all hive functional tests in a module
python -m pytest --profile dwx_endpoint tests/functional/$test_dir_and_filename.py
python -m pytest --profile dwx_endpoint tests/functional/adapter/test_basic.py
# run all hive functional tests in a class
python -m pytest --profile dwx_endpoint tests/functional/$test_dir_and_filename.py::$test_class_name
python -m pytest --profile dwx_endpoint tests/functional/adapter/test_basic.py::TestSimpleMaterializationsHive
# run a specific hive functional test
python -m pytest --profile dwx_endpoint tests/functional/$test_dir_and_filename.py::$test_class_name::$test_method_name
python -m pytest --profile dwx_endpoint tests/functional/adapter/test_basic.py::TestSimpleMaterializationsHive::test_base
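The `--profile` flag is a custom pytest option rather than a built-in one. The sketch below shows one way such an option can be registered in a `conftest.py` with pytest's `pytest_addoption` hook and turned into a connection target. It is not the repository's actual `conftest.py`: the default profile, the environment variable names, and the target fields are all hypothetical. In dbt's adapter test framework a fixture such as `dbt_profile_target` typically supplies the target; that wiring is shown only as a comment so the sketch runs without pytest installed.

```python
# conftest.py (sketch): one way a --profile flag like the one used above could be wired up.
# Profile names come from this guide; the default, env var names and target fields are hypothetical.
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping from profile name to the prefix of its variables in test.env.
_PREFIXES = {"cdh_endpoint": "CDH", "dwx_endpoint": "DWX"}


def pytest_addoption(parser) -> None:
    """pytest hook: register the --profile command-line option."""
    parser.addoption("--profile", action="store", default="dwx_endpoint", type=str)


def target_for(profile: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a (hypothetical) dbt target dict for the chosen profile from environment variables."""
    env = os.environ if env is None else env
    if profile not in _PREFIXES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {sorted(_PREFIXES)}")
    prefix = _PREFIXES[profile]
    return {
        "type": "hive",
        "host": env.get(f"{prefix}_HOST", ""),
        "schema": env.get(f"{prefix}_SCHEMA", "dbt_test"),
    }


# In a real conftest.py, a fixture would read the option and pass it to target_for, e.g.:
#
#   @pytest.fixture(scope="class")
#   def dbt_profile_target(request):
#       return target_for(request.config.getoption("--profile"))
```

Keeping the profile lookup in a plain function makes it easy to unit-test without a live Hive warehouse; only the fixture needs pytest.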
A `dbt-hive` maintainer will review your PR and determine whether it passes regression tests. They may suggest code revisions for style and clarity, or request that you add unit or functional tests. These are good things! We believe that, with a little help, anyone can contribute high-quality code.
Once all tests are passing and your PR has been approved, a `dbt-hive` maintainer will merge your changes into the active development branch. And that's it! Happy developing 🎉