
melt: (M)y End-to-End (ELT)

End-to-end ELT pipeline(s) making use of a variety of "cloud native" technologies:

  - Kubernetes (K8s)
  - Argo Workflows
  - Airbyte
  - dbt
  - Postgres

as well as Nix and Nix Flakes for project dependency and build management.

Why melt?

melt was developed with the intention of:

  1. making myself (@givonwash) more familiar with a variety of tools/technologies that I had limited prior experience with (e.g., Kubernetes)
  2. demonstrating mastery of tools/technologies that I (@givonwash) do have extensive experience with (e.g., dbt, Postgres, and Nix)

melt is not production-grade software and exists primarily for demonstration purposes!

How It Works

Note

This project is in a (semi-)working state as of 9ec85f9; however, there are still a few "quirks" to be worked out, and as such, this project should not be considered "finished".

At its core, melt relies upon three different technologies to do the heavy lifting of container orchestration and data extraction: Kubernetes, Argo Workflows, and Airbyte. The latter two technologies "sit" atop the former and are configured/utilized via the logic residing under ./melt-infra.

melt's usage of Argo Workflows revolves around a single workflow, elt-workflow, which handles the configuration of Airbyte, the triggering of data synchronization jobs, dbt transformations, and more. elt-workflow is created as a CronWorkflow that runs every other hour. An in-progress run of elt-workflow, as visualized by the Argo Workflows UI, can be seen below:

In-progress elt-workflow run as seen from the Argo Workflows UI
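
If you want to inspect the schedule or kick off a run outside the every-other-hour cadence, the commands below are one way to do so. They assume kubectl access to the cluster, that the CronWorkflow resource is literally named elt-workflow, and (for the manual trigger) that the argo CLI is installed; none of these are requirements imposed by melt itself.

# Inspect the CronWorkflow's cron schedule
kubectl get cronworkflow elt-workflow --namespace='<argo-namespace>' -o jsonpath='{.spec.schedule}'

# Trigger an ad-hoc run without waiting for the next scheduled one (requires the `argo` CLI)
argo submit --namespace='<argo-namespace>' --from cronwf/elt-workflow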

The dbt project responsible for transforming data extracted and loaded by the elt-workflow is found under ./melt-dbt. melt-dbt adheres to dbt's enumerated "best practices" of "staging", "intermediate", and "mart" transformation layers, with each layer molding data into an increasingly usable shape. A melt-dbt Docker image is uploaded to givonwash/melt-dbt upon pushes to main via the build/build-melt-dbt GitHub Actions job.
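
For local development, the layers can also be built directly with the dbt CLI. The sketch below assumes the conventional directory layout of models/staging, models/intermediate, and models/marts, plus a profiles.yml pointing at the Postgres instance; within melt itself, dbt is normally run by the elt-workflow rather than by hand.

# Build (run + test) each melt-dbt layer in order
cd melt-dbt
dbt build --select models/staging
dbt build --select models/intermediate
dbt build --select models/marts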

Usage

Note

This project was developed for demonstration purposes and has not been thoroughly tested against non-minikube Kubernetes clusters. Please proceed with caution and/or expect errors if deploying melt outside of this context.

With nix

If you have the Nix package manager installed and configured to use the extra-experimental-features flakes and nix-command, you can use this project as follows:
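
# Optional: if the `nix-command` and `flakes` experimental features are not yet enabled, one common
#           way to turn them on for a single-user installation (an assumption about your setup;
#           adjust the path/method for yours) is via your user-level `nix.conf`
mkdir -p ~/.config/nix
echo 'extra-experimental-features = nix-command flakes' >> ~/.config/nix/nix.conf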

# Make use of `devShells.default`
nix develop

# Start our `minikube` K8s cluster
minikube start

# Change to `melt-infra` to generate needed K8s manifests
cd melt-infra

# Generate ("synthesize") K8s manifests
#
# NOTE:
#   1. Alternatively, these options can be defined as environment variables to protect secret
#      credentials
#   2. To see descriptions for these options, execute `npx tsx src/synth.ts --help`
npx tsx src/synth.ts --airbyte-namespace '<airbyte-namespace>' \
                     --airbyte-server-url '<airbyte-server-url>' \
                     --argo-namespace '<argo-namespace>' \
                     --postgres-namespace '<postgres-namespace>' \
                     --postgres-database '<postgres-database>' \
                     --postgres-admin-password '<postgres-admin-password>' \
                     --postgres-melt-user '<postgres-melt-user>' \
                     --postgres-melt-password '<postgres-melt-password>'

# Apply generated K8s manifests
#
# NOTE: you may have to run this command more than once if your K8s cluster fails to register the
#       Argo Workflows CRDs quickly enough, which leads to "missing definition" errors when K8s
#       tries to create `CronWorkflows` (and the like) per the logic in `melt-infra`
kubectl apply -f dist/
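
# If you do hit those "missing definition" errors, one workaround (a sketch; the CRD names below are
# the standard Argo Workflows ones and are assumed rather than pinned by melt) is to wait for the
# CRDs to be established and then re-apply
kubectl wait --for condition=established --timeout=60s \
    crd/workflows.argoproj.io crd/cronworkflows.argoproj.io crd/workflowtemplates.argoproj.io
kubectl apply -f dist/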

# Optional: Monitor the various deployments/services/pods/etc. created by the previous command
minikube dashboard

# Optional: Monitor the `elt-workflow` by visiting http://localhost:2746 and pasting your access
#           token (printed to `stdout` by the second command below) into the relevant text box
#
#           Be sure to swap `${ARGO_WORKFLOWS_SERVER_SERVICE_NAME}` with your actual service name,
#           `${ARGO_WORKFLOWS_READER_SERVICE_ACCOUNT_SECRET}` with your actual service account
#           secret, and `<argo-namespace>` with the value passed to `--argo-namespace` above
kubectl port-forward --namespace='<argo-namespace>' service/${ARGO_WORKFLOWS_SERVER_SERVICE_NAME} 2746:2746
echo "Bearer $(kubectl get --namespace='<argo-namespace>' secret ${ARGO_WORKFLOWS_READER_SERVICE_ACCOUNT_SECRET} -o=jsonpath='{.data.token}' | base64 --decode)"

Without nix

melt relies upon, and encourages the use of, Nix Flakes for build/dependency management; however, it is also possible to use this project without installing the Nix package manager. To do so, you'll need helm, minikube, kubectl, and Node.js v22 installed via your preferred package management solution.
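
As a quick sanity check before proceeding, you can confirm the prerequisites are available on your PATH (a minimal sketch; only Node.js has a specific version called out above):

# Confirm the required tooling is installed
helm version
minikube version
kubectl version --client
node --version   # should report v22.x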

# Start our `minikube` K8s cluster
minikube start

# Change to `melt-infra` to generate needed K8s manifests
cd melt-infra

# Install the dependencies of `melt-infra`
npm install

# Generate ("synthesize") K8s manifests
#
# NOTE:
#   1. Alternatively, these options can be defined as environment variables to protect secret
#      credentials
#   2. To see descriptions for these options, execute `npx tsx src/synth.ts --help`
npx tsx src/synth.ts --airbyte-namespace '<airbyte-namespace>' \
                     --airbyte-server-url '<airbyte-server-url>' \
                     --argo-namespace '<argo-namespace>' \
                     --postgres-namespace '<postgres-namespace>' \
                     --postgres-database '<postgres-database>' \
                     --postgres-admin-password '<postgres-admin-password>' \
                     --postgres-melt-user '<postgres-melt-user>' \
                     --postgres-melt-password '<postgres-melt-password>'

# Apply generated K8s manifests
#
# NOTE: you may have to run this command more than once if your K8s cluster fails to register the
#       Argo Workflows CRDs quickly enough, which leads to "missing definition" errors when K8s
#       tries to create `CronWorkflows` (and the like) per the logic in `melt-infra`
kubectl apply -f dist/

# Optional: Monitor the various deployments/services/pods/etc. created by the previous command
minikube dashboard

# Optional: Monitor the `elt-workflow` by visiting http://localhost:2746 and pasting your access
#           token (printed to `stdout` by the second command below) into the relevant text box.
#
#           NOTE: You'll need the `base64` utility program as well for the second command to work.
#
#           Be sure to swap `${ARGO_WORKFLOWS_SERVER_SERVICE_NAME}` with your actual service name,
#           `${ARGO_WORKFLOWS_READER_SERVICE_ACCOUNT_SECRET}` with your actual service account
#           secret, and `<argo-namespace>` with the value passed to `--argo-namespace` above
kubectl port-forward --namespace='<argo-namespace>' service/${ARGO_WORKFLOWS_SERVER_SERVICE_NAME} 2746:2746
echo "Bearer $(kubectl get --namespace='<argo-namespace>' secret ${ARGO_WORKFLOWS_READER_SERVICE_ACCOUNT_SECRET} -o=jsonpath='{.data.token}' | base64 --decode)"
