diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md
new file mode 100644
index 0000000000..cfb63ce808
--- /dev/null
+++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md
@@ -0,0 +1,116 @@
+---
+title: Deploy with Kestra
+description: How to deploy a pipeline with Kestra
+keywords: [how to, deploy a pipeline, Kestra]
+---
+
+# Deploy with Kestra
+
+## Introduction to Kestra
+
+[Kestra](https://kestra.io/docs) is an open-source, scalable orchestration platform that enables
+engineers to manage business-critical workflows declaratively in code. By applying
+infrastructure-as-code best practices to data, process, and microservice orchestration, you
+can build and manage reliable workflows.
+
+Kestra facilitates reliable workflow management, offering advanced settings for resiliency,
+triggers, real-time monitoring, and integration capabilities, making it a valuable tool for data
+engineers and developers.
+
+### Kestra features
+
+Kestra provides a robust orchestration engine with features including:
+
+- Workflows accessible through a user interface, event-driven automation, and an embedded
+  Visual Studio Code editor.
+- Embedded documentation, a live-updating topology view, and access to over 400 plugins,
+  enhancing its versatility.
+- Git and CI/CD integrations, basic authentication, and community support.
+
+To learn more, please refer to [Kestra's documentation](https://kestra.io/docs).
+
+## Building Data Pipelines with `dlt`
+
+**`dlt`** is an open-source Python library that allows you to declaratively load data sources
+into well-structured tables or datasets. It does this through automatic schema inference and
+evolution. The library simplifies building data pipelines by providing functionality to support
+the entire extract and load process.
+
+### How does `dlt` integrate with Kestra for pipeline orchestration?
+
+To illustrate setting up a pipeline in Kestra, we’ll use the following example:
+[From Inbox to Insights: AI-Enhanced Email Analysis with dlt and Kestra](https://kestra.io/blogs/2023-12-04-dlt-kestra-usage).
+
+The example demonstrates automating a workflow that loads data from Gmail to BigQuery using `dlt`,
+complemented by AI-driven summarization and sentiment analysis. You can find the project's
+GitHub repo [here](https://github.com/dlt-hub/dlt-kestra-demo).
+
+:::info
+For the detailed guide, please take a look at the project's [README](https://github.com/dlt-hub/dlt-kestra-demo/blob/main/README.md).
+:::
+
+Here is a summary of the steps:
+
+1. Start by creating a virtual environment.
+
+1. Generate a `.env` file: Inside your project repository, create a `.env` file to store
+   credentials in base64 format, prefixed with `SECRET_` for compatibility with Kestra's
+   `secret()` function.
+
+1. As per Kestra’s recommendation, install Docker Desktop on your machine.
+
+1. Ensure Docker is running, then download the Docker Compose file with:
+
+   ```sh
+   curl -o docker-compose.yml \
+   https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml
+   ```
+
+1. Configure the Docker Compose file:
+   Edit the downloaded Docker Compose file to link the `.env` file for environment
+   variables.
+
+   ```yaml
+   kestra:
+     image: kestra/kestra:develop-full
+     env_file:
+       # Makes the SECRET_-prefixed credentials from .env available to Kestra
+       - .env
+   ```
+
+1. Enable Auto-Restart: In your `docker-compose.yml`, set `restart: always` for both the postgres
+   and kestra services to ensure they reboot automatically after a system restart.
+
+1. Launch Kestra Server: Execute `docker compose up -d` to start the server.
+
+1. Access Kestra UI: Navigate to `http://localhost:8080/` to use the Kestra user interface.
+
+1. Create and Configure Flows:
+
+   - Go to 'Flows', then 'Create'.
+   - Configure the flow files in the editor.
+   - Save your flows.
+
+1. Understand Flow Components:
+
+   - Each flow must have an `id`, a `namespace`, and a list of `tasks` with their respective `id`
+     and `type`.
+   - The main flow orchestrates tasks like loading data from a source to a destination; a minimal
+     flow sketch follows this list.
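+
+To make these flow components concrete, here is a minimal, hypothetical flow definition. The
+`id`, `namespace`, script name, and secret name are placeholders, and the exact task `type` and
+properties depend on your Kestra version and installed plugins, so treat this as a sketch rather
+than the demo project's actual flow:
+
+```yaml
+id: gmail_to_bigquery
+namespace: dev
+tasks:
+  # Hypothetical task that runs a dlt pipeline script via Kestra's Python
+  # script plugin. Credentials are read with Kestra's secret() function,
+  # which resolves the SECRET_-prefixed, base64-encoded values from .env.
+  - id: run_dlt_pipeline
+    type: io.kestra.plugin.scripts.python.Commands
+    env:
+      DESTINATION__BIGQUERY__CREDENTIALS: "{{ secret('GCP_CREDS') }}"
+    commands:
+      - python gmail_pipeline.py
+```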
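+
+On the `dlt` side, the script run by a task like the one above can stay small. The sketch below
+uses a placeholder resource instead of the demo's real Gmail extraction, but the pipeline wiring
+(`pipeline_name`, `destination`, `dataset_name`) is standard `dlt` usage:
+
+```py
+import dlt
+
+@dlt.resource(name="messages", write_disposition="append")
+def gmail_messages():
+    # Placeholder: the demo project fetches real messages from a Gmail inbox.
+    yield {"message_id": "1", "subject": "Hello", "body": "..."}
+
+# dlt infers and evolves the BigQuery table schema from the yielded data.
+pipeline = dlt.pipeline(
+    pipeline_name="gmail_to_bigquery",
+    destination="bigquery",
+    dataset_name="messages_data",
+)
+
+load_info = pipeline.run(gmail_messages())
+print(load_info)
+```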
+
+By following these steps, you establish a structured workflow within Kestra, leveraging its
+powerful features for efficient data pipeline orchestration.
+
+:::info
+For detailed information on these steps, please consult the `README.md` in the
+[dlt-kestra-demo](https://github.com/dlt-hub/dlt-kestra-demo/blob/main/README.md) repo.
+:::
+
+### Additional Resources
+
+- Ingest Zendesk data into Weaviate using `dlt` with Kestra:
+  [here](https://kestra.io/blueprints/148-ingest-zendesk-data-into-weaviate-using-dlt).
+- Ingest Zendesk data into DuckDB using `dlt` with Kestra:
+  [here](https://kestra.io/blueprints/147-ingest-zendesk-data-into-duckdb-using-dlt).
+- Ingest Pipedrive CRM data to BigQuery using `dlt` and schedule it to run every hour:
+  [here](https://kestra.io/blueprints/146-ingest-pipedrive-crm-data-to-bigquery-using-dlt-and-schedule-it-to-run-every-hour).
+
diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js
index 275c1f438a..4fd6bfca6b 100644
--- a/docs/website/sidebars.js
+++ b/docs/website/sidebars.js
@@ -218,6 +218,7 @@ const sidebars = {
         'reference/explainers/airflow-gcp-cloud-composer',
         'walkthroughs/deploy-a-pipeline/deploy-with-google-cloud-functions',
         'walkthroughs/deploy-a-pipeline/deploy-gcp-cloud-function-as-webhook',
+        'walkthroughs/deploy-a-pipeline/deploy-with-kestra',
       ]
     },
     {