Merge pull request #1087 from dlt-hub/docs/deploy-dlt-with-kestra

Docs: deploy with Kestra
dlt-hub · Mar 25, 2024 · d5aab24
2 parents e0774cc + 5302b2d
Showing 2 changed files with 117 additions and 0 deletions.
diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md
@@ -0,0 +1,116 @@
+---
+title: Deploy with Kestra
+description: How to deploy a pipeline with Kestra
+keywords: [how to, deploy a pipeline, Kestra]
+---
+
+# Deploy with Kestra
+
+## Introduction to Kestra
+
+[Kestra](https://kestra.io/docs) is an open-source, scalable orchestration platform that enables
+engineers to manage business-critical workflows declaratively in code. By applying 
+infrastructure as code best practices to data, process, and microservice orchestration, you
+can build and manage reliable workflows.
+
+Kestra facilitates reliable workflow management, offering advanced settings for resiliency,
+triggers, real-time monitoring, and integration capabilities, making it a valuable tool for data
+engineers and developers.
+
+### Kestra features
+
+Kestra provides a robust orchestration engine with features including:
+
+- A user interface for managing workflows, event-driven automation, and an embedded
+  Visual Studio Code editor.
+- Embedded documentation, a live-updating topology view, and access to over 400 plugins.
+- Git and CI/CD integrations, basic authentication, and community support.
+
+To learn more, refer to [Kestra's documentation](https://kestra.io/docs).
+
+## Building Data Pipelines with `dlt`
+
+**`dlt`** is an open-source Python library that allows you to declaratively load data sources
+into well-structured tables or datasets. It does this through automatic schema inference and evolution.
+The library simplifies building data pipelines by providing functionality that supports the entire
+extract and load process.
+
+### How does `dlt` integrate with Kestra for pipeline orchestration?
+
+To illustrate setting up a pipeline in Kestra, we'll use the following example:
+[From Inbox to Insights: AI-Enhanced Email Analysis with dlt and Kestra](https://kestra.io/blogs/2023-12-04-dlt-kestra-usage).
+
+The example demonstrates automating a workflow that loads data from Gmail to BigQuery using `dlt`,
+complemented by AI-driven summarization and sentiment analysis. The project's code is available
+in this [GitHub repo](https://github.com/dlt-hub/dlt-kestra-demo).
+
+:::info
+For the detailed guide, see the project's [README](https://github.com/dlt-hub/dlt-kestra-demo/blob/main/README.md).
+:::
+
+Here is a summary of the steps:
+
+1. Start by creating a virtual environment.
+
+1. Generate an `.env` file: Inside your project repository, create an `.env` file that stores
+   credentials as base64-encoded values, each prefixed with `SECRET_` so that Kestra's `secret()`
+   function can read them.
+
+1. As Kestra recommends, install Docker Desktop on your machine.
+
+1. Ensure Docker is running, then download the Docker Compose file with:
+
+   ```sh
+   curl -o docker-compose.yml \
+     https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml
+   ```
+
+1. Configure the Docker Compose file:
+   Edit the downloaded Docker Compose file so that the `kestra` service reads its environment
+   variables from the `.env` file.
+
+   ```yaml
+   kestra:
+       image: kestra/kestra:develop-full
+       env_file:
+           - .env
+   ```
+
+1. Enable Auto-Restart: In your `docker-compose.yml`, set `restart: always` for both the `postgres`
+   and `kestra` services so that they start again automatically after a system restart.
+
+1. Launch Kestra Server: Execute `docker compose up -d` to start the server.
+
+1. Access Kestra UI: Navigate to `http://localhost:8080/` to use the Kestra user interface.
+
+1. Create and Configure Flows:
+
+   - Go to 'Flows', then 'Create'.
+   - Configure the flow files in the editor.
+   - Save your flows.
+
+1. Understand Flow Components:
+
+   - Each flow must have an `id`, `namespace`, and a list of `tasks` with their respective `id` and
+     `type`.
+   - The main flow orchestrates tasks like loading data from a source to a destination.
+
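+As a rough sketch of the flow components described in the last step, a minimal flow might look
+like the following. The `id` and `namespace` values and the task name are hypothetical and do not
+come from the demo project. The `type` shown is Kestra's built-in log task as named in recent
+releases; it is an assumption that your Kestra version uses this name.
+
+```yaml
+# Hypothetical minimal flow: every flow needs an id, a namespace, and a list of tasks.
+id: hello_dlt            # hypothetical flow id
+namespace: dev           # hypothetical namespace
+tasks:
+  - id: log_message      # each task needs its own id ...
+    # ... and a type; older Kestra releases name this task io.kestra.core.tasks.log.Log
+    type: io.kestra.plugin.core.log.Log
+    message: Flow started
+```
+
+The flows in the demo project run Python scripts rather than a log task; see its README for the
+actual flow definitions.
+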
+By following these steps, you establish a structured workflow within Kestra, leveraging its powerful
+features for efficient data pipeline orchestration.
+
+:::info
+For detailed information on these steps, please consult the `README.md` in the 
+[dlt-kestra-demo](https://github.com/dlt-hub/dlt-kestra-demo/blob/main/README.md) repo.
+:::
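+
+For the Docker Compose edits in the steps above, the relevant keys could be combined roughly as
+follows. This is only a fragment: it assumes the downloaded file defines the `postgres` and
+`kestra` services under a top-level `services` key, and every other key in that file stays
+unchanged.
+
+```yaml
+# Fragment only: add these keys to the existing service definitions.
+services:
+  postgres:
+    restart: always      # start again automatically after a system restart
+  kestra:
+    restart: always
+    env_file:
+      - .env             # base64-encoded credentials prefixed with SECRET_
+```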
+
+### Additional Resources
+
+- Ingest Zendesk data into Weaviate using `dlt` with Kestra:
+  [here.](https://kestra.io/blueprints/148-ingest-zendesk-data-into-weaviate-using-dlt)
+- Ingest Zendesk data into DuckDB using `dlt` with Kestra:
+  [here.](https://kestra.io/blueprints/147-ingest-zendesk-data-into-duckdb-using-dlt)
+- Ingest Pipedrive CRM data to BigQuery using `dlt` and schedule it to run every hour:
+  [here.](https://kestra.io/blueprints/146-ingest-pipedrive-crm-data-to-bigquery-using-dlt-and-schedule-it-to-run-every-hour)
+
diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js
@@ -218,6 +218,7 @@ const sidebars = {
             'reference/explainers/airflow-gcp-cloud-composer',
             'walkthroughs/deploy-a-pipeline/deploy-with-google-cloud-functions',
             'walkthroughs/deploy-a-pipeline/deploy-gcp-cloud-function-as-webhook',
+            'walkthroughs/deploy-a-pipeline/deploy-with-kestra',
           ]
         },
         {