From 5f36b2a01046aa96e44ae3619ff9f5a40c833f86 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Wed, 13 Mar 2024 06:06:12 +0000 Subject: [PATCH 1/6] Updated --- .../deploy-a-pipeline/deploy-with-kestra.md | 123 ++++++++++++++++++ docs/website/sidebars.js | 1 + 2 files changed, 124 insertions(+) create mode 100644 docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md new file mode 100644 index 0000000000..ed3604ad65 --- /dev/null +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md @@ -0,0 +1,123 @@ +--- +title: Deploy with Kestra +description: How to deploy a pipeline with Kestra +keywords: [how to, deploy a pipeline, Kestra] +--- + +# Deploy with Kestra + +## Introduction to Kestra + +[Kestra](https://kestra.io/docs) is an open-source, **scalable orchestration platform** that enables +all engineers to manage **business-critical workflows** declaratively in code. By +applying Infrastructure as Code best practices to data, process, and microservice orchestration, you +can build reliable workflows and manage them. + +Kestra facilitates reliable workflow management, offering advanced settings for resiliency, +triggers, real-time monitoring, and integration capabilities, making it a valuable tool for data +engineers and developers. + +### Kestra features + +Kestra, as an open-source platform, provides a robust orchestration engine with features including: + +- Declarative workflows are accessible as code and through a user interface, event-driven + automation, and an embedded Visual Studio code editor. +- It also offers embedded documentation, a live-updating topology view, and access to over 400 + plugins, enhancing its versatility. 
+- Kestra supports Git & CI/CD integrations, basic authentication, and benefits from community + support. + +To know more, please refer to [Kestra's documentation.](https://kestra.io/pricing) + +## Building Data Pipelines with `dlt` + +**`dlt`** is an open-source Python library that allows you to declaratively **load** data sources +into well-structured tables or datasets through automatic schema inference and evolution. It +simplifies building data pipelines by providing functionality to support the entire extract and load +process. + +### How does `dlt` integrate with Kestra for pipeline orchestration? + +To illustrate setting up a pipeline in Kestra, we’ll be using +[this example.](https://kestra.io/blogs/2023-12-04-dlt-kestra-usage) + +It demonstrates automating a workflow to load data from Gmail to BigQuery using the `dlt`, +complemented by AI-driven summarization and sentiment analysis. You can refer to the project's +github repo here: [Github repo.](https://github.com/dlt-hub/dlt-kestra-demo) + +:::info +For the detailed guide, refer to the project's README section for project setup. +::: + +Here is the summary of the steps: + +1. Start by creating a virtual environment. + +1. Generate an `.env` File\*\*: Inside your project repository, create an `.env` file to store + credentials in base64 format, prefixed with 'SECRET\_' for compatibility with Kestra's `secret()` + function. + +1. As per Kestra’s recommendation, install the docker desktop on your machine. + +1. Download Docker Compose File: Ensure Docker is running, then download the Docker Compose file + with: + + ```python + curl -o docker-compose.yml \ + https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml + ``` + +1. Configure Docker Compose File: Edit the downloaded Docker Compose file to link the `.env` file + for environment variables. + + ```python + kestra: + image: kestra/kestra:develop-full + env_file: + - .env + ``` + +1. 
Enable Auto-Restart: In your `docker-compose.yml`, set `restart: always` for both postgres and + kestra services to ensure they reboot automatically after a system restart. + +1. Launch Kestra Server: Execute `docker compose up -d` to start the server. + +1. Access Kestra UI: Navigate to `http://localhost:8080/` to use the Kestra user interface. + +1. Create and Configure Flows: + + - Go to 'Flows', then 'Create'. + - Configure the flow files in the editor. + - Save your flows. + +1. **Understand Flow Components**: + + - Each flow must have an `id`, `namespace`, and a list of `tasks` with their respective `id` and + `type`. + - The main flow orchestrates tasks like loading data from a source to a destination. + +By following these steps, you establish a structured workflow within Kestra, leveraging its powerful +features for efficient data pipeline orchestration. + +### Additional Resources + +- Ingest Zendesk data into Weaviate using dlt with Kestra: + [here](https://kestra.io/blueprints/148-ingest-zendesk-data-into-weaviate-using-dlt). +- Ingest Zendesk data into DuckDb using dlt with Kestra: + [here.](https://kestra.io/blueprints/147-ingest-zendesk-data-into-duckdb-using-dlt) +- Ingest Pipedrive CRM data to BigQuery using dlt and schedule it to run every hour: + [here.](https://kestra.io/blueprints/146-ingest-pipedrive-crm-data-to-bigquery-using-dlt-and-schedule-it-to-run-every-hour) + +## Conclusion + +Deploying `dlt` on Kestra streamlines data workflow management by automating and simplifying data +loading processes. This integration offers developers and data engineers a robust framework for +scalable, resilient, and manageable data pipelines. By following the outlined steps, users can use +the orchestration capabilities of Kestra and the intuitive data pipeline construction offered by +`dlt`. + +We encourage data engineers and developers to explore the capabilities of `dlt` within the Kestra +platform. 
In embracing Kestra and `dlt`, you gain access to a community-driven ecosystem that +encourages innovation and collaboration. Using `dlt` on Kestra streamlines the pipeline development +process and unlocks the potential making better data ingestion pipelines. diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js index 821a1affad..9145d242ae 100644 --- a/docs/website/sidebars.js +++ b/docs/website/sidebars.js @@ -214,6 +214,7 @@ const sidebars = { 'reference/explainers/airflow-gcp-cloud-composer', 'walkthroughs/deploy-a-pipeline/deploy-with-google-cloud-functions', 'walkthroughs/deploy-a-pipeline/deploy-gcp-cloud-function-as-webhook', + 'walkthroughs/deploy-a-pipeline/deploy-with-kestra', ] }, { From 7fe1f5bc9a4a4e27b0f43c2e0b7a6bbd24dd52e0 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Wed, 13 Mar 2024 06:14:38 +0000 Subject: [PATCH 2/6] Updated --- .../deploy-a-pipeline/deploy-with-kestra.md | 23 ++++--------------- 1 file changed, 5 insertions(+), 18 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md index ed3604ad65..10c3f8c818 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md @@ -47,28 +47,27 @@ complemented by AI-driven summarization and sentiment analysis. You can refer to github repo here: [Github repo.](https://github.com/dlt-hub/dlt-kestra-demo) :::info -For the detailed guide, refer to the project's README section for project setup. +For the detailed guide, refer to the project's README section. ::: Here is the summary of the steps: 1. Start by creating a virtual environment. -1. Generate an `.env` File\*\*: Inside your project repository, create an `.env` file to store - credentials in base64 format, prefixed with 'SECRET\_' for compatibility with Kestra's `secret()` +1. 
Generate an `.env` File: Inside your project repository, create an `.env` file to store + credentials in "base64" format, prefixed with 'SECRET\_' for compatibility with Kestra's `secret()` function. 1. As per Kestra’s recommendation, install the docker desktop on your machine. -1. Download Docker Compose File: Ensure Docker is running, then download the Docker Compose file - with: +1. Ensure Docker is running, then download the Docker compose file with: ```python curl -o docker-compose.yml \ https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml ``` -1. Configure Docker Compose File: Edit the downloaded Docker Compose file to link the `.env` file +1. Configure Docker compose File: Edit the downloaded Docker compose file to link the `.env` file for environment variables. ```python @@ -109,15 +108,3 @@ features for efficient data pipeline orchestration. - Ingest Pipedrive CRM data to BigQuery using dlt and schedule it to run every hour: [here.](https://kestra.io/blueprints/146-ingest-pipedrive-crm-data-to-bigquery-using-dlt-and-schedule-it-to-run-every-hour) -## Conclusion - -Deploying `dlt` on Kestra streamlines data workflow management by automating and simplifying data -loading processes. This integration offers developers and data engineers a robust framework for -scalable, resilient, and manageable data pipelines. By following the outlined steps, users can use -the orchestration capabilities of Kestra and the intuitive data pipeline construction offered by -`dlt`. - -We encourage data engineers and developers to explore the capabilities of `dlt` within the Kestra -platform. In embracing Kestra and `dlt`, you gain access to a community-driven ecosystem that -encourages innovation and collaboration. Using `dlt` on Kestra streamlines the pipeline development -process and unlocks the potential making better data ingestion pipelines. 
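The `.env` step in the walkthrough above — credentials stored base64-encoded under keys prefixed with `SECRET_` — can be sketched in Python. This is a minimal helper under the walkthrough's stated conventions; the key names and example values are illustrative, not taken from the demo repo:

```python
import base64

def kestra_env_line(name: str, value: str) -> str:
    """Build one .env line: a SECRET_-prefixed key with a base64-encoded
    value, the convention the walkthrough describes for Kestra's secret()."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"SECRET_{name.upper()}={encoded}"

# Illustrative credentials only -- real values would come from your providers.
print(kestra_env_line("openai_api_key", "sk-illustrative-key"))
print(kestra_env_line("bigquery_project_id", "my-gcp-project"))
```

At runtime, `secret('OPENAI_API_KEY')` in a flow would then resolve to the decoded value of `SECRET_OPENAI_API_KEY`, per the convention described in the steps.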
From efb4511ace155fbee0036ccd24b1e2de508666de Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Thu, 14 Mar 2024 06:17:28 +0000 Subject: [PATCH 3/6] Updated --- .../deploy-a-pipeline/deploy-with-kestra.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md index 10c3f8c818..ede2755721 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md @@ -8,8 +8,8 @@ keywords: [how to, deploy a pipeline, Kestra] ## Introduction to Kestra -[Kestra](https://kestra.io/docs) is an open-source, **scalable orchestration platform** that enables -all engineers to manage **business-critical workflows** declaratively in code. By +[Kestra](https://kestra.io/docs) is an open-source, scalable orchestration platform that enables +all engineers to manage business-critical workflows declaratively in code. By applying Infrastructure as Code best practices to data, process, and microservice orchestration, you can build reliable workflows and manage them. @@ -19,35 +19,35 @@ engineers and developers. ### Kestra features -Kestra, as an open-source platform, provides a robust orchestration engine with features including: +Kestra provides a robust orchestration engine with features including: -- Declarative workflows are accessible as code and through a user interface, event-driven +- Workflows are accessible through a user interface, event-driven automation, and an embedded Visual Studio code editor. - It also offers embedded documentation, a live-updating topology view, and access to over 400 plugins, enhancing its versatility. - Kestra supports Git & CI/CD integrations, basic authentication, and benefits from community support. 
-To know more, please refer to [Kestra's documentation.](https://kestra.io/pricing) +To know more, please refer to [Kestra's documentation.](https://kestra.io/docs) ## Building Data Pipelines with `dlt` -**`dlt`** is an open-source Python library that allows you to declaratively **load** data sources +**`dlt`** is an open-source Python library that allows you to declaratively load data sources into well-structured tables or datasets through automatic schema inference and evolution. It simplifies building data pipelines by providing functionality to support the entire extract and load process. ### How does `dlt` integrate with Kestra for pipeline orchestration? -To illustrate setting up a pipeline in Kestra, we’ll be using -[this example.](https://kestra.io/blogs/2023-12-04-dlt-kestra-usage) +To illustrate setting up a pipeline in Kestra, we’ll be using the following example: +[From Inbox to Insights AI-Enhanced Email Analysis with dlt and Kestra.](https://kestra.io/blogs/2023-12-04-dlt-kestra-usage) It demonstrates automating a workflow to load data from Gmail to BigQuery using the `dlt`, complemented by AI-driven summarization and sentiment analysis. You can refer to the project's -github repo here: [Github repo.](https://github.com/dlt-hub/dlt-kestra-demo) +github repo by clicking [here.](https://github.com/dlt-hub/dlt-kestra-demo) :::info -For the detailed guide, refer to the project's README section. +For the detailed guide, please take a look at the project's [README](https://github.com/dlt-hub/dlt-kestra-demo/blob/main/README.md) section. ::: Here is the summary of the steps: @@ -101,10 +101,10 @@ features for efficient data pipeline orchestration. ### Additional Resources -- Ingest Zendesk data into Weaviate using dlt with Kestra: +- Ingest Zendesk data into Weaviate using `dlt` with Kestra: [here](https://kestra.io/blueprints/148-ingest-zendesk-data-into-weaviate-using-dlt). 
- Ingest Zendesk data into DuckDb using dlt with Kestra: [here.](https://kestra.io/blueprints/147-ingest-zendesk-data-into-duckdb-using-dlt) -- Ingest Pipedrive CRM data to BigQuery using dlt and schedule it to run every hour: +- Ingest Pipedrive CRM data to BigQuery using `dlt` and schedule it to run every hour: [here.](https://kestra.io/blueprints/146-ingest-pipedrive-crm-data-to-bigquery-using-dlt-and-schedule-it-to-run-every-hour) From f7f7033e8db451f387aea575f3cc0e72740b34ec Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 18 Mar 2024 05:45:03 +0000 Subject: [PATCH 4/6] Updated --- .../deploy-a-pipeline/deploy-with-kestra.md | 20 ++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md index ede2755721..6f938c28d4 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md @@ -9,8 +9,8 @@ keywords: [how to, deploy a pipeline, Kestra] ## Introduction to Kestra [Kestra](https://kestra.io/docs) is an open-source, scalable orchestration platform that enables -all engineers to manage business-critical workflows declaratively in code. By -applying Infrastructure as Code best practices to data, process, and microservice orchestration, you +all engineers to manage business-critical workflows declaratively in code. By applying  +infrastructure as code best practices to data, process, and microservice orchestration, you can build reliable workflows and manage them. Kestra facilitates reliable workflow management, offering advanced settings for resiliency, @@ -22,7 +22,7 @@ engineers and developers. 
Kestra provides a robust orchestration engine with features including: - Workflows are accessible through a user interface, event-driven - automation, and an embedded Visual Studio code editor. + automation, and an embedded visual studio code editor. - It also offers embedded documentation, a live-updating topology view, and access to over 400 plugins, enhancing its versatility. - Kestra supports Git & CI/CD integrations, basic authentication, and benefits from community @@ -62,15 +62,16 @@ Here is the summary of the steps: 1. Ensure Docker is running, then download the Docker compose file with: - ```python + ```shell curl -o docker-compose.yml \ https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml ``` -1. Configure Docker compose File: Edit the downloaded Docker compose file to link the `.env` file - for environment variables. +1. Configure Docker compose file: + Edit the downloaded Docker compose file to link the `.env` file for environment + variables. - ```python + ```yaml kestra: image: kestra/kestra:develop-full env_file: @@ -99,6 +100,11 @@ Here is the summary of the steps: By following these steps, you establish a structured workflow within Kestra, leveraging its powerful features for efficient data pipeline orchestration. +:::info +For detailed information on these steps, please consult the `README.md` in the +[dlt-kestra-demo](https://github.com/dlt-hub/dlt-kestra-demo/blob/main/README.md) repo. 
+::: + ### Additional Resources - Ingest Zendesk data into Weaviate using `dlt` with Kestra: From 1fb77733a127f20d882c325e9bef639584dd433f Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Thu, 21 Mar 2024 11:38:50 +0000 Subject: [PATCH 5/6] Updated --- .../deploy-a-pipeline/deploy-with-kestra.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md index 6f938c28d4..09ddea6ac8 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md @@ -9,9 +9,9 @@ keywords: [how to, deploy a pipeline, Kestra] ## Introduction to Kestra [Kestra](https://kestra.io/docs) is an open-source, scalable orchestration platform that enables -all engineers to manage business-critical workflows declaratively in code. By applying  +engineers to manage business-critical workflows declaratively in code. By applying  infrastructure as code best practices to data, process, and microservice orchestration, you -can build reliable workflows and manage them. +can build and manage reliable workflows. Kestra facilitates reliable workflow management, offering advanced settings for resiliency, triggers, real-time monitoring, and integration capabilities, making it a valuable tool for data @@ -21,7 +21,7 @@ engineers and developers. Kestra provides a robust orchestration engine with features including: -- Workflows are accessible through a user interface, event-driven +- Workflows accessible through a user interface, event-driven automation, and an embedded visual studio code editor. - It also offers embedded documentation, a live-updating topology view, and access to over 400 plugins, enhancing its versatility. 
@@ -32,17 +32,17 @@ To know more, please refer to [Kestra's documentation.](https://kestra.io/docs) ## Building Data Pipelines with `dlt` -**`dlt`** is an open-source Python library that allows you to declaratively load data sources -into well-structured tables or datasets through automatic schema inference and evolution. It -simplifies building data pipelines by providing functionality to support the entire extract and load -process. +**`dlt`** is an open-source Python library that allows you to declaratively load data sources +into well-structured tables or datasets. It does this through automatic schema inference and evolution. +The library simplifies building data pipeline by providing functionality to support the entire extract +and load process. ### How does `dlt` integrate with Kestra for pipeline orchestration? To illustrate setting up a pipeline in Kestra, we’ll be using the following example: [From Inbox to Insights AI-Enhanced Email Analysis with dlt and Kestra.](https://kestra.io/blogs/2023-12-04-dlt-kestra-usage) -It demonstrates automating a workflow to load data from Gmail to BigQuery using the `dlt`, +The example demonstrates automating a workflow to load data from Gmail to BigQuery using the `dlt`, complemented by AI-driven summarization and sentiment analysis. 
You can refer to the project's
GitHub repo by clicking [here.](https://github.com/dlt-hub/dlt-kestra-demo)

From 5302b2d1f7c400d89f399e63e8e5932712548324 Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Fri, 22 Mar 2024 05:59:57 +0000
Subject: [PATCH 6/6] Updated

---
 .../docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md
index 09ddea6ac8..cfb63ce808 100644
--- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md
+++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-kestra.md
@@ -62,7 +62,7 @@ Here is the summary of the steps:
 
 1. Ensure Docker is running, then download the Docker compose file with:
 
-   ```shell
+   ```sh
    curl -o docker-compose.yml \
      https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml
    ```
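The flow components described in the walkthrough (an `id`, a `namespace`, and a list of `tasks`, each with its own `id` and `type`) can be sketched as a minimal flow file. This is an illustrative outline only — the namespace, task id, and the script-runner task type are assumptions to verify against Kestra's plugin catalog, not code from the demo repo:

```yaml
id: gmail_to_bigquery
namespace: company.demos

tasks:
  # Illustrative task: run a dlt pipeline script via a Python
  # script-runner plugin (check the plugin docs for the exact type and fields).
  - id: run_dlt_pipeline
    type: io.kestra.plugin.scripts.python.Commands
    commands:
      - python gmail_pipeline.py
```

Saved under 'Flows' in the UI, a file like this becomes the unit that Kestra schedules, executes, and monitors.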