Add demo Colab notebook using magic CLI commands for setup and execution #1860

Closed
263 changes: 263 additions & 0 deletions docs/reference/colab-demo.ipynb
@@ -0,0 +1,263 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8065978e",
"metadata": {},
"source": [
"# [Demo]: Google Colab Notebook 🚀\n",
"\n",
"This notebook demonstrates how to initialize, inspect, and manage data pipelines with the dlt (data load tool) CLI directly inside an IPython notebook, using magic commands instead of a terminal. We’ll recreate the pipeline setup from the [Colab Demo](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing#scrollTo=GYREioraz1m6) using `%init`, `%pipeline`, `%schema`, and other magic commands.\n",
"\n",
"### **What you'll learn:**\n",
"\n",
"1. How to initialize a new dlt pipeline with `%init`.\n",
"2. How to sync, list, and inspect pipelines with `%pipeline`.\n",
"3. How to work with schemas using `%schema`.\n",
"4. How to check the dlt version with `%dlt_version`.\n",
"5. How to enable or disable telemetry with `%settings`.\n",
"\n",
"Let's dive in!\n",
"\n",
"# 1. **Set Up the Environment**\n",
"\n",
"First, install dlt with the DuckDB extra if it is not already installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33e54cbc",
"metadata": {},
"outputs": [],
"source": [
"# Install dlt together with the DuckDB destination extra\n",
"!pip install \"dlt[duckdb]\""
]
},
{
"cell_type": "markdown",
"id": "3bc07810",
"metadata": {},
"source": [
"# 2. **Initialize a Pipeline**\n",
"\n",
"Initialize a new dlt pipeline by specifying a source and a destination. This generates a starter pipeline script along with configuration and secrets files in a `.dlt/` folder.\n",
"\n",
"### Initialize a New Pipeline\n",
"\n",
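"In this example, we’ll initialize a pipeline from a `pokemon` source to a `duckdb` destination. Run the generated pipeline script at least once before moving on: the `%pipeline` commands in the following sections operate on a pipeline that has already been run."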
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "057d4613",
"metadata": {},
"outputs": [],
"source": [
"# Initialize a pipeline with a source and destination\n",
"%init --source_name pokemon --destination_name duckdb"
]
},
{
"cell_type": "markdown",
"id": "d31dde78",
"metadata": {},
"source": [
"# 3. **Sync a Pipeline**\n",
"\n",
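"Once the pipeline has been run, the `sync` operation drops the pipeline's local working state and schemas and restores them from the destination. It does not load new data. To load data, run the pipeline script itself.\n",
"\n",
"### Sync the Pipeline\n",
"\n",
"Use the `sync` operation to restore the pipeline from its destination. The pipeline name must match the `pipeline_name` set in your pipeline script."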
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de982c19",
"metadata": {},
"outputs": [],
"source": [
"%pipeline --operation sync --pipeline_name pokemon"
]
},
{
"cell_type": "markdown",
"id": "b765e8eb",
"metadata": {},
"source": [
"# 4. **Manage Pipelines**\n",
"\n",
"The `%pipeline` magic command runs pipeline management operations, selected with the `--operation` flag.\n",
"\n",
"### **List Available Pipelines**\n",
"\n",
"To list all pipelines stored locally on this machine, run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e8c5c51",
"metadata": {},
"outputs": [],
"source": [
"# Magic command to list pipelines\n",
"%pipeline --operation list-pipelines"
]
},
{
"cell_type": "markdown",
"id": "8ca56adc",
"metadata": {},
"source": [
"### **Pipeline Information**\n",
"\n",
"To get detailed information about a specific pipeline, such as its destination, dataset name, schemas, and local load packages, use the `info` operation and pass the pipeline name."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d3b9611",
"metadata": {},
"outputs": [],
"source": [
"%pipeline --operation info --pipeline_name pokemon"
]
},
{
"cell_type": "markdown",
"id": "7a6262bb",
"metadata": {},
"source": [
"# 5. **Managing Schemas**\n",
"\n",
"You can inspect, convert, or upgrade the schema used in the pipeline by specifying a schema file path.\n",
"\n",
"### Manage Schema\n",
"\n",
"To show the schema in JSON format:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a681652",
"metadata": {},
"outputs": [],
"source": [
"# Replace <schema_file_path> with the actual schema file path\n",
"%schema --file_path <schema_file_path> --format json"
]
},
{
"cell_type": "markdown",
"id": "6c0e564a",
"metadata": {},
"source": [
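"# 6. **Check dlt Version**\n",
"\n",
"Knowing which version of dlt is installed helps when following the documentation or reporting an issue.\n",
"\n",
"### Check dlt Version\n",
"\n",
"The command below prints the installed version. Compare it with the latest release to see whether an upgrade is available."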
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c03a47e2",
"metadata": {},
"outputs": [],
"source": [
"# Print the installed dlt version\n",
"%dlt_version"
]
},
{
"cell_type": "markdown",
"id": "6674c53b",
"metadata": {},
"source": [
"# 7. **Enable/Disable Telemetry**\n",
"\n",
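"dlt collects anonymous usage telemetry by default. You can turn it off, or back on, at any time.\n",
"\n",
"### Manage Telemetry\n",
"\n",
"These settings apply globally to every dlt project on this machine, not just this notebook, so run only the command you need."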
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6ea6a2a",
"metadata": {},
"outputs": [],
"source": [
"# Enable telemetry\n",
"%settings --enable-telemetry\n",
"\n",
"# Disable telemetry: uncomment the next line (and comment out the one above) to opt out\n",
"# %settings --disable-telemetry"
]
},
{
"cell_type": "markdown",
"id": "a3ac8da7",
"metadata": {},
"source": [
"# 8. **Additional Operations**\n",
"\n",
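"Other `%pipeline` operations include `trace`, which shows the trace of the last run; `failed-jobs`, which lists jobs that failed to load along with their error messages; and `drop-pending-packages`, which deletes packages that were extracted or normalized but not yet loaded.\n",
"\n",
"### Explore More Pipeline Operations\n",
"\n",
"`drop-pending-packages` permanently discards unloaded data, so run it only when you intend to throw those packages away."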
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f8ea828",
"metadata": {},
"outputs": [],
"source": [
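"# Show the trace of the last pipeline run\n",
"%pipeline --operation trace --pipeline_name pokemon\n",
"\n",
"# List jobs that failed to load, with their error messages\n",
"%pipeline --operation failed-jobs --pipeline_name pokemon\n",
"\n",
"# Drop pending packages (destructive: deletes extracted and normalized packages that were not loaded)\n",
"# Uncomment to run:\n",
"# %pipeline --operation drop-pending-packages --pipeline_name pokemon"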
]
},
{
"cell_type": "markdown",
"id": "fff4dbee",
"metadata": {},
"source": [
"## 🎉 **Finish!** _🎉_\n",
"\n",
"Using the `%init`, `%pipeline`, `%schema`, `%dlt_version`, and `%settings` magic commands, we initialized, inspected, and managed a dlt pipeline without leaving the Colab notebook."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}