From 324a390c999acee9090e27478c6bf93c16a0555b Mon Sep 17 00:00:00 2001 From: Dave Date: Tue, 17 Sep 2024 15:46:12 +0200 Subject: [PATCH] search and replace absolute links --- .../docs/dlt-ecosystem/destinations/bigquery.md | 2 +- .../docs/dlt-ecosystem/destinations/clickhouse.md | 2 +- .../docs/dlt-ecosystem/destinations/filesystem.md | 2 +- .../docs/dlt-ecosystem/destinations/snowflake.md | 2 +- .../docs/dlt-ecosystem/destinations/synapse.md | 2 +- .../dlt-ecosystem/transformations/dbt/dbt_cloud.md | 2 +- .../verified-sources/amazon_kinesis.md | 2 +- .../verified-sources/google_analytics.md | 4 ++-- .../verified-sources/google_sheets.md | 4 ++-- .../docs/dlt-ecosystem/verified-sources/jira.md | 2 +- .../docs/dlt-ecosystem/verified-sources/matomo.md | 6 +++--- .../verified-sources/pg_replication.md | 14 +++++++------- .../verified-sources/sql_database/advanced.md | 8 ++++---- .../verified-sources/sql_database/configuration.md | 2 +- .../verified-sources/sql_database/usage.md | 6 +++--- .../dlt-ecosystem/verified-sources/workable.md | 8 ++++---- .../data-enrichments/url-parser-data-enrichment.md | 2 +- .../user_agent_device_data_enrichment.md | 6 +++--- .../docs/general-usage/destination-tables.md | 2 +- .../website/docs/general-usage/schema-evolution.md | 10 +++++----- docs/website/docs/reference/telemetry.md | 2 +- docs/website/docs/tutorial/sql-database.md | 2 +- .../website/docs/walkthroughs/create-a-pipeline.md | 2 +- .../deploy-a-pipeline/deploy-with-prefect.md | 2 +- docs/website/docs/walkthroughs/zendesk-weaviate.md | 2 +- 25 files changed, 49 insertions(+), 49 deletions(-) diff --git a/docs/website/docs/dlt-ecosystem/destinations/bigquery.md b/docs/website/docs/dlt-ecosystem/destinations/bigquery.md index 324c712dfc..0984d3d178 100644 --- a/docs/website/docs/dlt-ecosystem/destinations/bigquery.md +++ b/docs/website/docs/dlt-ecosystem/destinations/bigquery.md @@ -220,7 +220,7 @@ When staging is enabled: ## Supported Column Hints -BigQuery supports the following [column hints](https://dlthub.com/docs/general-usage/schema#tables-and-columns): +BigQuery supports the following [column hints](/general-usage/schema#tables-and-columns): * `partition` - creates a partition with a day granularity on the decorated column (`PARTITION BY DATE`). May be used with `datetime`, `date`, and `bigint` data types. diff --git a/docs/website/docs/dlt-ecosystem/destinations/clickhouse.md b/docs/website/docs/dlt-ecosystem/destinations/clickhouse.md index 8752c571b1..d0047cd717 100644 --- a/docs/website/docs/dlt-ecosystem/destinations/clickhouse.md +++ b/docs/website/docs/dlt-ecosystem/destinations/clickhouse.md @@ -220,7 +220,7 @@ To enable this, GCS provides an S3 compatibility mode that emulates the S3 API, allowing ClickHouse to access GCS buckets via its S3 integration. For detailed instructions on setting up S3-compatible storage with dlt, including AWS S3, MinIO, and Cloudflare R2, refer to -the [dlt documentation on filesystem destinations](https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#using-s3-compatible-storage). +the [dlt documentation on filesystem destinations](/dlt-ecosystem/destinations/filesystem#using-s3-compatible-storage). 
To set up GCS staging with HMAC authentication in dlt: diff --git a/docs/website/docs/dlt-ecosystem/destinations/filesystem.md b/docs/website/docs/dlt-ecosystem/destinations/filesystem.md index cfeb03655c..7f1b8bf1c1 100644 --- a/docs/website/docs/dlt-ecosystem/destinations/filesystem.md +++ b/docs/website/docs/dlt-ecosystem/destinations/filesystem.md @@ -414,7 +414,7 @@ disable_compression=true - To decompress a `gzip` file, you can use tools like `gunzip`. This will convert the compressed file back to its original format, making it readable. -For more details on managing file compression, please visit our documentation on performance optimization: [Disabling and Enabling File Compression](https://dlthub.com/docs/reference/performance#disabling-and-enabling-file-compression). +For more details on managing file compression, please visit our documentation on performance optimization: [Disabling and Enabling File Compression](/reference/performance#disabling-and-enabling-file-compression). ## Files layout All the files are stored in a single folder with the name of the dataset that you passed to the `run` or `load` methods of the `pipeline`. In our example chess pipeline, it is **chess_players_games_data**. diff --git a/docs/website/docs/dlt-ecosystem/destinations/snowflake.md b/docs/website/docs/dlt-ecosystem/destinations/snowflake.md index 74688ba7fa..53fb066f06 100644 --- a/docs/website/docs/dlt-ecosystem/destinations/snowflake.md +++ b/docs/website/docs/dlt-ecosystem/destinations/snowflake.md @@ -194,7 +194,7 @@ Which will read, `|` delimited file, without header and will continue on errors. Note that we ignore missing columns `ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE` and we will insert NULL into them. ## Supported column hints -Snowflake supports the following [column hints](https://dlthub.com/docs/general-usage/schema#tables-and-columns): +Snowflake supports the following [column hints](/general-usage/schema#tables-and-columns): * `cluster` - creates a cluster column(s). Many columns per table are supported and only when a new table is created. ## Table and column identifiers diff --git a/docs/website/docs/dlt-ecosystem/destinations/synapse.md b/docs/website/docs/dlt-ecosystem/destinations/synapse.md index 0d50924cdf..39de373b22 100644 --- a/docs/website/docs/dlt-ecosystem/destinations/synapse.md +++ b/docs/website/docs/dlt-ecosystem/destinations/synapse.md @@ -173,7 +173,7 @@ Possible values: ## Supported column hints -Synapse supports the following [column hints](https://dlthub.com/docs/general-usage/schema#tables-and-columns): +Synapse supports the following [column hints](/general-usage/schema#tables-and-columns): * `primary_key` - creates a `PRIMARY KEY NONCLUSTERED NOT ENFORCED` constraint on the column * `unique` - creates a `UNIQUE NOT ENFORCED` constraint on the column diff --git a/docs/website/docs/dlt-ecosystem/transformations/dbt/dbt_cloud.md b/docs/website/docs/dlt-ecosystem/transformations/dbt/dbt_cloud.md index d15c4eb84c..df27795603 100644 --- a/docs/website/docs/dlt-ecosystem/transformations/dbt/dbt_cloud.md +++ b/docs/website/docs/dlt-ecosystem/transformations/dbt/dbt_cloud.md @@ -102,4 +102,4 @@ DBT_CLOUD__ACCOUNT_ID DBT_CLOUD__JOB_ID ``` -For more information, read the [Credentials](https://dlthub.com/docs/general-usage/credentials) documentation. +For more information, read the [Credentials](/general-usage/credentials) documentation. 
diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/amazon_kinesis.md b/docs/website/docs/dlt-ecosystem/verified-sources/amazon_kinesis.md index 3e7dad9793..7a2f5171c5 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/amazon_kinesis.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/amazon_kinesis.md @@ -15,7 +15,7 @@ streams of data in real time. Our AWS Kinesis [verified source](https://github.com/dlt-hub/verified-sources/tree/master/sources/kinesis) loads messages from Kinesis streams to your preferred -[destination](https://dlthub.com/docs/dlt-ecosystem/destinations/). +[destination](/dlt-ecosystem/destinations/). Resources that can be loaded using this verified source are: diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md index 7b4c1b0d5e..52e45367d1 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md @@ -329,7 +329,7 @@ verified source. ``` > Loads all the data till date in the first run, and then - > [incrementally](https://dlthub.com/docs/general-usage/incremental-loading) in subsequent runs. + > [incrementally](/general-usage/incremental-loading) in subsequent runs. 1. To load data from a specific start date: @@ -340,7 +340,7 @@ verified source. ``` > Loads data starting from the specified date during the first run, and then - > [incrementally](https://dlthub.com/docs/general-usage/incremental-loading) in subsequent runs. + > [incrementally](/general-usage/incremental-loading) in subsequent runs. diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md index 9cd6ad8079..4269f9dce5 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md @@ -441,11 +441,11 @@ dlt.resource( `name`: Denotes the table name, set here as "spreadsheet_info". `write_disposition`: Dictates how data is loaded to the destination. -[Read more](https://dlthub.com/docs/general-usage/incremental-loading#the-3-write-dispositions). +[Read more](/general-usage/incremental-loading#the-3-write-dispositions). `merge_key`: Parameter is used to specify the column used to identify records for merging. In this case,"spreadsheet_id", means that the records will be merged based on the values in this column. -[Read more](https://dlthub.com/docs/general-usage/incremental-loading#merge-incremental_loading). +[Read more](/general-usage/incremental-loading#merge-incremental_loading). ## Customization ### Create your own pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/jira.md b/docs/website/docs/dlt-ecosystem/verified-sources/jira.md index b4e8bb76de..c13a6a75be 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/jira.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/jira.md @@ -190,7 +190,7 @@ above. 1. Configure the pipeline by specifying the pipeline name, destination, and dataset. 
To read more about pipeline configuration, please refer to our documentation - [here](https://dlthub.com/docs/general-usage/pipeline): + [here](/general-usage/pipeline): ```py pipeline = dlt.pipeline( diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md b/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md index 296526b21a..445f85d2f7 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md @@ -150,7 +150,7 @@ def matomo_reports( `site_id`: Website's Site ID as per Matomo account. ->Note: This is an [incremental](https://dlthub.com/docs/general-usage/incremental-loading) source method and loads the "last_date" from the state of last pipeline run. +>Note: This is an [incremental](/general-usage/incremental-loading) source method and loads the "last_date" from the state of last pipeline run. ### Source `matomo_visits`: @@ -183,7 +183,7 @@ def matomo_visits( `get_live_event_visitors`: Retrieve unique visitor data, defaulting to False. ->Note: This is an [incremental](https://dlthub.com/docs/general-usage/incremental-loading) source method and loads the "last_date" from the state of last pipeline run. +>Note: This is an [incremental](/general-usage/incremental-loading) source method and loads the "last_date" from the state of last pipeline run. ### Resource `get_last_visits` @@ -214,7 +214,7 @@ def get_last_visits( `rows_per_page`: Number of rows on each page. ->Note: This is an [incremental](https://dlthub.com/docs/general-usage/incremental-loading) resource method and loads the "last_date" from the state of last pipeline run. +>Note: This is an [incremental](/general-usage/incremental-loading) resource method and loads the "last_date" from the state of last pipeline run. ### Transformer `visitors` diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/pg_replication.md b/docs/website/docs/dlt-ecosystem/verified-sources/pg_replication.md index a12c831137..54ebc683a0 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/pg_replication.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/pg_replication.md @@ -65,10 +65,10 @@ To get started with your data pipeline, follow these steps: dlt init pg_replication duckdb ``` - It will initialize [the pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/pg_replication_pipeline.py) with a Postgres replication as the [source](https://dlthub.com/docs/general-usage/source) and [DuckDB](https://dlthub.com/docs/dlt-ecosystem/destinations/duckdb) as the [destination](https://dlthub.com/docs/dlt-ecosystem/destinations). + It will initialize [the pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/pg_replication_pipeline.py) with a Postgres replication as the [source](/general-usage/source) and [DuckDB](/dlt-ecosystem/destinations/duckdb) as the [destination](/dlt-ecosystem/destinations). -2. If you'd like to use a different destination, simply replace `duckdb` with the name of your preferred [destination](https://dlthub.com/docs/dlt-ecosystem/destinations). +2. If you'd like to use a different destination, simply replace `duckdb` with the name of your preferred [destination](/dlt-ecosystem/destinations). 3. This source uses `sql_database` source, you can init it as follows: @@ -81,7 +81,7 @@ To get started with your data pipeline, follow these steps: 4. 
After running these two commands, a new directory will be created with the necessary files and configuration settings to get started. - For more information, read the guide on [how to add a verified source](https://dlthub.com/docs/walkthroughs/add-a-verified-source). + For more information, read the guide on [how to add a verified source](/walkthroughs/add-a-verified-source). :::note You can omit the `[sql.sources.credentials]` section in `secrets.toml` as it is not required. @@ -109,9 +109,9 @@ To get started with your data pipeline, follow these steps: sources.pg_replication.credentials="postgresql://username@password.host:port/database" ``` -3. Finally, follow the instructions in [Destinations](https://dlthub.com/docs/dlt-ecosystem/destinations/) to add credentials for your chosen destination. This will ensure that your data is properly routed. +3. Finally, follow the instructions in [Destinations](/dlt-ecosystem/destinations/) to add credentials for your chosen destination. This will ensure that your data is properly routed. -For more information, read the [Configuration section.](https://dlthub.com/docs/general-usage/credentials) +For more information, read the [Configuration section.](/general-usage/credentials) ## Run the pipeline @@ -130,12 +130,12 @@ For more information, read the [Configuration section.](https://dlthub.com/docs/ For example, the `pipeline_name` for the above pipeline example is `pg_replication_pipeline`, you may also use any custom name instead. - For more information, read the guide on [how to run a pipeline](https://dlthub.com/docs/walkthroughs/run-a-pipeline). + For more information, read the guide on [how to run a pipeline](/walkthroughs/run-a-pipeline). ## Sources and resources -`dlt` works on the principle of [sources](https://dlthub.com/docs/general-usage/source) and [resources](https://dlthub.com/docs/general-usage/resource). +`dlt` works on the principle of [sources](/general-usage/source) and [resources](/general-usage/resource). ### Resource `replication_resource` diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/advanced.md b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/advanced.md index 7ff08f8095..d1c6a4748c 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/advanced.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/advanced.md @@ -14,7 +14,7 @@ import Header from '../_source-info-header.md'; Efficient data management often requires loading only new or updated data from your SQL databases, rather than reprocessing the entire dataset. This is where incremental loading comes into play. -Incremental loading uses a cursor column (e.g., timestamp or auto-incrementing ID) to load only data newer than a specified initial value, enhancing efficiency by reducing processing time and resource use. Read [here](https://dlthub.com/docs/walkthroughs/sql-incremental-configuration) for more details on incremental loading with `dlt`. +Incremental loading uses a cursor column (e.g., timestamp or auto-incrementing ID) to load only data newer than a specified initial value, enhancing efficiency by reducing processing time and resource use. Read [here](/walkthroughs/sql-incremental-configuration) for more details on incremental loading with `dlt`. #### How to configure @@ -46,7 +46,7 @@ certain range. print(info) ``` Behind the scene, the loader generates a SQL query filtering rows with `last_modified` values greater than the incremental value. 
In the first run, this is the initial value (midnight (00:00:00) January 1, 2024). - In subsequent runs, it is the latest value of `last_modified` that `dlt` stores in [state](https://dlthub.com/docs/general-usage/state). + In subsequent runs, it is the latest value of `last_modified` that `dlt` stores in [state](/general-usage/state). **2. Incremental loading with the source `sql_database`** To achieve the same using the `sql_database` source, you would specify your cursor as follows: @@ -163,9 +163,9 @@ The examples below show how you can set arguments in any of the `.toml` files (` database = sql_database() ``` -You'll be able to configure all the arguments this way (except adapter callback function). [Standard dlt rules apply](https://dlthub.com/docs/general-usage/credentials/configuration#configure-dlt-sources-and-resources). +You'll be able to configure all the arguments this way (except adapter callback function). [Standard dlt rules apply](/general-usage/credentials/configuration#configure-dlt-sources-and-resources). -It is also possible to set these arguments as environment variables [using the proper naming convention](https://dlthub.com/docs/general-usage/credentials/config_providers#toml-vs-environment-variables): +It is also possible to set these arguments as environment variables [using the proper naming convention](/general-usage/credentials/config_providers#toml-vs-environment-variables): ```sh SOURCES__SQL_DATABASE__CREDENTIALS="mssql+pyodbc://loader.database.windows.net/dlt_data?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server" SOURCES__SQL_DATABASE__BACKEND=pandas diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md index 88ea268378..c1f22b988b 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md @@ -124,7 +124,7 @@ There are several options for adding your connection credentials into your `dlt` #### 1. Setting them in `secrets.toml` or as environment variables (Recommended) -You can set up credentials using [any method](https://dlthub.com/docs/devel/general-usage/credentials/setup#available-config-providers) supported by `dlt`. We recommend using `.dlt/secrets.toml` or the environment variables. See Step 2 of the [setup](./setup) for how to set credentials inside `secrets.toml`. For more information on passing credentials read [here](https://dlthub.com/docs/devel/general-usage/credentials/setup). +You can set up credentials using [any method](/devel/general-usage/credentials/setup#available-config-providers) supported by `dlt`. We recommend using `.dlt/secrets.toml` or the environment variables. See Step 2 of the [setup](./setup) for how to set credentials inside `secrets.toml`. For more information on passing credentials read [here](/devel/general-usage/credentials/setup). #### 2. Passing them directly in the script diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/usage.md b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/usage.md index ee70e92ea0..e67e87b508 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/usage.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/usage.md @@ -39,7 +39,7 @@ The PyArrow backend does not yield individual rows rather loads chunks of data a Examples: -1. 
Pseudonymizing data to hide personally identifiable information (PII) before loading it to the destination. (See [here](https://dlthub.com/docs/general-usage/customising-pipelines/pseudonymizing_columns) for more information on pseudonymizing data with `dlt`) +1. Pseudonymizing data to hide personally identifiable information (PII) before loading it to the destination. (See [here](/general-usage/customising-pipelines/pseudonymizing_columns) for more information on pseudonymizing data with `dlt`) ```py import hashlib @@ -92,11 +92,11 @@ Examples: ## Deploying the sql_database pipeline -You can deploy the `sql_database` pipeline with any of the `dlt` deployment methods, such as [GitHub Actions](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-github-actions), [Airflow](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer), [Dagster](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster) etc. See [here](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline) for a full list of deployment methods. +You can deploy the `sql_database` pipeline with any of the `dlt` deployment methods, such as [GitHub Actions](/walkthroughs/deploy-a-pipeline/deploy-with-github-actions), [Airflow](/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer), [Dagster](/walkthroughs/deploy-a-pipeline/deploy-with-dagster) etc. See [here](/walkthroughs/deploy-a-pipeline) for a full list of deployment methods. ### Running on Airflow When running on Airflow: -1. Use the `dlt` [Airflow Helper](../../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md#2-modify-dag-file) to create tasks from the `sql_database` source. (If you want to run table extraction in parallel, then you can do this by setting `decompose = "parallel-isolated"` when doing the source->DAG conversion. See [here](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer#2-modify-dag-file) for code example.) +1. Use the `dlt` [Airflow Helper](../../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md#2-modify-dag-file) to create tasks from the `sql_database` source. (If you want to run table extraction in parallel, then you can do this by setting `decompose = "parallel-isolated"` when doing the source->DAG conversion. See [here](/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer#2-modify-dag-file) for code example.) 2. Reflect tables at runtime with `defer_table_reflect` argument. 3. Set `allow_external_schedulers` to load data using [Airflow intervals](../../../general-usage/incremental-loading.md#using-airflow-schedule-for-backfill-and-incremental-loading). diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/workable.md b/docs/website/docs/dlt-ecosystem/verified-sources/workable.md index 9229ddca7e..cfb315fd7e 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/workable.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/workable.md @@ -211,7 +211,7 @@ If you wish to create your own pipelines, you can leverage source and resource m verified source. To create your data pipeline using single loading and -[incremental data loading](https://dlthub.com/docs/general-usage/incremental-loading) (only for the +[incremental data loading](/general-usage/incremental-loading) (only for the **Candidates** endpoint), follow these steps: 1. Configure the pipeline by specifying the pipeline name, destination, and dataset as follows: @@ -270,10 +270,10 @@ To create your data pipeline using single loading and 1. 
To use incremental loading for the candidates endpoint, maintain the same pipeline and destination dataset names. The pipeline name helps retrieve the - [state](https://dlthub.com/docs/general-usage/state) of the last run, essential for incremental + [state](/general-usage/state) of the last run, essential for incremental data loading. Changing these names might trigger a - [“dev_mode”](https://dlthub.com/docs/general-usage/pipeline#do-experiments-with-dev-mode), + [“dev_mode”](/general-usage/pipeline#do-experiments-with-dev-mode), disrupting metadata tracking for - [incremental data loading](https://dlthub.com/docs/general-usage/incremental-loading). + [incremental data loading](/general-usage/incremental-loading). diff --git a/docs/website/docs/general-usage/data-enrichments/url-parser-data-enrichment.md b/docs/website/docs/general-usage/data-enrichments/url-parser-data-enrichment.md index f2cd4a1065..c2313cf1e5 100644 --- a/docs/website/docs/general-usage/data-enrichments/url-parser-data-enrichment.md +++ b/docs/website/docs/general-usage/data-enrichments/url-parser-data-enrichment.md @@ -228,7 +228,7 @@ need to register to use this service neither get an API key. ### Run the pipeline 1. Install necessary dependencies for the preferred - [destination](https://dlthub.com/docs/dlt-ecosystem/destinations/), For example, duckdb: + [destination](/dlt-ecosystem/destinations/), For example, duckdb: ```sh pip install "dlt[duckdb]" diff --git a/docs/website/docs/general-usage/data-enrichments/user_agent_device_data_enrichment.md b/docs/website/docs/general-usage/data-enrichments/user_agent_device_data_enrichment.md index 2448d31a06..da8cc9a6bb 100644 --- a/docs/website/docs/general-usage/data-enrichments/user_agent_device_data_enrichment.md +++ b/docs/website/docs/general-usage/data-enrichments/user_agent_device_data_enrichment.md @@ -49,8 +49,8 @@ user_device_enrichment/ ``` ### 1. Creating resource - `dlt` works on the principle of [sources](https://dlthub.com/docs/general-usage/source) - and [resources.](https://dlthub.com/docs/general-usage/resource) + `dlt` works on the principle of [sources](/general-usage/source) + and [resources.](/general-usage/resource) This data resource yields data typical of what many web analytics and tracking tools can collect. However, the specifics of what data is collected @@ -281,7 +281,7 @@ The first step is to register on [SerpAPI](https://serpapi.com/) and obtain the ### Run the pipeline 1. Install necessary dependencies for the preferred - [destination](https://dlthub.com/docs/dlt-ecosystem/destinations/), For example, duckdb: + [destination](/dlt-ecosystem/destinations/), For example, duckdb: ```sh pip install "dlt[duckdb]" diff --git a/docs/website/docs/general-usage/destination-tables.md b/docs/website/docs/general-usage/destination-tables.md index 405fd4379d..c10e61ec71 100644 --- a/docs/website/docs/general-usage/destination-tables.md +++ b/docs/website/docs/general-usage/destination-tables.md @@ -225,7 +225,7 @@ problems. ## Staging dataset So far we've been using the `append` write disposition in our example pipeline. This means that -each time we run the pipeline, the data is appended to the existing tables. When you use the [merge write disposition](incremental-loading.md), dlt creates a staging database schema for staging data. This schema is named `_staging` [by default](https://dlthub.com/docs/devel/dlt-ecosystem/staging#staging-dataset) and contains the same tables as the destination schema. 
When you run the pipeline, the data from the staging tables is loaded into the destination tables in a single atomic transaction. +each time we run the pipeline, the data is appended to the existing tables. When you use the [merge write disposition](incremental-loading.md), dlt creates a staging database schema for staging data. This schema is named `_staging` [by default](/devel/dlt-ecosystem/staging#staging-dataset) and contains the same tables as the destination schema. When you run the pipeline, the data from the staging tables is loaded into the destination tables in a single atomic transaction. Let's illustrate this with an example. We change our pipeline to use the `merge` write disposition: diff --git a/docs/website/docs/general-usage/schema-evolution.md b/docs/website/docs/general-usage/schema-evolution.md index b2b81cfdca..dd6ec31050 100644 --- a/docs/website/docs/general-usage/schema-evolution.md +++ b/docs/website/docs/general-usage/schema-evolution.md @@ -10,7 +10,7 @@ Schema evolution is a best practice when ingesting most data. It’s simply a wa It separates the technical challenge of “loading” data, from the business challenge of “curating” data. This enables us to have pipelines that are maintainable by different individuals at different stages. -However, for cases where schema evolution might be triggered by malicious events, such as in web tracking, data contracts are advised. Read more about how to implement data contracts [here](https://dlthub.com/docs/general-usage/schema-contracts). +However, for cases where schema evolution might be triggered by malicious events, such as in web tracking, data contracts are advised. Read more about how to implement data contracts [here](/general-usage/schema-contracts). ## Schema evolution with `dlt` @@ -49,7 +49,7 @@ The schema of data above is loaded to the destination as follows: As you can see above the `dlt's` inference engine generates the structure of the data based on the source and provided hints. It normalizes the data, creates tables and columns, and infers data types. -For more information, you can refer to the **[Schema](https://dlthub.com/docs/general-usage/schema)** and **[Adjust a Schema](https://dlthub.com/docs/walkthroughs/adjust-a-schema)** sections in the documentation. +For more information, you can refer to the **[Schema](/general-usage/schema)** and **[Adjust a Schema](/walkthroughs/adjust-a-schema)** sections in the documentation. ## Evolving the schema @@ -106,7 +106,7 @@ By separating the technical process of loading data from curation, you free the **Tracking column lineage** -The column lineage can be tracked by loading the 'load_info' to the destination. The 'load_info' contains information about columns ‘data types’, ‘add times’, and ‘load id’. To read more please see [the data lineage article](https://dlthub.com/docs/blog/dlt-data-lineage) we have on the blog. +The column lineage can be tracked by loading the 'load_info' to the destination. The 'load_info' contains information about columns ‘data types’, ‘add times’, and ‘load id’. To read more please see [the data lineage article](/blog/dlt-data-lineage) we have on the blog. **Getting notifications** @@ -139,7 +139,7 @@ This script sends Slack notifications for schema updates using the `send_slack_m ## How to control evolution -`dlt` allows schema evolution control via its schema and data contracts. Refer to our **[documentation](https://dlthub.com/docs/general-usage/schema-contracts)** for details. 
+`dlt` allows schema evolution control via its schema and data contracts. Refer to our **[documentation](/general-usage/schema-contracts)** for details. ### How to test for removed columns - applying “not null” constraint @@ -212,4 +212,4 @@ These is a simple examples of how schema evolution works. Demonstrating schema evolution without talking about schema and data contracts is only one side of the coin. Schema and data contracts dictate the terms of how the schema being written to destination should evolve. -Schema and data contracts can be applied to entities ‘tables’ , ‘columns’ and ‘data_types’ using contract modes ‘evolve’, freeze’, ‘discard_rows’ and ‘discard_columns’ to tell `dlt` how to apply contract for a particular entity. To read more about **schema and data contracts** read our [documentation](https://dlthub.com/docs/general-usage/schema-contracts). \ No newline at end of file +Schema and data contracts can be applied to entities ‘tables’ , ‘columns’ and ‘data_types’ using contract modes ‘evolve’, freeze’, ‘discard_rows’ and ‘discard_columns’ to tell `dlt` how to apply contract for a particular entity. To read more about **schema and data contracts** read our [documentation](/general-usage/schema-contracts). \ No newline at end of file diff --git a/docs/website/docs/reference/telemetry.md b/docs/website/docs/reference/telemetry.md index ea5140bc96..d1a2215c97 100644 --- a/docs/website/docs/reference/telemetry.md +++ b/docs/website/docs/reference/telemetry.md @@ -135,7 +135,7 @@ The message context contains the following information: ## Send telemetry data to your own tracker You can setup your own tracker to receive telemetry events. You can create scalable, globally distributed -edge service [using `dlt` and Cloudflare](https://dlthub.com/docs/blog/dlt-segment-migration). +edge service [using `dlt` and Cloudflare](/blog/dlt-segment-migration). Once your tracker is running, point `dlt` to it. You can use global `config.toml` to redirect all pipelines on a given machine. diff --git a/docs/website/docs/tutorial/sql-database.md b/docs/website/docs/tutorial/sql-database.md index 1a7702b637..6a8a6b50e4 100644 --- a/docs/website/docs/tutorial/sql-database.md +++ b/docs/website/docs/tutorial/sql-database.md @@ -114,7 +114,7 @@ Alternatively, you can also paste the credentials as a connection string: sources.sql_database.credentials="mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam" ``` -For more details on the credentials format and other connection methods read the section on [configuring connection to the SQL Database](https://dlthub.com/docs/dlt-ecosystem/verified-sources/sql_database#credentials-format). +For more details on the credentials format and other connection methods read the section on [configuring connection to the SQL Database](/dlt-ecosystem/verified-sources/sql_database#credentials-format). ## 4. Install dependencies diff --git a/docs/website/docs/walkthroughs/create-a-pipeline.md b/docs/website/docs/walkthroughs/create-a-pipeline.md index d463921319..30a15a6478 100644 --- a/docs/website/docs/walkthroughs/create-a-pipeline.md +++ b/docs/website/docs/walkthroughs/create-a-pipeline.md @@ -9,7 +9,7 @@ keywords: [how to, create a pipeline, rest client] This guide walks you through creating a pipeline that uses our [REST API Client](../general-usage/http/rest-client) to connect to [DuckDB](../dlt-ecosystem/destinations/duckdb). 
:::tip -We're using DuckDB as a destination here, but you can adapt the steps to any [source](https://dlthub.com/docs/dlt-ecosystem/verified-sources/) and [destination](https://dlthub.com/docs/dlt-ecosystem/destinations/) by +We're using DuckDB as a destination here, but you can adapt the steps to any [source](/dlt-ecosystem/verified-sources/) and [destination](/dlt-ecosystem/destinations/) by using the [command](../reference/command-line-interface#dlt-init) `dlt init ` and tweaking the pipeline accordingly. ::: diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-prefect.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-prefect.md index f0cc29da87..c13efd37a0 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-prefect.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-prefect.md @@ -31,7 +31,7 @@ Here's a concise guide to orchestrating a `dlt` pipeline with Prefect using "Mov ### Here’s a summary of the steps followed: -1. Create a `dlt` pipeline. For detailed instructions on creating a pipeline, please refer to the [documentation](https://dlthub.com/docs/walkthroughs/create-a-pipeline). +1. Create a `dlt` pipeline. For detailed instructions on creating a pipeline, please refer to the [documentation](/walkthroughs/create-a-pipeline). 1. Add `@task` decorator to the individual functions. 1. Here we use `@task` decorator for `get_users` function: diff --git a/docs/website/docs/walkthroughs/zendesk-weaviate.md b/docs/website/docs/walkthroughs/zendesk-weaviate.md index cc88e59433..d4be8435a0 100644 --- a/docs/website/docs/walkthroughs/zendesk-weaviate.md +++ b/docs/website/docs/walkthroughs/zendesk-weaviate.md @@ -14,7 +14,7 @@ For our example we will use "subject" and "description" fields from a ticket as ## Prerequisites -We're going to use some ready-made components from the [dlt ecosystem](https://dlthub.com/docs/dlt-ecosystem) to make this process easier: +We're going to use some ready-made components from the [dlt ecosystem](/dlt-ecosystem) to make this process easier: 1. A [Zendesk verified source](../dlt-ecosystem/verified-sources/zendesk.md) to extract the tickets from the API. 2. A [Weaviate destination](../dlt-ecosystem/destinations/weaviate.md) to load the data into a Weaviate instance.
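
The change above is mechanical: every markdown link that begins with the absolute prefix `https://dlthub.com/docs/` becomes a site-relative link with the same path and anchor. The command that produced the patch is not recorded here; a minimal Python sketch that would perform an equivalent replacement (assuming the documentation lives under `docs/website/docs` and that all affected links use the `(https://dlthub.com/docs/...)` form) could look like this:

```py
from pathlib import Path

DOCS_ROOT = Path("docs/website/docs")
ABSOLUTE_PREFIX = "(https://dlthub.com/docs/"

for md_file in DOCS_ROOT.rglob("*.md"):
    text = md_file.read_text(encoding="utf-8")
    # Naive prefix replacement: "(https://dlthub.com/docs/general-usage/schema#tables-and-columns)"
    # becomes "(/general-usage/schema#tables-and-columns)". Anchors are preserved
    # because only the leading prefix is rewritten; other hosts (e.g. github.com)
    # are left untouched.
    updated = text.replace(ABSOLUTE_PREFIX, "(/")
    if updated != text:
        md_file.write_text(updated, encoding="utf-8")
```

One consequence of a purely textual replacement, visible in `sql_database/configuration.md` and `destination-tables.md` above, is that links pointing at `https://dlthub.com/docs/devel/...` keep their `/devel/` prefix; if those were meant to reference the stable docs, they would need a separate manual pass.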