
Commit

search and replace absolute links
sh-rp committed Sep 17, 2024
1 parent fcc4c45 commit 324a390
Showing 25 changed files with 49 additions and 49 deletions.
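The change itself is mechanical: each absolute `https://dlthub.com/docs/...` URL in the markdown sources is replaced by its root-relative path, which avoids hard-coding the production domain and keeps the links resolving in local or preview builds of the docs site. A bulk rewrite of this kind could be scripted roughly as follows — a minimal sketch, not the tooling actually used for this commit, assuming the sources live under `docs/website/docs`:

```py
import re
from pathlib import Path

# Assumed location of the markdown sources (matches the paths shown in this diff).
DOCS_ROOT = Path("docs/website/docs")

# Capture the path portion of an absolute docs link, e.g.
# https://dlthub.com/docs/general-usage/schema#tables-and-columns -> /general-usage/schema#tables-and-columns
ABSOLUTE_LINK = re.compile(r"https://dlthub\.com/docs(/[^)\s]+)")

for md_file in DOCS_ROOT.rglob("*.md"):
    text = md_file.read_text(encoding="utf-8")
    new_text = ABSOLUTE_LINK.sub(r"\1", text)  # keep only the root-relative path
    if new_text != text:
        md_file.write_text(new_text, encoding="utf-8")
        print(f"rewrote absolute links in {md_file}")
```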
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/bigquery.md
@@ -220,7 +220,7 @@ When staging is enabled:

## Supported Column Hints

- BigQuery supports the following [column hints](https://dlthub.com/docs/general-usage/schema#tables-and-columns):
+ BigQuery supports the following [column hints](/general-usage/schema#tables-and-columns):

* `partition` - creates a partition with a day granularity on the decorated column (`PARTITION BY DATE`).
May be used with `datetime`, `date`, and `bigint` data types.
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/clickhouse.md
@@ -220,7 +220,7 @@ To enable this, GCS provides an S3
compatibility mode that emulates the S3 API, allowing ClickHouse to access GCS buckets via its S3 integration.

For detailed instructions on setting up S3-compatible storage with dlt, including AWS S3, MinIO, and Cloudflare R2, refer to
- the [dlt documentation on filesystem destinations](https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#using-s3-compatible-storage).
+ the [dlt documentation on filesystem destinations](/dlt-ecosystem/destinations/filesystem#using-s3-compatible-storage).

To set up GCS staging with HMAC authentication in dlt:

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/filesystem.md
@@ -414,7 +414,7 @@ disable_compression=true

- To decompress a `gzip` file, you can use tools like `gunzip`. This will convert the compressed file back to its original format, making it readable.

- For more details on managing file compression, please visit our documentation on performance optimization: [Disabling and Enabling File Compression](https://dlthub.com/docs/reference/performance#disabling-and-enabling-file-compression).
+ For more details on managing file compression, please visit our documentation on performance optimization: [Disabling and Enabling File Compression](/reference/performance#disabling-and-enabling-file-compression).

## Files layout
All the files are stored in a single folder with the name of the dataset that you passed to the `run` or `load` methods of the `pipeline`. In our example chess pipeline, it is **chess_players_games_data**.
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/snowflake.md
@@ -194,7 +194,7 @@ Which will read, `|` delimited file, without header and will continue on errors.
Note that we ignore missing columns `ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE` and we will insert NULL into them.

## Supported column hints
- Snowflake supports the following [column hints](https://dlthub.com/docs/general-usage/schema#tables-and-columns):
+ Snowflake supports the following [column hints](/general-usage/schema#tables-and-columns):
* `cluster` - creates a cluster column(s). Many columns per table are supported and only when a new table is created.

## Table and column identifiers
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/synapse.md
@@ -173,7 +173,7 @@ Possible values:
## Supported column hints

- Synapse supports the following [column hints](https://dlthub.com/docs/general-usage/schema#tables-and-columns):
+ Synapse supports the following [column hints](/general-usage/schema#tables-and-columns):

* `primary_key` - creates a `PRIMARY KEY NONCLUSTERED NOT ENFORCED` constraint on the column
* `unique` - creates a `UNIQUE NOT ENFORCED` constraint on the column
@@ -102,4 +102,4 @@ DBT_CLOUD__ACCOUNT_ID
DBT_CLOUD__JOB_ID
```

- For more information, read the [Credentials](https://dlthub.com/docs/general-usage/credentials) documentation.
+ For more information, read the [Credentials](/general-usage/credentials) documentation.
@@ -15,7 +15,7 @@ streams of data in real time.

Our AWS Kinesis [verified source](https://github.com/dlt-hub/verified-sources/tree/master/sources/kinesis)
loads messages from Kinesis streams to your preferred
- [destination](https://dlthub.com/docs/dlt-ecosystem/destinations/).
+ [destination](/dlt-ecosystem/destinations/).

Resources that can be loaded using this verified source are:

@@ -329,7 +329,7 @@ verified source.
```

> Loads all the data till date in the first run, and then
- > [incrementally](https://dlthub.com/docs/general-usage/incremental-loading) in subsequent runs.
+ > [incrementally](/general-usage/incremental-loading) in subsequent runs.
1. To load data from a specific start date:

@@ -340,7 +340,7 @@ verified source.
```

> Loads data starting from the specified date during the first run, and then
- > [incrementally](https://dlthub.com/docs/general-usage/incremental-loading) in subsequent runs.
+ > [incrementally](/general-usage/incremental-loading) in subsequent runs.
<!--@@@DLT_TUBA google_analytics-->

@@ -441,11 +441,11 @@ dlt.resource(
`name`: Denotes the table name, set here as "spreadsheet_info".

`write_disposition`: Dictates how data is loaded to the destination.
- [Read more](https://dlthub.com/docs/general-usage/incremental-loading#the-3-write-dispositions).
+ [Read more](/general-usage/incremental-loading#the-3-write-dispositions).

`merge_key`: Parameter is used to specify the column used to identify records for merging. In this
case,"spreadsheet_id", means that the records will be merged based on the values in this column.
- [Read more](https://dlthub.com/docs/general-usage/incremental-loading#merge-incremental_loading).
+ [Read more](/general-usage/incremental-loading#merge-incremental_loading).

## Customization
### Create your own pipeline
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/jira.md
@@ -190,7 +190,7 @@ above.

1. Configure the pipeline by specifying the pipeline name, destination, and dataset. To read more
about pipeline configuration, please refer to our documentation
- [here](https://dlthub.com/docs/general-usage/pipeline):
+ [here](/general-usage/pipeline):

```py
pipeline = dlt.pipeline(
6 changes: 3 additions & 3 deletions docs/website/docs/dlt-ecosystem/verified-sources/matomo.md
@@ -150,7 +150,7 @@ def matomo_reports(

`site_id`: Website's Site ID as per Matomo account.

- >Note: This is an [incremental](https://dlthub.com/docs/general-usage/incremental-loading) source method and loads the "last_date" from the state of last pipeline run.
+ >Note: This is an [incremental](/general-usage/incremental-loading) source method and loads the "last_date" from the state of last pipeline run.
### Source `matomo_visits`:

@@ -183,7 +183,7 @@ def matomo_visits(

`get_live_event_visitors`: Retrieve unique visitor data, defaulting to False.

- >Note: This is an [incremental](https://dlthub.com/docs/general-usage/incremental-loading) source method and loads the "last_date" from the state of last pipeline run.
+ >Note: This is an [incremental](/general-usage/incremental-loading) source method and loads the "last_date" from the state of last pipeline run.
### Resource `get_last_visits`

@@ -214,7 +214,7 @@ def get_last_visits(

`rows_per_page`: Number of rows on each page.

- >Note: This is an [incremental](https://dlthub.com/docs/general-usage/incremental-loading) resource method and loads the "last_date" from the state of last pipeline run.
+ >Note: This is an [incremental](/general-usage/incremental-loading) resource method and loads the "last_date" from the state of last pipeline run.

### Transformer `visitors`
@@ -65,10 +65,10 @@ To get started with your data pipeline, follow these steps:
dlt init pg_replication duckdb
```

- It will initialize [the pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/pg_replication_pipeline.py) with a Postgres replication as the [source](https://dlthub.com/docs/general-usage/source) and [DuckDB](https://dlthub.com/docs/dlt-ecosystem/destinations/duckdb) as the [destination](https://dlthub.com/docs/dlt-ecosystem/destinations).
+ It will initialize [the pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/pg_replication_pipeline.py) with a Postgres replication as the [source](/general-usage/source) and [DuckDB](/dlt-ecosystem/destinations/duckdb) as the [destination](/dlt-ecosystem/destinations).


- 2. If you'd like to use a different destination, simply replace `duckdb` with the name of your preferred [destination](https://dlthub.com/docs/dlt-ecosystem/destinations).
+ 2. If you'd like to use a different destination, simply replace `duckdb` with the name of your preferred [destination](/dlt-ecosystem/destinations).
3. This source uses `sql_database` source, you can init it as follows:
@@ -81,7 +81,7 @@ To get started with your data pipeline, follow these steps:
4. After running these two commands, a new directory will be created with the necessary files and configuration settings to get started.
- For more information, read the guide on [how to add a verified source](https://dlthub.com/docs/walkthroughs/add-a-verified-source).
+ For more information, read the guide on [how to add a verified source](/walkthroughs/add-a-verified-source).
:::note
You can omit the `[sql.sources.credentials]` section in `secrets.toml` as it is not required.
@@ -109,9 +109,9 @@ To get started with your data pipeline, follow these steps:
sources.pg_replication.credentials="postgresql://[email protected]:port/database"
```

- 3. Finally, follow the instructions in [Destinations](https://dlthub.com/docs/dlt-ecosystem/destinations/) to add credentials for your chosen destination. This will ensure that your data is properly routed.
+ 3. Finally, follow the instructions in [Destinations](/dlt-ecosystem/destinations/) to add credentials for your chosen destination. This will ensure that your data is properly routed.

- For more information, read the [Configuration section.](https://dlthub.com/docs/general-usage/credentials)
+ For more information, read the [Configuration section.](/general-usage/credentials)

## Run the pipeline

@@ -130,12 +130,12 @@ For more information, read the [Configuration section.](https://dlthub.com/docs/
For example, the `pipeline_name` for the above pipeline example is `pg_replication_pipeline`, you may also use any custom name instead.
- For more information, read the guide on [how to run a pipeline](https://dlthub.com/docs/walkthroughs/run-a-pipeline).
+ For more information, read the guide on [how to run a pipeline](/walkthroughs/run-a-pipeline).
## Sources and resources
- `dlt` works on the principle of [sources](https://dlthub.com/docs/general-usage/source) and [resources](https://dlthub.com/docs/general-usage/resource).
+ `dlt` works on the principle of [sources](/general-usage/source) and [resources](/general-usage/resource).
### Resource `replication_resource`
@@ -14,7 +14,7 @@ import Header from '../_source-info-header.md';

Efficient data management often requires loading only new or updated data from your SQL databases, rather than reprocessing the entire dataset. This is where incremental loading comes into play.

- Incremental loading uses a cursor column (e.g., timestamp or auto-incrementing ID) to load only data newer than a specified initial value, enhancing efficiency by reducing processing time and resource use. Read [here](https://dlthub.com/docs/walkthroughs/sql-incremental-configuration) for more details on incremental loading with `dlt`.
+ Incremental loading uses a cursor column (e.g., timestamp or auto-incrementing ID) to load only data newer than a specified initial value, enhancing efficiency by reducing processing time and resource use. Read [here](/walkthroughs/sql-incremental-configuration) for more details on incremental loading with `dlt`.


#### How to configure
@@ -46,7 +46,7 @@ certain range.
print(info)
```
Behind the scene, the loader generates a SQL query filtering rows with `last_modified` values greater than the incremental value. In the first run, this is the initial value (midnight (00:00:00) January 1, 2024).
- In subsequent runs, it is the latest value of `last_modified` that `dlt` stores in [state](https://dlthub.com/docs/general-usage/state).
+ In subsequent runs, it is the latest value of `last_modified` that `dlt` stores in [state](/general-usage/state).

**2. Incremental loading with the source `sql_database`**
To achieve the same using the `sql_database` source, you would specify your cursor as follows:
@@ -163,9 +163,9 @@ The examples below show how you can set arguments in any of the `.toml` files (`
database = sql_database()
```

- You'll be able to configure all the arguments this way (except adapter callback function). [Standard dlt rules apply](https://dlthub.com/docs/general-usage/credentials/configuration#configure-dlt-sources-and-resources).
+ You'll be able to configure all the arguments this way (except adapter callback function). [Standard dlt rules apply](/general-usage/credentials/configuration#configure-dlt-sources-and-resources).

- It is also possible to set these arguments as environment variables [using the proper naming convention](https://dlthub.com/docs/general-usage/credentials/config_providers#toml-vs-environment-variables):
+ It is also possible to set these arguments as environment variables [using the proper naming convention](/general-usage/credentials/config_providers#toml-vs-environment-variables):
```sh
SOURCES__SQL_DATABASE__CREDENTIALS="mssql+pyodbc://loader.database.windows.net/dlt_data?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server"
SOURCES__SQL_DATABASE__BACKEND=pandas
@@ -124,7 +124,7 @@ There are several options for adding your connection credentials into your `dlt`

#### 1. Setting them in `secrets.toml` or as environment variables (Recommended)

- You can set up credentials using [any method](https://dlthub.com/docs/devel/general-usage/credentials/setup#available-config-providers) supported by `dlt`. We recommend using `.dlt/secrets.toml` or the environment variables. See Step 2 of the [setup](./setup) for how to set credentials inside `secrets.toml`. For more information on passing credentials read [here](https://dlthub.com/docs/devel/general-usage/credentials/setup).
+ You can set up credentials using [any method](/devel/general-usage/credentials/setup#available-config-providers) supported by `dlt`. We recommend using `.dlt/secrets.toml` or the environment variables. See Step 2 of the [setup](./setup) for how to set credentials inside `secrets.toml`. For more information on passing credentials read [here](/devel/general-usage/credentials/setup).


#### 2. Passing them directly in the script
@@ -39,7 +39,7 @@ The PyArrow backend does not yield individual rows rather loads chunks of data a


Examples:
- 1. Pseudonymizing data to hide personally identifiable information (PII) before loading it to the destination. (See [here](https://dlthub.com/docs/general-usage/customising-pipelines/pseudonymizing_columns) for more information on pseudonymizing data with `dlt`)
+ 1. Pseudonymizing data to hide personally identifiable information (PII) before loading it to the destination. (See [here](/general-usage/customising-pipelines/pseudonymizing_columns) for more information on pseudonymizing data with `dlt`)

```py
import hashlib
@@ -92,11 +92,11 @@ Examples:

## Deploying the sql_database pipeline

- You can deploy the `sql_database` pipeline with any of the `dlt` deployment methods, such as [GitHub Actions](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-github-actions), [Airflow](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer), [Dagster](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster) etc. See [here](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline) for a full list of deployment methods.
+ You can deploy the `sql_database` pipeline with any of the `dlt` deployment methods, such as [GitHub Actions](/walkthroughs/deploy-a-pipeline/deploy-with-github-actions), [Airflow](/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer), [Dagster](/walkthroughs/deploy-a-pipeline/deploy-with-dagster) etc. See [here](/walkthroughs/deploy-a-pipeline) for a full list of deployment methods.

### Running on Airflow
When running on Airflow:
- 1. Use the `dlt` [Airflow Helper](../../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md#2-modify-dag-file) to create tasks from the `sql_database` source. (If you want to run table extraction in parallel, then you can do this by setting `decompose = "parallel-isolated"` when doing the source->DAG conversion. See [here](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer#2-modify-dag-file) for code example.)
+ 1. Use the `dlt` [Airflow Helper](../../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md#2-modify-dag-file) to create tasks from the `sql_database` source. (If you want to run table extraction in parallel, then you can do this by setting `decompose = "parallel-isolated"` when doing the source->DAG conversion. See [here](/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer#2-modify-dag-file) for code example.)
2. Reflect tables at runtime with `defer_table_reflect` argument.
3. Set `allow_external_schedulers` to load data using [Airflow intervals](../../../general-usage/incremental-loading.md#using-airflow-schedule-for-backfill-and-incremental-loading).

8 changes: 4 additions & 4 deletions docs/website/docs/dlt-ecosystem/verified-sources/workable.md
@@ -211,7 +211,7 @@ If you wish to create your own pipelines, you can leverage source and resource m
verified source.

To create your data pipeline using single loading and
- [incremental data loading](https://dlthub.com/docs/general-usage/incremental-loading) (only for the
+ [incremental data loading](/general-usage/incremental-loading) (only for the
**Candidates** endpoint), follow these steps:

1. Configure the pipeline by specifying the pipeline name, destination, and dataset as follows:
@@ -270,10 +270,10 @@ To create your data pipeline using single loading and
1. To use incremental loading for the candidates endpoint, maintain the same pipeline and
destination dataset names. The pipeline name helps retrieve the
- [state](https://dlthub.com/docs/general-usage/state) of the last run, essential for incremental
+ [state](/general-usage/state) of the last run, essential for incremental
data loading. Changing these names might trigger a
- [“dev_mode”](https://dlthub.com/docs/general-usage/pipeline#do-experiments-with-dev-mode),
+ [“dev_mode”](/general-usage/pipeline#do-experiments-with-dev-mode),
disrupting metadata tracking for
- [incremental data loading](https://dlthub.com/docs/general-usage/incremental-loading).
+ [incremental data loading](/general-usage/incremental-loading).

<!--@@@DLT_TUBA workable-->
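The remaining files follow the same pattern: an absolute `https://dlthub.com/docs/...` URL on the removed line, its root-relative path on the added line. A quick way to confirm that no absolute links were left behind after such a rewrite is to scan the docs tree; the check below is a hypothetical sketch, again assuming the markdown sources live under `docs/website/docs`, and should print nothing once every link has been converted:

```py
import re
from pathlib import Path

DOCS_ROOT = Path("docs/website/docs")  # assumed docs location
ABSOLUTE_LINK = re.compile(r"https://dlthub\.com/docs/")

for md_file in sorted(DOCS_ROOT.rglob("*.md")):
    for lineno, line in enumerate(md_file.read_text(encoding="utf-8").splitlines(), start=1):
        if ABSOLUTE_LINK.search(line):
            # Report file, line number, and the offending line.
            print(f"{md_file}:{lineno}: {line.strip()}")
```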
@@ -228,7 +228,7 @@ need to register to use this service neither get an API key.
### Run the pipeline

1. Install necessary dependencies for the preferred
- [destination](https://dlthub.com/docs/dlt-ecosystem/destinations/), For example, duckdb:
+ [destination](/dlt-ecosystem/destinations/), For example, duckdb:

```sh
pip install "dlt[duckdb]"
@@ -49,8 +49,8 @@ user_device_enrichment/
```
### 1. Creating resource

- `dlt` works on the principle of [sources](https://dlthub.com/docs/general-usage/source)
- and [resources.](https://dlthub.com/docs/general-usage/resource)
+ `dlt` works on the principle of [sources](/general-usage/source)
+ and [resources.](/general-usage/resource)

This data resource yields data typical of what many web analytics and
tracking tools can collect. However, the specifics of what data is collected
@@ -281,7 +281,7 @@ The first step is to register on [SerpAPI](https://serpapi.com/) and obtain the
### Run the pipeline

1. Install necessary dependencies for the preferred
- [destination](https://dlthub.com/docs/dlt-ecosystem/destinations/), For example, duckdb:
+ [destination](/dlt-ecosystem/destinations/), For example, duckdb:

```sh
pip install "dlt[duckdb]"