Fix grammar in docs: batch 3 #1076

Closed
wants to merge 22 commits into from
Commits
df45cba
Fix grammar in docs: batch 3
burnash Mar 11, 2024
7e3dc9d
Update docs/website/docs/general-usage/credentials/configuration.md
burnash Mar 11, 2024
b1c916d
Update docs/website/docs/general-usage/resource.md
burnash Mar 11, 2024
09c3b81
Update docs/website/docs/general-usage/credentials/config_providers.md
burnash Mar 11, 2024
0c9f27a
Update docs/website/docs/general-usage/credentials/config_specs.md
burnash Mar 11, 2024
fdc687a
Update docs/website/docs/general-usage/credentials/configuration.md
burnash Mar 11, 2024
b0d90eb
Update docs/website/docs/general-usage/credentials/configuration.md
burnash Mar 11, 2024
d947b4e
Update docs/website/docs/general-usage/credentials/configuration.md
burnash Mar 11, 2024
d0b3582
Update docs/website/docs/general-usage/data-enrichments/user_agent_de…
burnash Mar 11, 2024
c8455f1
Update docs/website/docs/general-usage/data-enrichments/user_agent_de…
burnash Mar 11, 2024
8cbe83c
Update docs/website/docs/general-usage/data-enrichments/user_agent_de…
burnash Mar 11, 2024
7ccfbf8
Update docs/website/docs/general-usage/destination-tables.md
burnash Mar 11, 2024
378427d
Update docs/website/docs/general-usage/glossary.md
burnash Mar 11, 2024
e40b657
Update docs/website/docs/general-usage/pipeline.md
burnash Mar 11, 2024
39944ab
Update docs/website/docs/general-usage/pipeline.md
burnash Mar 11, 2024
e75560c
Update docs/website/docs/general-usage/pipeline.md
burnash Mar 11, 2024
ec77a3f
Update docs/website/docs/general-usage/resource.md
burnash Mar 11, 2024
d9bbfbb
Update docs/website/docs/general-usage/resource.md
burnash Mar 11, 2024
dd4096c
Update docs/website/docs/tutorial/load-data-from-an-api.md
burnash Mar 11, 2024
a93b9e9
Update docs/website/docs/tutorial/load-data-from-an-api.md
burnash Mar 11, 2024
ff606b1
Update docs/website/docs/tutorial/grouping-resources.md
burnash Mar 11, 2024
6d0925b
Update docs/website/docs/general-usage/credentials/configuration.md
burnash Mar 11, 2024
80 changes: 26 additions & 54 deletions docs/website/docs/general-usage/credentials/config_providers.md
@@ -7,34 +7,21 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen

# Configuration Providers


Configuration Providers in the context of the `dlt` library
refer to different sources from which configuration values
and secrets can be retrieved for a data pipeline.
These providers form a hierarchy, with each having its own
priority in determining the values for function arguments.
Configuration Providers, in the context of the `dlt` library, refer to different sources from which configuration values and secrets can be retrieved for a data pipeline. These providers form a hierarchy, each having its own priority in determining the values for function arguments.

## The provider hierarchy

If function signature has arguments that may be injected, `dlt` looks for the argument values in
providers.
If a function signature has arguments that may be injected, `dlt` looks for the argument values in providers.

### Providers

1. **Environment Variables**: At the top of the hierarchy are environment variables.
If a value for a specific argument is found in an environment variable,
dlt will use it and will not proceed to search in lower-priority providers.
1. **Environment Variables**: At the top of the hierarchy are environment variables. If a value for a specific argument is found in an environment variable, dlt will use it and will not proceed to search in lower-priority providers.

2. **Vaults (Airflow/Google/AWS/Azure)**: These are specialized providers that come
after environment variables. They can provide configuration values and secrets.
However, they typically focus on handling sensitive information.
2. **Vaults (Airflow/Google/AWS/Azure)**: These are specialized providers that come after environment variables. They can provide configuration values and secrets. However, they typically focus on handling sensitive information.

3. **`secrets.toml` and `config.toml` Files**: These files are used for storing both
configuration values and secrets. `secrets.toml` is dedicated to sensitive information,
while `config.toml` contains non-sensitive configuration data.
3. **`secrets.toml` and `config.toml` Files**: These files are used for storing both configuration values and secrets. `secrets.toml` is dedicated to sensitive information, while `config.toml` contains non-sensitive configuration data.

4. **Default Argument Values**: These are the values specified in the function's signature.
They have the lowest priority in the provider hierarchy.
4. **Default Argument Values**: These are the values specified in the function's signature. They have the lowest priority in the provider hierarchy.

### Example

@@ -49,34 +36,27 @@ def google_sheets(
...
```

In case of `google_sheets()` it will look
for: `spreadsheet_id`, `tab_names`, `credentials` and `only_strings`
In the case of `google_sheets()`, it will look for: `spreadsheet_id`, `tab_names`, `credentials`, and `only_strings`.

Each provider has its own key naming convention, and dlt is able to translate between them.

**The argument name is a key in the lookup**.

At the top of the hierarchy are Environment Variables, then `secrets.toml` and
`config.toml` files. Providers like Airflow/Google/AWS/Azure Vaults will be inserted **after** the Environment
provider but **before** TOML providers.
At the top of the hierarchy are Environment Variables, then `secrets.toml` and `config.toml` files. Providers like Airflow/Google/AWS/Azure Vaults will be inserted **after** the Environment provider but **before** TOML providers.

For example, if `spreadsheet_id` is found in environment variable `SPREADSHEET_ID`, `dlt` will not look in TOML files
and below.
For example, if `spreadsheet_id` is found in the environment variable `SPREADSHEET_ID`, `dlt` will not look in TOML files and below.

The values passed in the code **explicitly** are the **highest** in provider hierarchy. The **default values**
of the arguments have the **lowest** priority in the provider hierarchy.
The values passed in the code **explicitly** are the **highest** in the provider hierarchy. The **default values** of the arguments have the **lowest** priority in the provider hierarchy.

:::info
Explicit Args **>** ENV Variables **>** Vaults: Airflow etc. **>** `secrets.toml` **>** `config.toml` **>** Default Arg Values
:::
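
The priority order in the info box above can be sketched as a first-match lookup. This is only an illustration, not `dlt`'s implementation: `resolve_value` is a hypothetical helper, and plain dictionaries keyed by argument name stand in for the real providers. It also ignores the rule that vaults are skipped for non-secret values.

```python
from typing import Any, Mapping, Sequence, Tuple

_MISSING = object()  # sentinel: distinguishes "not provided" from None


def resolve_value(
    key: str,
    explicit: Any = _MISSING,
    providers: Sequence[Tuple[str, Mapping[str, Any]]] = (),
    default: Any = _MISSING,
) -> Tuple[Any, str]:
    """Return (value, source_name) from the first source that has `key`."""
    # Explicitly passed arguments win over every provider.
    if explicit is not _MISSING:
        return explicit, "explicit"
    # Providers are checked in order; the first hit ends the search.
    for name, values in providers:
        if key in values:
            return values[key], name
    # The default from the function signature has the lowest priority.
    if default is not _MISSING:
        return default, "default"
    raise KeyError(key)


# Hypothetical provider contents, listed from highest to lowest priority.
providers = [
    ("env", {}),
    ("vault", {}),
    ("secrets.toml", {"spreadsheet_id": "from-secrets"}),
    ("config.toml", {"spreadsheet_id": "from-config", "only_strings": True}),
]

value, source = resolve_value("spreadsheet_id", providers=providers)
```

Here `spreadsheet_id` resolves from the `secrets.toml` stand-in because it comes before `config.toml` in the list, and the search stops at the first match.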

Secrets are handled only by the providers supporting them. Some providers support only
secrets (to reduce the number of requests done by `dlt` when searching sections).
Secrets are handled only by the providers supporting them. Some providers support only secrets (to reduce the number of requests done by `dlt` when searching sections).

1. `secrets.toml` and environment may hold both config and secret values.
1. `config.toml` may hold only config values, no secrets.
1. Various vaults providers hold only secrets, `dlt` skips them when looking for values that are not
secrets.
2. `config.toml` may hold only config values, no secrets.
3. Various vaults providers hold only secrets, `dlt` skips them when looking for values that are not secrets.

:::info
Context-aware providers will activate in the right environments i.e. on Airflow or AWS/GCP VMachines.
@@ -86,22 +66,19 @@ Context-aware providers will activate in the right environments i.e. on Airflow

### TOML vs. Environment Variables

Providers may use different formats for the keys. `dlt` will translate the standard format where
sections and key names are separated by "." into the provider-specific formats.
Providers may use different formats for the keys. `dlt` will translate the standard format where sections and key names are separated by "." into the provider-specific formats.

1. For TOML, names are case-sensitive and sections are separated with ".".
1. For Environment Variables, all names are capitalized and sections are separated with double
underscore "__".
2. For Environment Variables, all names are capitalized and sections are separated with double underscore "__".

Example: When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]` it must find
the `private_key` for Google credentials. It will look
Example: When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]`, it must find the `private_key` for Google credentials. It will look:

1. first in env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and if not found,
1. in `secrets.toml` with key `my_section.gcp_credentials.private_key`.
1. first in the env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and if not found,
2. in `secrets.toml` with the key `my_section.gcp_credentials.private_key`.
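
The key translation described above (dot-separated sections become upper-case names joined by a double underscore) can be sketched as follows. `to_env_key` is a hypothetical helper written for illustration and is not part of the `dlt` API.

```python
def to_env_key(dotted_key: str) -> str:
    """Translate a dot-separated key into the environment-variable form."""
    # Split on ".", upper-case each part, and join with a double underscore.
    return "__".join(part.upper() for part in dotted_key.split("."))


env_key = to_env_key("my_section.gcp_credentials.private_key")
# env_key == "MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY"
```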

### Environment provider

Looks for the values in the environment variables.
This provider looks for the values in the environment variables.

### TOML provider

@@ -110,12 +87,10 @@ The TOML provider in dlt utilizes two TOML files:
- `secrets.toml` - This file is intended for storing sensitive information, often referred to as "secrets".
- `config.toml` - This file is used for storing configuration values.

By default, the `.gitignore` file in the project prevents `secrets.toml` from being added to
version control and pushed. However, `config.toml` can be freely added to version control.
By default, the `.gitignore` file in the project prevents `secrets.toml` from being added to version control and pushed. However, `config.toml` can be freely added to version control.

:::info
**TOML provider always loads those files from `.dlt` folder** which is looked **relative to the
current Working Directory**.
**The TOML provider always loads these files from the `.dlt` folder**, which is looked for **relative to the current Working Directory**.
:::

Example: If your working directory is `my_dlt_project` and your project has the following structure:
@@ -128,14 +103,11 @@ my_dlt_project:
|---- google_sheets.py
```

and you run `python pipelines/google_sheets.py` then `dlt` will look for `secrets.toml` in
`my_dlt_project/.dlt/secrets.toml` and ignore the existing
`my_dlt_project/pipelines/.dlt/secrets.toml`.
and you run `python pipelines/google_sheets.py`, then `dlt` will look for `secrets.toml` in `my_dlt_project/.dlt/secrets.toml` and ignore the existing `my_dlt_project/pipelines/.dlt/secrets.toml`.

If you change your working directory to `pipelines` and run `python google_sheets.py` it will look for
`my_dlt_project/pipelines/.dlt/secrets.toml` as (probably) expected.
If you change your working directory to `pipelines` and run `python google_sheets.py`, it will look for `my_dlt_project/pipelines/.dlt/secrets.toml` as (probably) expected.
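
The working-directory behaviour described above amounts to resolving `.dlt/secrets.toml` against the current directory rather than the script's location. A minimal sketch, where `local_secrets_path` is a hypothetical helper and not a `dlt` function:

```python
from pathlib import Path
from typing import Optional, Union


def local_secrets_path(working_dir: Optional[Union[str, Path]] = None) -> Path:
    """Build the project-local secrets.toml path for a working directory."""
    # Fall back to the process's current working directory when none is given.
    base = Path(working_dir) if working_dir is not None else Path.cwd()
    return base / ".dlt" / "secrets.toml"


# Running from the project root vs. from the `pipelines` subfolder.
from_root = local_secrets_path("my_dlt_project").as_posix()
from_pipelines = local_secrets_path("my_dlt_project/pipelines").as_posix()
```

The script path passed to `python` plays no role in either result; only the directory the command is run from does.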

:::caution
It's worth mentioning that the TOML provider also has the capability to read files from `~/.dlt/`
(located in the user's home directory) in addition to the local project-specific `.dlt` folder.
:::
It's worth mentioning that the TOML provider also has the capability to read files from `~/.dlt/` (located in the user's home directory) in addition to the local project-specific `.dlt` folder.
:::

25 changes: 13 additions & 12 deletions docs/website/docs/general-usage/credentials/config_specs.md
@@ -18,7 +18,7 @@ service account credentials, while `ConnectionStringCredentials` handles databas

### Example

As an example, let's use `ConnectionStringCredentials` which represents a database connection
As an example, let's use `ConnectionStringCredentials`, which represents a database connection
string.

```python
@@ -29,7 +29,7 @@ def query(sql: str, dsn: ConnectionStringCredentials = dlt.secrets.value):
...
```

The source above executes the `sql` against database defined in `dsn`. `ConnectionStringCredentials`
The source above executes the `sql` against the database defined in `dsn`. `ConnectionStringCredentials`
makes sure you get the correct values with correct types and understands the relevant native form of
the credentials.

@@ -51,7 +51,7 @@ Example 2. Use the **native** form.
dsn="postgres://loader:loader@localhost:5432/dlt_data"
```

Example 3. Use the **mixed** form: the password is missing in explicit dsn and will be taken from the
Example 3. Use the **mixed** form: the password is missing in the explicit dsn and will be taken from the
`secrets.toml`.

```toml
@@ -66,7 +66,7 @@ query("SELECT * FROM customers", "postgres://loader@localhost:5432/dlt_data")
query("SELECT * FROM customers", {"database": "dlt_data", "username": "loader"...})
```

## Built in credentials
## Built-in credentials

We have some ready-made credentials you can reuse:

@@ -141,7 +141,7 @@ it is a base class for [GcpOAuthCredentials](#gcpoauthcredentials).
- [GcpOAuthCredentials](#gcpoauthcredentials).

[Google Analytics verified source](https://github.com/dlt-hub/verified-sources/blob/master/sources/google_analytics/__init__.py):
the example how to use GCP Credentials.
the example of how to use GCP Credentials.

#### GcpServiceAccountCredentials

@@ -150,7 +150,7 @@ This class provides methods to retrieve native credentials for Google clients.

##### Usage

- You may just pass the `service.json` as string or dictionary (in code and via config providers).
- You may just pass the `service.json` as a string or dictionary (in code and via config providers).
- Or default credentials will be used.

```python
@@ -249,12 +249,12 @@ and `config.toml`:
property_id = "213025502"
```

In order for `auth()` method to succeed:
In order for the `auth()` method to succeed:

- You must provide valid `client_id` and `client_secret`,
- You must provide a valid `client_id` and `client_secret`,
`refresh_token` and `project_id` in order to get a current
**access token** and authenticate with OAuth.
Mind that the `refresh_token` must contain all the scopes that you require for your access.
Keep in mind that the `refresh_token` must contain all the scopes that you require for your access.
- If `refresh_token` is not provided, and you run the pipeline from a console or a notebook,
`dlt` will use InstalledAppFlow to run the desktop authentication flow.

@@ -429,15 +429,15 @@ of credentials that derive from the common class, so you can handle it seamlessl

This is used a lot in the `dlt` core and may become useful for complicated sources.

In fact, for each decorated function a spec is synthesized. In case of `google_sheets` following
In fact, for each decorated function a spec is synthesized. In the case of `google_sheets`, the following
class is created:

```python
from dlt.sources.config import configspec, with_config

@configspec
class GoogleSheetsConfiguration(BaseConfiguration):
    tab_names: List[str] = None  # manadatory
    tab_names: List[str] = None  # mandatory
    credentials: GcpServiceAccountCredentials = None  # mandatory secret
    only_strings: Optional[bool] = False
```
@@ -465,4 +465,5 @@ and is meant to serve as a base class for handling various types of credentials.
It defines methods for initializing credentials, converting them to native representations,
and generating string representations while ensuring sensitive information is appropriately handled.

More information about this class can be found in the class docstrings.
More information about this class can be found in the class docstrings.
