[DOCS] 1.0 guide for securely storing and accessing credentials and tokens (#10157)
Rachel-Reverie authored Aug 5, 2024
1 parent dc0889e commit 42ceec9
Showing 12 changed files with 432 additions and 7 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
import GxData from '../../_core_components/_data.jsx'
import PreReqFileDataContext from '../../_core_components/prerequisites/_file_data_context.md'

### Prerequisites

- An AWS Secrets Manager instance. See [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/tutorials_basic.html).
- The ability to install Python packages with `pip`.
- <PreReqFileDataContext/>.

### Procedure

1. Set up AWS Secrets Manager support.

To use AWS Secrets Manager with {GxData.product_name}, first install the `great_expectations` Python package with the `aws_secrets` requirement by running the following command:

```bash title="Terminal"
pip install 'great_expectations[aws_secrets]'
```

2. Reference AWS Secrets Manager variables in `config_variables.yml`.

By default, `config_variables.yml` is located at `gx/uncommitted/config_variables.yml` in your File Data Context.

Values in `config_variables.yml` that start with `secret|arn:aws:secretsmanager` are substituted with the corresponding values from AWS Secrets Manager. However, if the keywords following `secret|arn:aws:secretsmanager` do not correspond to keywords in AWS Secrets Manager, no substitution occurs.

You can reference other stored credentials within the keywords by wrapping their corresponding variable in `${` and `}`. When multiple references are present in a value, the secrets manager substitution takes place after all other substitutions have occurred.

An entire connection string can be referenced from the secrets manager. In this example, `dev_db_credentials` is the Secret Name in AWS Secrets Manager, and `connection_string` is the Secret Key that corresponds to the value to be retrieved:

```yaml title="config_variables.yml"
my_aws_creds: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|connection_string
```
Or each component of the connection string can be referenced separately. In these examples, `dev_db_credentials` remains the Secret Name in AWS Secrets Manager. However, rather than retrieving the value of the Secret Key `connection_string`, Secret Keys for individual parts of the connection string are provided for retrieval:

```yaml title="config_variables.yml"
drivername: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|drivername
host: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|host
port: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|port
username: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|username
password: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|password
database: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|database
```

Note that the last seven characters of an AWS Secrets Manager ARN are automatically generated by AWS and are not required to retrieve the secret. For example, the following two values retrieve the same secret:

```yaml title="config_variables.yml"
secret1: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret-1zAyu6
secret2: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret
```

3. Optional. Reference versioned secrets.

Unless otherwise specified, the latest version of the secret is returned. To retrieve a specific version of the secret, specify its version UUID. For example:

```yaml title="config_variables.yml"
versioned_secret: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret:00000000-0000-0000-0000-000000000000
```

4. Optional. Retrieve specific secrets from a JSON string.

To retrieve a specific secret from a JSON string, include the JSON key after a pipe character `|` at the end of the secrets keywords. For example:

```yaml title="config_variables.yml"
json_secret: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret|<KEY>
versioned_json_secret: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret:00000000-0000-0000-0000-000000000000|<KEY>
```
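The reference formats above can be read as a secret ARN, followed by an optional version UUID and an optional JSON key. The following is a minimal sketch of that structure, not the library's own parsing logic: the `parse_aws_secret_reference` helper, its regular expression, and the example region and account ID are assumptions made for illustration, and the sketch expects any `${...}` references to have been resolved already.

```python
import re

# Hypothetical pattern for illustration only:
# secret|<ARN>[:<version UUID>][|<JSON key>]
AWS_SECRET_REFERENCE = re.compile(
    r"^secret\|"
    r"(?P<arn>arn:aws:secretsmanager:[a-z0-9\-]+:\d{12}:secret:[^:|]+)"
    r"(?::(?P<version>[0-9a-fA-F\-]{36}))?"
    r"(?:\|(?P<key>.+))?$"
)


def parse_aws_secret_reference(value):
    """Split a reference into its ARN, version UUID, and JSON key.

    Returns None when the value does not look like a secret reference,
    mirroring the rule that non-matching values are not substituted.
    """
    match = AWS_SECRET_REFERENCE.match(value)
    if match is None:
        return None
    return {
        "secret_id": match.group("arn"),
        "version_id": match.group("version"),
        "json_key": match.group("key"),
    }


reference = (
    "secret|arn:aws:secretsmanager:us-east-1:123456789012"
    ":secret:dev_db_credentials|connection_string"
)
print(parse_aws_secret_reference(reference))
```

In this sketch, a value without the `secret|arn:aws:secretsmanager` prefix returns `None`, and the optional parts are `None` when they are omitted from the reference.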
@@ -0,0 +1,62 @@
import GxData from '../../_core_components/_data.jsx'
import PreReqFileDataContext from '../../_core_components/prerequisites/_file_data_context.md'

### Prerequisites

- An [Azure Key Vault instance with configured secrets](https://docs.microsoft.com/en-us/azure/key-vault/general/overview).
- The ability to install Python packages with `pip`.
- <PreReqFileDataContext/>.

### Procedure

1. Set up Azure Key Vault support.

To use Azure Key Vault with {GxData.product_name}, first install the `great_expectations` Python package with the `azure_secrets` requirement by running the following command:

```bash title="Terminal"
pip install 'great_expectations[azure_secrets]'
```

2. Reference Azure Key Vault variables in `config_variables.yml`.

By default, `config_variables.yml` is located at `gx/uncommitted/config_variables.yml` in your File Data Context.

Values in `config_variables.yml` that match the regex `^secret\|https:\/\/[a-zA-Z0-9\-]{3,24}\.vault\.azure\.net` are substituted with the corresponding values from Azure Key Vault. However, if the keywords in the matching regex do not correspond to keywords in Azure Key Vault, no substitution occurs.

You can reference other stored credentials within the regex by wrapping their corresponding variable in `${` and `}`. When multiple references are present in a value, the secrets manager substitution takes place after all other substitutions have occurred.

An entire connection string can be referenced from the secrets manager:

```yaml title="config_variables.yml"
my_abs_creds: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|connection_string
```
Or each component of the connection string can be referenced separately:
```yaml title="config_variables.yml"
drivername: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|drivername
host: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|host
port: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|port
username: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|username
password: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|password
database: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|database
```
3. Optional. Reference versioned secrets.

Unless otherwise specified, the latest version of the secret is returned. To retrieve a specific version of the secret, specify its version ID (32 alphanumeric characters). For example:
```yaml title="config_variables.yml"
versioned_secret: secret|https://${VAULT_NAME}.vault.azure.net/secrets/my-secret/a0b00aba001aaab10b111001100a11ab
```
4. Optional. Retrieve specific secrets from a JSON string.

To retrieve a specific secret from a JSON string, include the JSON key after a pipe character `|` at the end of the secrets regex. For example:

```yaml title="config_variables.yml"
json_secret: secret|https://${VAULT_NAME}.vault.azure.net/secrets/my-secret|<KEY>
versioned_json_secret: secret|https://${VAULT_NAME}.vault.azure.net/secrets/my-secret/a0b00aba001aaab10b111001100a11ab|<KEY>
```
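The ordering described above, in which `${...}` references are resolved before the secrets manager pattern is applied, can be illustrated with a short sketch. It uses Python's `string.Template` as a stand-in for the library's own substitution, and the vault name `my-vault` is a hypothetical value; only the regex is taken from this guide.

```python
import re
from string import Template

# Regex quoted in this guide for recognizing Azure Key Vault references.
AZURE_SECRET_PREFIX = re.compile(
    r"^secret\|https:\/\/[a-zA-Z0-9\-]{3,24}\.vault\.azure\.net"
)

raw_value = (
    "secret|https://${VAULT_NAME}.vault.azure.net"
    "/secrets/dev_db_credentials|connection_string"
)

# Before ${VAULT_NAME} is resolved, the value cannot match the regex,
# because "$", "{" and "}" are not valid vault name characters.
matches_before = AZURE_SECRET_PREFIX.match(raw_value) is not None

# Stand-in for variable substitution; "my-vault" is a made-up name.
resolved_value = Template(raw_value).substitute(VAULT_NAME="my-vault")
matches_after = AZURE_SECRET_PREFIX.match(resolved_value) is not None

print(matches_before, matches_after)
print(resolved_value)
```

This is why the secrets manager lookup has to come last: the reference only takes its final, matchable form once the other variables have been filled in.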


@@ -0,0 +1,112 @@
import GxData from '../../_core_components/_data.jsx'
import PreReqFileDataContext from '../../_core_components/prerequisites/_file_data_context.md'

### Prerequisites

- A [GCP Secret Manager instance with configured secrets](https://cloud.google.com/secret-manager/docs/quickstart).
- The ability to install Python packages with `pip`.
- <PreReqFileDataContext/>.

### Procedure

1. Set up GCP Secret Manager support.

To use GCP Secret Manager with {GxData.product_name}, first install the `great_expectations` Python package with the `gcp` requirement by running the following command:

```bash title="Terminal"
pip install 'great_expectations[gcp]'
```

2. Reference GCP Secret Manager variables in `config_variables.yml`.

By default, `config_variables.yml` is located at `gx/uncommitted/config_variables.yml` in your File Data Context.

Values in `config_variables.yml` that match the regex `^secret\|projects\/[a-z0-9\_\-]{6,30}\/secrets` are substituted with the corresponding values from GCP Secret Manager. However, if the keywords in the matching regex do not correspond to keywords in GCP Secret Manager, no substitution occurs.

You can reference other stored credentials within the regex by wrapping their corresponding variable in `${` and `}`. When multiple references are present in a value, the secrets manager substitution takes place after all other substitutions have occurred.

An entire connection string can be referenced from the secrets manager:

```yaml title="config_variables.yml"
my_gcp_creds: secret|projects/${PROJECT_ID}/secrets/dev_db_credentials|connection_string
```
Or each component of the connection string can be referenced separately:
```yaml title="config_variables.yml"
drivername: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_DRIVERNAME
host: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_HOST
port: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_PORT
username: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_USERNAME
password: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_PASSWORD
database: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_DATABASE
```
3. Optional. Reference versioned secrets.

Unless otherwise specified, the latest version of the secret is returned. To retrieve a specific version of the secret, specify its version ID. For example:
```yaml title="config_variables.yml"
versioned_secret: secret|projects/${PROJECT_ID}/secrets/my_secret/versions/1
```
4. Optional. Retrieve specific secrets from a JSON string.

To retrieve a specific secret from a JSON string, include the JSON key after a pipe character `|` at the end of the secrets regex. For example:

```yaml title="config_variables.yml"
json_secret: secret|projects/${PROJECT_ID}/secrets/my_secret|<KEY>
versioned_json_secret: secret|projects/${PROJECT_ID}/secrets/my_secret/versions/1|<KEY>
```
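The effect of the trailing `|<KEY>` can be pictured as a JSON lookup on the retrieved secret. The sketch below assumes the secret payload is a JSON object stored as a string; the `extract_secret_value` helper and the sample payload are hypothetical and only illustrate the idea, not how the library retrieves secrets.

```python
import json


def extract_secret_value(secret_payload, json_key=None):
    """Return the whole payload, or one field of it when a key is given."""
    if json_key is None:
        # No "|<KEY>" suffix: the secret is used as-is.
        return secret_payload
    # With a "|<KEY>" suffix: parse the payload and pick a single field.
    return json.loads(secret_payload)[json_key]


# Made-up payload standing in for a secret stored as a JSON string.
payload = '{"username": "gx_user", "password": "example-password"}'

print(extract_secret_value(payload, "username"))
```

Without a key, the full string is returned unchanged, which corresponds to a reference that has no pipe-separated suffix.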






Configure your Great Expectations project to substitute variables from the Google Cloud Secret Manager. Secrets store substitution uses the configurations from your ``config_variables.yml`` file after all other types of substitution are applied with environment variables.

Secrets store substitution uses keywords and retrieves secrets from the secrets store for values matching the following regex ``^secret\|projects\/[a-z0-9\_\-]{6,30}\/secrets``. If the values you provide don't match the keywords, the values aren't substituted.

1. Run the following code to install the ``great_expectations`` package with the ``gcp`` requirement:

```bash
pip install 'great_expectations[gcp]'
```

2. Provide the name of the secret you want to substitute in GCP Secret Manager. For example, ``secret|projects/project_id/secrets/my_secret``.

The latest version of the secret is returned by default.

3. Optional. To get a specific version of the secret, specify its version id. For example, ``secret|projects/project_id/secrets/my_secret/versions/1``.

4. Optional. To retrieve a specific secret value for a JSON string, use ``secret|projects/project_id/secrets/my_secret|key`` or ``secret|projects/project_id/secrets/my_secret/versions/1|key``.

5. Save your access credentials or the database connection string to ``great_expectations/uncommitted/config_variables.yml``. For example:

```yaml
# We can configure a single connection string
my_gcp_creds: secret|projects/${PROJECT_ID}/secrets/dev_db_credentials|connection_string
# Or each component of the connection string separately
drivername: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_DRIVERNAME
host: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_HOST
port: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_PORT
username: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_USERNAME
password: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_PASSWORD
database: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_DATABASE
```

6. Run the following code to use the `connection_string` parameter values when you add a `datasource` to a Data Context:

```python
import great_expectations as gx

# Load the File Data Context that holds config_variables.yml
context = gx.get_context(mode="file")

# We can use a single connection string
pg_datasource = context.data_sources.add_or_update_sql(
    name="my_postgres_db", connection_string="${my_gcp_creds}"
)
# Or each component of the connection string separately
pg_datasource = context.data_sources.add_or_update_sql(
    name="my_postgres_db",
    connection_string="${drivername}://${username}:${password}@${host}:${port}/${database}",
)
```
@@ -0,0 +1,27 @@
import TabItem from '@theme/TabItem';
import Tabs from '@theme/Tabs';
import GxData from '../../_core_components/_data.jsx'

import AwsSecretsManager from './_aws_secrets_manager.md';
import GcpSecretManager from './_gcp_secret_manager.md';
import AzureKeyVault from './_azure_key_vault.md';

{GxData.product_name} supports the AWS Secrets Manager, Google Cloud Secret Manager, and Azure Key Vault secrets managers.

Use of a secrets manager is optional. [Credentials can be securely stored as environment variables or entries in a yaml file](#configure-credentials) without referencing content stored in a secrets manager.

<Tabs queryString="manager_type" groupId="manager_type" defaultValue='aws' values={[{label: 'AWS Secrets Manager', value:'aws'}, {label: 'GCP Secret Manager', value:'gcp'}, {label: 'Azure Key Vault', value:'azure'}]}>

<TabItem value="aws">
<AwsSecretsManager/>
</TabItem>

<TabItem value="gcp">
<GcpSecretManager/>
</TabItem>

<TabItem value="azure">
<AzureKeyVault/>
</TabItem>

</Tabs>
@@ -0,0 +1,34 @@
---
title: Access secrets managers
description: Access credentials that are stored in AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault.
hide_feedback_survey: false
hide_title: false
---

import TabItem from '@theme/TabItem';
import Tabs from '@theme/Tabs';
import GxData from '../../_core_components/_data.jsx'

import AwsSecretsManager from './_aws_secrets_manager.md';
import GcpSecretManager from './_gcp_secret_manager.md';
import AzureKeyVault from './_azure_key_vault.md';

{GxData.product_name} supports the AWS Secrets Manager, Google Cloud Secret Manager, and Azure Key Vault secrets managers.

Use of a secrets manager is optional. [Credentials can be securely stored as environment variables or entries in a yaml file](core/configure_project_settings/configure_credentials/configure_credentials.md) without referencing content stored in a secrets manager.

<Tabs queryString="manager_type" groupId="manager_type" defaultValue='aws' values={[{label: 'AWS Secrets Manager', value:'aws'}, {label: 'GCP Secret Manager', value:'gcp'}, {label: 'Azure Key Vault', value:'azure'}]}>

<TabItem value="aws">
<AwsSecretsManager/>
</TabItem>

<TabItem value="gcp">
<GcpSecretManager/>
</TabItem>

<TabItem value="azure">
<AzureKeyVault/>
</TabItem>

</Tabs>
@@ -0,0 +1,15 @@
import GxData from '../../_core_components/_data.jsx';

Securely stored credentials are implemented via string substitution. You can reference your credentials in a Python string by wrapping the variable name they are assigned to in `${` and `}`. Using individual credentials for a connection string would look like:

```python title="Python"
connection_string="postgresql+psycopg2://${MY_POSTGRES_USERNAME}:${MY_POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}"
```

Or you could reference a configured variable that contains the full connection string by providing a Python string that contains just a reference to that variable:

```python title="Python"
connection_string="${POSTGRES_CONNECTION_STRING}"
```

When you pass a string that references your stored credentials to a {GxData.product_name} method that takes string-formatted credentials as a parameter, each referenced variable in your Python string is replaced with its corresponding stored value.
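To make the substitution concrete, here is a minimal sketch of `${...}` replacement over a dictionary of stored values. The `substitute_variables` helper and the sample credentials are hypothetical; the sketch illustrates the behavior described above rather than the library's own implementation.

```python
import re

# Matches references of the form ${VARIABLE_NAME}.
REFERENCE = re.compile(r"\$\{(\w+)\}")


def substitute_variables(template, variables):
    """Replace each ${NAME} with its stored value, leaving unknown names as-is."""

    def replace(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))

    return REFERENCE.sub(replace, template)


# Made-up values standing in for securely stored credentials.
stored = {
    "MY_POSTGRES_USERNAME": "alice",
    "MY_POSTGRES_PASSWORD": "example-password",
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DATABASE": "analytics",
}

connection_string = (
    "postgresql+psycopg2://${MY_POSTGRES_USERNAME}:${MY_POSTGRES_PASSWORD}"
    "@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}"
)
print(substitute_variables(connection_string, stored))
```

Leaving unknown names untouched is a design choice of this sketch; it keeps an unresolved reference visible in the output instead of silently dropping it.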
@@ -0,0 +1,32 @@
YAML files make variables more visible, are easier to edit, and allow for modularization. For example, you can create a YAML file for development and testing and another for production.

A File Data Context is required before you can configure credentials in a YAML file. By default, the credentials file in a File Data Context is located at `/great_expectations/uncommitted/config_variables.yml`. The `uncommitted/` directory is included in a default `.gitignore` and will be excluded from version control.

These examples demonstrate how to save credentials in the form of a connection string for a database. However, the same process can be used for things such as web app tokens or any other credential that can be stored in string format.

Each entry in `config_variables.yml` should consist of two parts. The first is a variable which you will reference in the place of the credential. The second is the value that should be substituted for that variable when it is referenced. For example:

```yaml title="config_variables.yml"
MY_POSTGRES_USERNAME: <USERNAME>
MY_POSTGRES_PASSWORD: <PASSWORD>
```

or:

```yaml title="config_variables.yml"
POSTGRES_CONNECTION_STRING: postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
```

You can also reference your stored credentials within a stored connection string by wrapping their corresponding variable in `${` and `}`. For example:

```yaml title="config_variables.yml"
MY_POSTGRES_USERNAME: <USERNAME>
MY_POSTGRES_PASSWORD: <PASSWORD>
POSTGRES_CONNECTION_STRING: postgresql+psycopg2://${MY_POSTGRES_USERNAME}:${MY_POSTGRES_PASSWORD}@<HOST>:<PORT>/<DATABASE>
```

Because the dollar sign character `$` indicates the start of a string substitution, it must be escaped with a backslash `\` when it is part of your credentials. For example, if your password is `pa$$word`, then in the previous examples you would use the command:

```bash title="Terminal"
export MY_POSTGRES_PASSWORD=pa\$\$word
```
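The escaping rule can be sketched as a substitution that skips backslash-escaped dollar signs and then restores them as literal characters. The `resolve` helper below is hypothetical and simplified; it illustrates the rule and is not the library's implementation.

```python
import re

# Matches ${NAME} only when the dollar sign is not preceded by a backslash.
UNESCAPED_REFERENCE = re.compile(r"(?<!\\)\$\{(\w+)\}")


def resolve(value, variables):
    """Substitute unescaped ${NAME} references, then unescape literal dollars."""
    resolved = UNESCAPED_REFERENCE.sub(lambda m: variables[m.group(1)], value)
    # Turn each escaped "\$" back into a plain "$".
    return resolved.replace("\\$", "$")


# The stored form of the password keeps its backslashes.
stored_password = r"pa\$\$word"
print(resolve(stored_password, {}))  # pa$$word
```

In this sketch, an escaped reference such as `\${NOT_A_VARIABLE}` is also left unsubstituted and comes out as the literal text `${NOT_A_VARIABLE}`.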