Skip to content

Latest commit

 

History

History
102 lines (63 loc) · 3.34 KB

README.md

File metadata and controls

102 lines (63 loc) · 3.34 KB

Redshift Connector

This connector extracts technical metadata from a Redshift database using asyncpg library.

Setup

The connector extracts the metadata from system catalogs, with restricted access to system tables and additional SELECT privilege to pg_catalog.svv_table_info.

Use the following command to grant the permission:

# Create a new user called "metaphor"
CREATE USER metaphor PASSWORD <password>;

# Grant minimally required privileges to the user
GRANT SELECT ON pg_catalog.svv_table_info TO metaphor;

# Grant access to syslog "SYS_*"
ALTER USER metaphor WITH SYSLOG ACCESS UNRESTRICTED;

# Grant access to "PG_USER_INFO"
GRANT SELECT ON pg_catalog.pg_user_info TO metaphor;

Note: If the Redshift cluster contains more than one database, you must grant the permission in all databases. Alternatively, you can limit the connector to a subset of databases using the filter configuration.

Redshift Spectrum

To extract external tables' metadata from SVV_EXTERNAL_*, you must grant non-admin users USAGE privilege to the corresponding external schemas (see this page for more details):

GRANT USAGE ON SCHEMA <external_schema> TO metaphor;

Config File

Create a YAML config file based on the following template.

Required Configurations

If using user password authentication:

host: <database_hostname>
user: <username>
password: <password>
database: <default_database_for_connections>

Optional Configurations

Output Destination

See Output Config for more information.

Port

By default, the connector will connect using the default Redshift port 5439. You can change it using the following config:

port: <port_number>

Filtering

See Filter Config for more information on the optional filter config.

Tag Assignment

See Tag Matcher Config for more information on the optional tag_matcher config.

Query Logs

By default, the Redshift connector will fetch a full day's query logs from yesterday, to be analyzed for additional metadata, such as dataset usage and lineage information. To backfill log data, one can set lookback_days to the desired value. To turn off query log fetching, set lookback_days to 0.

query_log:
  # (Optional) Number of days of query logs to fetch. Default to 1. If 0, the no query logs will be fetched.
  lookback_days: <days>
    
  # (Optional) A list of users whose queries will be excluded from the log fetching.
  excluded_usernames:
    - <user_name1>
    - <user_name2>
Process Query Config

See Process Query for more information on the optional process_query_config config.

Testing

Follow the Installation instructions to install metaphor-connectors in your environment (or virtualenv). Make sure to include either all or redshift extra.

Run the following command to test the connector locally:

metaphor redshift <config_file>

Manually verify the output after the run finishes.