This connector extracts technical metadata from a Redshift database using asyncpg library.
The connector extracts the metadata from system catalogs, with restricted access to system tables and additional SELECT
privilege to pg_catalog.svv_table_info
.
Use the following command to grant the permission:
# Create a new user called "metaphor"
CREATE USER metaphor PASSWORD <password>;
# Grant minimally required privileges to the user
GRANT SELECT ON pg_catalog.svv_table_info TO metaphor;
# Grant access to syslog "SYS_*"
ALTER USER metaphor WITH SYSLOG ACCESS UNRESTRICTED;
# Grant access to "PG_USER_INFO"
GRANT SELECT ON pg_catalog.pg_user_info TO metaphor;
Note: If the Redshift cluster contains more than one database, you must grant the permission in all databases. Alternatively, you can limit the connector to a subset of databases using the filter configuration.
To extract external tables' metadata from SVV_EXTERNAL_*
, you must grant non-admin users USAGE
privilege to the corresponding external schemas (see this page for more details):
GRANT USAGE ON SCHEMA <external_schema> TO metaphor;
Create a YAML config file based on the following template.
If using user password authentication:
host: <database_hostname>
user: <username>
password: <password>
database: <default_database_for_connections>
See Output Config for more information.
By default, the connector will connect using the default Redshift port 5439. You can change it using the following config:
port: <port_number>
See Filter Config for more information on the optional filter
config.
See Tag Matcher Config for more information on the optional tag_matcher
config.
By default, the Redshift connector will fetch a full day's query logs from yesterday, to be analyzed for additional metadata, such as dataset usage and lineage information. To backfill log data, one can set lookback_days
to the desired value. To turn off query log fetching, set lookback_days
to 0.
query_log:
# (Optional) Number of days of query logs to fetch. Default to 1. If 0, the no query logs will be fetched.
lookback_days: <days>
# (Optional) A list of users whose queries will be excluded from the log fetching.
excluded_usernames:
- <user_name1>
- <user_name2>
See Process Query for more information on the optional process_query_config
config.
Follow the Installation instructions to install metaphor-connectors
in your environment (or virtualenv). Make sure to include either all
or redshift
extra.
Run the following command to test the connector locally:
metaphor redshift <config_file>
Manually verify the output after the run finishes.