This connector extracts metadata for user-generated content from DataHub.
- Install the DataHub CLI:

  ```shell
  pip install acryl-datahub
  ```
- Start a quickstart DataHub instance with the provided Docker Compose file:

  ```shell
  datahub docker quickstart -f metaphor/datahub/docker-compose-without-neo4j-m1.quickstart.yml
  ```
- For other architectures, pull https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml and add `METADATA_SERVICE_AUTH_ENABLED=true` to the environment variables of the `datahub-gms` and `datahub-frontend-react` containers.
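  As a sketch, the change to the compose file would look roughly like this (only the `METADATA_SERVICE_AUTH_ENABLED` lines are additions; the surrounding keys are illustrative, following the standard quickstart compose layout):

  ```yaml
  # Excerpt of docker-compose-without-neo4j.quickstart.yml (illustrative).
  # Add METADATA_SERVICE_AUTH_ENABLED=true to both containers' environment lists.
  services:
    datahub-gms:
      environment:
        - METADATA_SERVICE_AUTH_ENABLED=true
    datahub-frontend-react:
      environment:
        - METADATA_SERVICE_AUTH_ENABLED=true
  ```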
- Once DataHub starts, create a personal access token. See the official documentation for the detailed process. This is the token the connector uses to authenticate against the DataHub APIs.
Create a YAML config file based on the following template:

```yaml
host: <host>
port: <port>
token: <token>  # The personal access token
```
See Output Config for more information.
If there are data sources from Snowflake, MSSQL, or Synapse, provide their account names as follows:

```yaml
snowflake_account: <snowflake_account_name>
mssql_account: <mssql_account_name>
synapse_account: <synapse_account_name>
```
DataHub does not keep track of description authors. You can specify the description author's email in the configuration file:

```yaml
description_author_email: <email>
```
If not provided, each dataset's first owner is treated as the author. If a dataset has no owner, the placeholder email [email protected] is used.
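Putting the options above together, a complete config file might look like the following sketch. All values are illustrative placeholders, and the account fields are only needed when the corresponding data sources exist:

```yaml
# Example connector config (all values are placeholders).
host: localhost
port: 8080
token: <personal_access_token>
snowflake_account: my_snowflake_account   # only if Snowflake sources exist
mssql_account: my_mssql_account           # only if MSSQL sources exist
synapse_account: my_synapse_account       # only if Synapse sources exist
description_author_email: user@example.com
```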
Follow the Installation instructions to install `metaphor-connectors` in your environment (or virtualenv). Make sure to include either `all` or `datahub` extra.
Run the following command to test the connector locally:

```shell
metaphor datahub <config_file>
```
Manually verify the output after the run finishes.