-
Notifications
You must be signed in to change notification settings - Fork 13
Connector Configuration
You will need to add a Kinesis catalog to Presto by adding this line to ${PRESTO_HOME}/etc/config.properties
.
datasources=jmx,hive,kinesis
Next, to configure the Kinesis Connector, create a catalog properties file
etc/catalog/kinesis.properties
with the following contents,
replacing the properties as appropriate.
connector.name=kinesis
kinesis.table-names=table1,table2
kinesis.access-key=<amazon-access-key> (optional)
kinesis.secret-key=<amazon-secret-key> (optional)
The access and secret keys are optional. If they are not provided, the default credentials chain will be used to access the kinesis streams. The following configuration properties are available :
Property Name | Description |
---|---|
kinesis.table-names |
List of all the tables provided by the catalog |
kinesis.default-schema |
Default schema name for tables |
kinesis.table-description-dir |
Directory containing table description files |
kinesis.access-key |
Access key to aws account |
kinesis.secret-key |
Secret key to aws account |
kinesis.hide-internal-columns |
Controls whether internal columns are part of the table schema or not |
kinesis.aws-region |
Aws region to be used to read kinesis stream from |
kinesis.batch-size |
Maximum number of records to return. Maximum Limit 10000 |
kinesis.fetch-attempts |
Attempts to be made to fetch data from kinesis streams |
kinesis.sleep-time |
Time till which thread sleep waiting to make next attempt to fetch data |
Comma-separated list of all tables provided by this catalog. A table name
can be unqualified (simple name) and will be put into the default schema
(see below) or qualified with a scheme name (<schema-name>.<table-name>
).
For each table defined here, a table description file (see below) may exist. If no table description file exists, the table name is used as the stream name on Kinesis and no data columns are mapped into the table. The table will contain all internal columns (see below).
This property is required; there is no defualt and at least one table must be defined.
Defines the schema which will contain all tables that were defined without a qualifying schema name.
This property is optional; the default is default
.
References a folder within Presto deployment that holds one or more JSON
files (must end wiht .json
) which contain table description files.
This property is optional; the default is etc/kinesis
.
Defines the access key ID for AWS root account or IAM roles, which is used to sign programmatic requests to AWS Kinesis.
This property is optional; if not defined, connector will try to follow Defualt- Credential-Provider-Chain provided by aws in the following order -
- Environment Variable: Load credentials from environment variables
AWS_ACCESS_KEY_ID
andAWS_SECRET_ACCESS_KEY.
- Java System Variable: Load from java system as
aws.accessKeyId
andaws.secretKey
- Profile Credentials File: Load from file typically located at ~/.aws/credentials
- Instance profile credentials: These credentials can be used on EC2 instances, and are delivered through the Amazon EC2 metadata service.
Defines the secret key for AWS root account or IAM roles, which together with Access Key ID, is used to sign programmatic requests to AWS Kinesis.
This property is optional; if not defined, connector will try to follow Defualt- Credential-Provider-Chain same as above.
Defines AWS Kinesis regional endpoint. Selecting appropriate region may reduce latency in fetching data.
This field is optional; The default region is us-east-1
referring to
end point 'kinesis.us-east-1.amazonaws.com'.
For each Amazon Kinesis account, following availabe regions can be used:
Region name | Region | Endpoint |
---|---|---|
us-east-1 |
US East (N. Virginia) | kinesis.us-east-1.amazonaws.com |
us-west-1 |
US West (N. California) | kinesis.us-west-1.amazonaws.com |
us-west-2 |
US West (Oregon) | kinesis.us-west-2.amazonaws.com |
eu-west-1 |
EU (Ireland) | kinesis.eu-west-1.amazonaws.com |
eu-central-1 |
EU (Frankfurt) | kinesis.eu-central-1.amazonaws.com |
ap-southeast-1 |
Asia Pacific (Singapore) | kinesis.ap-southeast-1.amazonaws.com |
ap-southeast-2 |
Asia Pacific (Sydney) | kinesis.ap-southeast-2.amazonaws.com |
ap-northeast-1 |
Asia Pacific (Tokyo) | kinesis.ap-northeast-1.amazonaws.com |
Defines maximum number of records to return in one request to Kinesis Streams. Maximum Limit is
10000 records. If a value greater than 10000 is specified, will throw InvalidArgumentException
.
This field is optional; the default value is 10000
.
Defines number of failed attempts to be made to fetch data from Kinesis Streams. If first attempt returns non empty records, then no further attempts are made.
It has been found that sometimes
GetRecordResult
returns empty records, when shard is not empty. That is why multiple attempts need to be made.
This field is optional; the default value is 3
.
Defines the milliseconds for which thread needs to sleep between get-record-attempts made to fetch data. The quantity should be followed by 'ms' string.
This field is optional; the default value is 1000ms
.
In addition to the data columns defined in a table description file, the
connector maintains a number of additional columns for each table. If
these columns are hidden, they can still be used in queries but do not
show up in DESCRIBE <table-name>
or SELECT *
.
This property is optional; the default is true
.
For each defined table, the connector maintains the following columns:
Column name | Type | Description |
---|---|---|
_shard_id |
VARCHAR | ID of the Kinesis stream shard which contains this row |
_shard_sequence_id |
VARCHAR | Sequence id within the Kinesis shard for this row |
_segment_start |
VARCHAR | Lowest sequence id in the segment (inclusive) which contains this row. This sequence id is shrard specific |
_segment_end |
VARCHAR | Highest sequence id in the segment (exclusive) which contains this row. The sequence id is shard specific. If stream is open, then this is not defined |
_segment_count |
BIGINT | Running count of for the current row within the segment |
_message_valid |
BOOLEAN | True if the decoder could decode the message successfully for this row. When false, data columns mapped f |
rom the message should be treated as invalid | ||
_message |
VARCHAR | Message bytes as an UTF-8 encoded string |
_message_length |
BIGINT | Number of bytes in the message |
_partition_key |
VARCHAR | Partition Key bytes as an UTF-8 encoded string |
For tables without a table definition file, the _message_valid
column will
always be true
.