DOC-2085 fix(dataloading): improve doc for Load from External Kafka (3.9?-4.1) #491

Open · wants to merge 6 commits into base: 4.1 · Changes from all commits
43 changes: 6 additions & 37 deletions modules/data-loading/examples/config-avro
@@ -1,8 +1,8 @@
connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector
source.cluster.alias=hello
target.cluster.alias=world
-source.cluster.bootstrap.servers=source.kafka.server:9092
-target.cluster.bootstrap.servers=localhost:30002
+source.cluster.bootstrap.servers=<source.broker1:port,source.broker2:port,...>
+target.cluster.bootstrap.servers=<local.broker1:port,local.broker2:port,...>
source->target.enabled=true
topics=avro-without-registry-topic
replication.factor=1
@@ -18,41 +18,10 @@ emit.heartbeats.interval.seconds=5
world.scheduled.rebalance.max.delay.ms=35000
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
header.converter=org.apache.kafka.connect.converters.ByteArrayConverter
-value.converter=com.tigergraph.kafka.connect.converters.TigerGraphAvroConverterWithoutSchemaRegistry
-
-producer.security.protocol=SASL_SSL
-producer.sasl.mechanism=GSSAPI
-producer.sasl.kerberos.service.name=kafka
-producer.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab=\"/path/to/kafka-producer.keytab\" principal=\"[email protected]\";
-producer.ssl.endpoint.identification.algorithm=
-producer.ssl.keystore.location=/path/to/client.keystore.jks
-producer.ssl.keystore.password=******
-producer.ssl.key.password=******
-producer.ssl.truststore.location=/path/to/client.truststore.jks
-producer.ssl.truststore.password=******
-
-consumer.security.protocol=SASL_SSL
-consumer.sasl.mechanism=GSSAPI
-consumer.sasl.kerberos.service.name=kafka
-consumer.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab=\"/path/to/kafka-consumer.keytab\" principal=\"[email protected]\";
-consumer.ssl.endpoint.identification.algorithm=
-consumer.ssl.keystore.location=/path/to/client.keystore.jks
-consumer.ssl.keystore.password=******
-consumer.ssl.key.password=******
-consumer.ssl.truststore.location=/path/to/client.truststore.jks
-consumer.ssl.truststore.password=******
-
-source.admin.security.protocol=SASL_SSL
-source.admin.sasl.mechanism=GSSAPI
-source.admin.sasl.kerberos.service.name=kafka
-source.admin.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab=\"/path/to/kafka-admin.keytab\" principal=\"[email protected]\";
-source.admin.ssl.endpoint.identification.algorithm=
-source.admin.ssl.keystore.location=/path/to/client.keystore.jks
-source.admin.ssl.keystore.password=******
-source.admin.ssl.key.password=******
-source.admin.ssl.truststore.location=/path/to/client.truststore.jks
-source.admin.ssl.truststore.password=******
+transforms=TigerGraphAvroTransform
+transforms.TigerGraphAvroTransform.type=com.tigergraph.kafka.connect.transformations.TigergraphAvroWithoutSchemaRegistryTransformation
+transforms.TigerGraphAvroTransform.errors.tolerance=none

[connector_1]
name=avro-test-without-registry
-tasks.max=10
+tasks.max=10
4 changes: 2 additions & 2 deletions modules/data-loading/pages/data-loading-overview.adoc
@@ -38,7 +38,7 @@ TigerGraph uses the same workflow for both local file and Kafka Connect loading:
. *Specify a graph*.
Data is always loaded to exactly one graph (though that graph could have global vertices and edges which are shared with other graphs). For example:
+
-[source,php]
+[source,gsql]
USE GRAPH ldbc_snb

. If you are using Kafka Connect, *define a `DATA_SOURCE` object*.
Expand All @@ -64,7 +64,7 @@ image::data-loading:loading_arch_3.9.3.png[Architectural diagram showing support
== Loading Jobs
A loading job tells the database how to construct vertices and edges from data sources.

-[source,php]
+[source,gsql]
.CREATE LOADING JOB syntax
----
CREATE LOADING JOB <job_name> FOR GRAPH <graph_name> {
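To make this workflow concrete, here is a hedged end-to-end sketch in GSQL. The graph name comes from the examples in this diff; the job name, file path, and column mapping are illustrative assumptions, not taken from the real schema:

[source,gsql]
.End-to-end loading workflow (illustrative sketch)
----
USE GRAPH ldbc_snb
CREATE DATA_SOURCE s1 = "ds_config.json" FOR GRAPH ldbc_snb

CREATE LOADING JOB load_person FOR GRAPH ldbc_snb {
  // Placeholder path; assumed for illustration.
  DEFINE FILENAME file_Person = "/home/tigergraph/data/person.csv";
  // Column positions $0..$2 are assumed, not from the actual LDBC_SNB schema.
  LOAD file_Person TO VERTEX Person VALUES ($0, $1, $2) USING SEPARATOR="|", HEADER="true";
}

RUN LOADING JOB load_person
----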
14 changes: 11 additions & 3 deletions modules/data-loading/partials/kafka/kafka-data-source-details.adoc
@@ -13,8 +13,8 @@ To configure the data source object, the minimum requirement is the address of t
.Data source configuration for external Kafka
----
{
"type": "mirrormaker",
"source.cluster.bootstrap.servers": "<broker_addrs>"
"type": "mirrormaker",
"source.cluster.bootstrap.servers": "<broker_addrs>"
}
----

@@ -25,9 +25,17 @@ If the source cluster is configured for SSL or SASL protocols, you need to provi
* If the source cluster uses SASL *and* SSL, you need to upload the keytab of each Kerberos principal, as well as the keystore and truststore, to every node of your TigerGraph cluster.
Each file must be at the same absolute path on all nodes.

-The following configurations are required for admin, producer and consumer. To supply the configuration for the corresponding component, replace `<prefix>` with `source.admin`, `producer`, or `consumer`.
+The following configurations are required for the admin client, producer, and consumer. Kafka allows SSL settings to be overridden, and it resolves security settings in increasing order of precedence: generic `ssl.*` settings < `source/target.cluster.ssl.*` settings < `admin/producer/consumer.ssl.*` settings.
+
+If the source and target clusters share the same SSL settings, you can set generic settings once for both clusters and all roles (admin, producer, and consumer). For example, you can set `ssl.keystore.location=/path/to/key/store` instead of `source.cluster.ssl.keystore.location=/path/to/key/store`, `admin.ssl.keystore.location=/path/to/key/store`, or even `source.cluster.admin.ssl.keystore.location=/path/to/key/store`.
+
+If the source and target clusters have different SSL settings, the simplest approach is to set cluster-wide SSL configurations, e.g., `target.cluster.ssl.truststore.password=/password/for/trust/store`, rather than per-role settings such as `target.cluster.producer.ssl.truststore.password=/password/for/trust/store`.
+
+To supply the configuration for the corresponding component, replace `<prefix>` with `source.cluster` or `target.cluster`; with `source.cluster.admin`, `source.cluster.producer`, or `source.cluster.consumer` (or their `target.cluster` equivalents); or with `admin`, `producer`, or `consumer`.
+For example, to specify `GSSAPI` as the SASL mechanism for the consumer, include `"consumer.sasl.mechanism": "GSSAPI"` in the data source configuration.
+
+NOTE: SSL is now well supported by TigerGraph; we recommend setting up regular SSL rather than SASL + PLAINTEXT/SSL.
[%header,cols="1,2"]
|===
| Field | Description
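As a sketch of the generic-settings approach described above (broker addresses, paths, and passwords are placeholders, and the exact property set depends on your clusters):

[source,gsql]
.Data source with generic SSL settings shared by all roles (illustrative)
----
CREATE DATA_SOURCE s1 = """{
  "type": "mirrormaker",
  "source.cluster.bootstrap.servers": "<source.broker1:port,source.broker2:port>",
  "security.protocol": "SSL",
  "ssl.keystore.location": "/path/to/client.keystore.jks",
  "ssl.keystore.password": "******",
  "ssl.key.password": "******",
  "ssl.truststore.location": "/path/to/client.truststore.jks",
  "ssl.truststore.password": "******"
}""" FOR GRAPH ldbc_snb
----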
@@ -2,8 +2,8 @@

The following is an example loading job from an external Kafka cluster.

-[source,php,linenums]
-.Example loading job for BigQuery
+[source,gsql,linenums]
+.Example loading job from external Kafka
----
USE GRAPH ldbc_snb
CREATE DATA_SOURCE s1 = "ds_config.json" FOR GRAPH ldbc_snb
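The diff is truncated after the `CREATE DATA_SOURCE` line. For orientation only, a continuation of such a job might look like the sketch below; the topic-descriptor syntax and the column mapping are assumptions, not the file's actual contents:

[source,gsql]
----
CREATE LOADING JOB load_from_kafka FOR GRAPH ldbc_snb {
  // "$s1:<topic descriptor>" binds the filevar to a topic on data source s1 (syntax assumed).
  DEFINE FILENAME file_Person = "$s1:person-topic";
  LOAD file_Person TO VERTEX Person VALUES ($0, $1, $2) USING SEPARATOR="|";
}
RUN LOADING JOB load_from_kafka
----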
@@ -6,7 +6,7 @@ We will call out whether a particular step is common for all loading or specific
== Example Schema
This example uses part of the LDBC_SNB schema:

-[source,php]
+[source,gsql]
.Example schema taken from LDBC_SNB
----
//Vertex Types:
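The schema block is truncated in the diff. Purely for orientation, a fragment of an LDBC_SNB-style schema in this register might read as follows (types and attributes are abbreviated, not verbatim from the file):

[source,gsql]
----
//Vertex Types:
CREATE VERTEX Person (PRIMARY_ID id UINT, firstName STRING, lastName STRING)
CREATE VERTEX Comment (PRIMARY_ID id UINT, creationDate DATETIME)
//Edge Types:
CREATE DIRECTED EDGE HAS_CREATOR (FROM Comment, TO Person)
----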
@@ -8,13 +8,13 @@ Inline mode is required when creating data sources for TigerGraph Cloud instance

In the following example, we create a data source named `s1`, and read its configuration information from a file called `ds_config.json`.

-[source,php]
+[source,gsql]
USE GRAPH ldbc_snb
CREATE DATA_SOURCE s1 = "ds_config.json" FOR GRAPH ldbc_snb

Older versions of TigerGraph required a keyword after `DATA_SOURCE` such as `STREAM` or `KAFKA`.

-[source,php]
+[source,gsql]
.Inline JSON data format when creating a data source
CREATE DATA_SOURCE s1 = "{
type: <type>,
@@ -24,7 +24,7 @@ key: <value>
String literals can be enclosed with a double quote `"`, triple double quotes `"""`, or triple single quotes `'''`.
Double quotes `"` in the JSON can be omitted if the key name does not contain a colon `:` or comma `,`.

-[source,php]
+[source,gsql]
.Alternate quote syntax for inline JSON data
CREATE DATA_SOURCE s1 = """{
"type": "<type>",
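Combining the quoting rules above, a minimal inline definition for an external Kafka source might look like this (broker addresses are placeholders):

[source,gsql]
----
CREATE DATA_SOURCE s1 = """{
  "type": "mirrormaker",
  "source.cluster.bootstrap.servers": "<broker_addrs>"
}""" FOR GRAPH ldbc_snb
----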
@@ -8,7 +8,7 @@ These can refer to actual files or be placeholder names. The actual data sources
. LOAD statements specify how to take the data fields from files to construct vertices or edges.

////
-[source,php]
+[source,gsql]
.CREATE LOADING JOB syntax
----
CREATE LOADING JOB <job_name> FOR GRAPH <graph_name> {
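As a sketch of this two-part structure (filenames first, then LOAD statements; the names and column positions are illustrative):

[source,gsql]
----
CREATE LOADING JOB load_ldbc_snb FOR GRAPH ldbc_snb {
  // Placeholder filevars; the actual data sources are bound when the job is run.
  DEFINE FILENAME file_Person;
  DEFINE FILENAME file_Knows;
  LOAD file_Person TO VERTEX Person VALUES ($0, $1, $2);
  LOAD file_Knows TO EDGE KNOWS VALUES ($0, $1);
}
----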
@@ -4,7 +4,7 @@ First we define _filenames_, which are local variables referring to data files (
[NOTE]
The terms `FILENAME` and `filevar` are used for legacy reasons, but a `filevar` can also be an object in a data object store.

-[source,php]
+[source,gsql]
.DEFINE FILENAME syntax
----
DEFINE FILENAME filevar ["=" file_descriptor ];
@@ -13,7 +13,7 @@ DEFINE FILENAME filevar ["=" file_descriptor ];
The file descriptor can be specified at compile time or at runtime.
Runtime settings override compile-time settings:

-[source,php]
+[source,gsql]
.Specifying file descriptor at runtime
----
RUN LOADING JOB job_name USING filevar=file_descriptor_override
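For instance (paths and job name assumed for illustration), a compile-time default can be declared in the job and then overridden when the job is run:

[source,gsql]
----
// In the loading job body (compile-time default):
DEFINE FILENAME file_Person = "/data/person.csv";

// At run time, the USING clause takes precedence over the default:
RUN LOADING JOB load_ldbc_snb USING file_Person="/data/person_2024.csv"
----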
@@ -1,7 +1,7 @@
=== Specify the data mapping
Next, we use LOAD statements to describe how the incoming data will be loaded to attributes of vertices and edges. Each LOAD statement handles the data mapping, and optional data transformation and filtering, from one filename to one or more vertex and edge types.

-[source,php]
+[source,gsql]
.LOAD statement syntax
----
LOAD [ source_object|filevar|TEMP_TABLE table_name ]
@@ -12,7 +12,7 @@ LOAD [ source_object|filevar|TEMP_TABLE table_name ]
<1> As of v3.9.3, TAGS are deprecated.

Let's break down one of the LOAD statements in our example:
-[source,php]
+[source,gsql]
.Example loading job for local files
----
LOAD file_Person TO VERTEX Person
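The example is cut off in the diff; a complete statement of this shape might look like the sketch below, where the repeated `$1` (id reused as an attribute) and the USING options are illustrative assumptions:

[source,gsql]
----
LOAD file_Person TO VERTEX Person
  VALUES ($1, $1, $2, $3)
  USING SEPARATOR="|", HEADER="true";
----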
@@ -3,7 +3,7 @@
When a loading job starts, the GSQL server assigns it a job ID and displays it for the user to see.
There are three key commands to monitor and manage loading jobs:

-[source,php]
+[source,gsql]
----
SHOW LOADING STATUS job_id|ALL
ABORT LOADING JOB job_id|ALL
@@ -12,7 +12,7 @@ RESUME LOADING JOB job_id

`SHOW LOADING STATUS` shows the current status of either a specified loading job or all current jobs. This command should be run within the scope of a graph:

-[source,php]
+[source,gsql]
GSQL > USE GRAPH graph_name
GSQL > SHOW LOADING STATUS ALL

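A monitoring session might then look like the following; the job ID shown is a made-up placeholder (real IDs are assigned by the GSQL server when the job starts):

[source,gsql]
----
GSQL > USE GRAPH ldbc_snb
GSQL > SHOW LOADING STATUS ALL
GSQL > ABORT LOADING JOB ldbc_snb.load_ldbc_snb.file.m1.1715300000000
GSQL > RESUME LOADING JOB ldbc_snb.load_ldbc_snb.file.m1.1715300000000
----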