Set up Khermes
Once you have a khermes cluster up and running (see the Getting started section), it's time to configure it to start producing data.
Khermes needs four components to start producing data:
- Kafka cluster: the kafka cluster that khermes will connect to.
- Twirl template: it defines the random data that khermes will produce.
- Khermes configuration: it defines how khermes will produce the data.
- Avro configuration (optional): if you want to redirect the data produced by khermes to a kafka connector, you need to define the avro configuration that will be stored in the schema registry. If you just want to consume the data with a kafka consumer, you DO NOT need to define this configuration.
These configurations are stored in zookeeper, so you can re-use them the next time you start your khermes cluster.
To set up all these components we have created a handy web console to interact with your khermes cluster. To access the console, open the following URL in your favourite browser:
http://<ip_where_khermes_is_running>:8080/console
You should see the khermes console prompt.
Try typing help to get the list of available commands!
Let's start with the kafka configuration. Type the following command in the khermes web console:
khermes> create kafka-config
khermes> kafka-config> Please introduce the kafka-config name> k1
Enter a name for the kafka config (k1 is the name used later in this guide); the config will be stored in zookeeper under that name. The console will then ask you for the kafka configuration itself:
khermes> kafka-config> Please introduce the kafka-config>
The following is a minimal kafka configuration:
kafka {
  bootstrap.servers = "localhost:9092"
  key.serializer = "org.apache.kafka.common.serialization.StringSerializer"
  value.serializer = "org.apache.kafka.common.serialization.StringSerializer"
}
If you plan to attach kafka connectors to redirect the data generated by khermes, you need to both define the avro configuration (see the avro configuration steps later on this page) and use the avro serializer. In that case, use a kafka configuration like the following:
kafka {
  bootstrap.servers = "localhost:9092"
  key.serializer = "io.confluent.kafka.serializers.KafkaAvroSerializer"
  value.serializer = "io.confluent.kafka.serializers.KafkaAvroSerializer"
  schema.registry.url = "http://localhost:8081"
}
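bootstrap.servers lists the kafka brokers to connect to, key.serializer and value.serializer name the serializer classes for record keys and values, and schema.registry.url points to the schema registry. For the full list of available configuration values, see the official kafka producer configuration documentation: Kafka producer config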
Now it's time to configure the twirl template:
khermes> create twirl-template
khermes> twirl-template> Please introduce the twirl-template name> t1
And then copy and paste the following twirl template example:
@import scala.util.Random
@import com.stratio.khermes.helpers.faker.Faker
@import com.stratio.khermes.helpers.faker.generators.Positive
@(faker: Faker)
@defining(faker.Geo.geolocation, faker.Music.playedSong) { case (randomGeo, randomSong) =>
{
"song": "@(randomSong.song)",
"artist": "@(randomSong.artist)",
"album": "@(randomSong.album)",
"genre": "@(randomSong.genre)",
"playduration": @(faker.Number.number(3,Positive)),
"rating": @(faker.Number.rating(5)),
"user": "@(faker.Name.fullName)",
"usertype": "@(Seq("free", "membership")(Random.nextInt(2)))",
"city": "@(randomGeo.city)",
"location": "@(randomGeo.latitude),@(randomGeo.longitude)",
"starttime": "@(s"${Random.nextInt(24)}:${Random.nextInt(60)}:${Random.nextInt(60)}.${Random.nextInt(1000)}")"
}
}
As you can see in the template, you first import the classes that you need within the template. Then you define a json-like structure that states the name of each field and the generator to use for it; check out the Data generators section to find out more. You are also free to include Scala code to write your own in-line generators (see the usertype and starttime fields of the example template).
To find out more about twirl templates check out the twirl github project.
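In-line generators are plain Scala expressions, so it can help to try them outside the template first. The sketch below is only an illustration: InlineGenerators, userType and startTime are hypothetical names, not part of khermes. It mirrors the usertype expression of the example template and shows a zero-padded variant of the starttime expression (the version in the template does not pad its numbers, so it can print values such as 7:3:9.45).

```scala
import scala.util.Random

// Hypothetical helpers for experimenting with in-line generators; not khermes code.
object InlineGenerators {
  // Same idea as the usertype field: pick one of two fixed values at random.
  def userType(rnd: Random): String =
    Seq("free", "membership")(rnd.nextInt(2))

  // Zero-padded variant of the starttime field, in the form HH:mm:ss.SSS.
  def startTime(rnd: Random): String = {
    val (h, m, s, ms) = (rnd.nextInt(24), rnd.nextInt(60), rnd.nextInt(60), rnd.nextInt(1000))
    f"$h%02d:$m%02d:$s%02d.$ms%03d"
  }
}
```

Once an expression behaves as expected, it can be pasted into the template inside @(...), in the same way the example template embeds its own expressions.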
Now let's configure the khermes generator. Type the following in the web console:
khermes> create generator-config
khermes> generator-config> Please introduce the generator-config name> g1
And then copy and paste the following configuration:
khermes {
  templates-path = "/tmp/khermes/templates"
  topic = "khermes"
  template-name = "khermestemplate"
  i18n = "EN"
  timeout-rules {
    number-of-events: 10
    duration: 5 seconds
  }
  stop-rules {
    number-of-events: 5000
  }
}
As you can see in the example, you should define several fields:
- templates-path: The path where your templates will be compiled.
- topic: The kafka topic to which the random data will be sent.
- template-name: The name for the compiled template.
- i18n: The locale that khermes uses to produce random data. So far two locales are implemented (EN: English, ES: Spanish). With the EN locale khermes produces, for instance, English names (John, Mark...), whereas with the ES locale it produces Spanish names (Juan, Antonio...).
- timeout-rules (optional): With number-of-events set to 10 and duration set to 5 seconds, khermes pauses for 5 seconds after every 10 events.
- stop-rules (optional): With number-of-events set to 5000, khermes stops producing data after 5000 events.
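To make the interplay of timeout-rules and stop-rules concrete, here is a minimal simulation of the behaviour described in the list above. It is a sketch under stated assumptions, not the khermes implementation: TimeoutRules, StopRules and RuleSketch are hypothetical names, no event is actually sent anywhere, and the model assumes a pause follows every full batch, including the last one.

```scala
import scala.concurrent.duration._

// Hypothetical model of the two optional rule blocks; these are not khermes classes.
final case class TimeoutRules(numberOfEvents: Int, duration: FiniteDuration)
final case class StopRules(numberOfEvents: Int)

object RuleSketch {
  /** Simulates one run and returns (events produced, total time spent paused).
    * `cap` bounds the simulation when no stop rule is given.
    */
  def simulate(timeout: Option[TimeoutRules],
               stop: Option[StopRules],
               cap: Int = 100000): (Int, FiniteDuration) = {
    val limit = stop.map(_.numberOfEvents).getOrElse(cap)
    var produced = 0
    var paused: FiniteDuration = Duration.Zero
    while (produced < limit) {
      produced += 1 // stands in for sending one event to kafka
      // Assumption: a pause follows every full batch, including the last one.
      timeout.foreach { t =>
        if (produced % t.numberOfEvents == 0) paused += t.duration
      }
    }
    (produced, paused)
  }
}
```

With the values of the example configuration, RuleSketch.simulate(Some(TimeoutRules(10, 5.seconds)), Some(StopRules(5000))) yields 5000 events and 2500 seconds of accumulated pauses (500 pauses of 5 seconds each) under this model.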
As mentioned earlier, the avro configuration is optional. To set it up, type the following commands in the web console:
khermes> create avro-config
khermes> avro-config> Please introduce the avro-config name> a1
And copy and paste the following avro configuration:
{
"type": "record",
"name": "khermes",
"fields": [{"name": "song","type": "string"},
{"name": "artist","type": "string"},
{"name": "album","type": "string"},
{"name": "genre","type": "string"},
{"name": "playduration","type": "int"},
{"name": "rating","type": "int"},
{"name": "user","type": "string"},
{"name": "usertype","type": "string"},
{"name": "city","type": "string"},
{"name": "location","type": "string"},
{"name": "starttime","type": "string"}]
}
To find out more about avro, check out the official avro documentation.
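The field names and types in the avro schema line up with the example twirl template: playduration and rating are emitted as unquoted numbers, and every other field as a quoted string. The sketch below illustrates that correspondence with a simple check of one event against the schema fields. SchemaCheck and mismatches are hypothetical names, the event is modelled as a plain Map instead of parsed JSON, and this is not how khermes or the avro serializer validate data.

```scala
// Hypothetical check, not part of khermes: compares one event with the schema fields.
object SchemaCheck {
  // Field names and avro types copied from the schema above.
  val fields: Seq[(String, String)] = Seq(
    "song" -> "string", "artist" -> "string", "album" -> "string", "genre" -> "string",
    "playduration" -> "int", "rating" -> "int", "user" -> "string", "usertype" -> "string",
    "city" -> "string", "location" -> "string", "starttime" -> "string"
  )

  /** Returns one message per missing or wrongly typed field; empty means the event fits. */
  def mismatches(event: Map[String, Any]): Seq[String] =
    fields.flatMap { case (name, avroType) =>
      (event.get(name), avroType) match {
        case (None, _)                   => Some(s"$name: missing")
        case (Some(_: String), "string") => None
        case (Some(_: Int), "int")       => None
        case (Some(_), _)                => Some(s"$name: expected $avroType")
      }
    }
}
```

If you add, rename or retype a field in the template, a check like this is a reminder to make the same change in the avro configuration.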
We are now set to start producing data! First, check the status of the cluster by typing the ls command in the web console. You should see an output like the following:
khermes> ls
Node Id | Status
-------------------------------------------------
180ce7c2-b113-4450-ab66-16d39cebe620 | false
13167b26-08c6-4b17-a751-d7fab8ae0877 | false
8527069d-c1ea-4c67-bdfb-cdb1965bb2fa | false
a37144da-757c-4bf7-aadf-2580dcb26f4c | false
These are your khermes cluster nodes. All of them show a "false" status, which means that none of them is currently producing data. Let's start producing data in the first one. To do so, type start in the web console and then enter the names of the twirl template, kafka configuration, generator configuration and avro configuration, and finally the id of the first node, as below:
khermes> start
khermes> start > Please introduce the twirl-template name> t1
khermes> start > Please introduce the kafka-config name> k1
khermes> start > Please introduce the generator-config name> g1
khermes> start > Please introduce the avro-config name> a1
khermes> start > Please introduce the node-ids> 180ce7c2-b113-4450-ab66-16d39cebe620
Command result: OK
If you type ls again, you should see that the first node has started producing data:
khermes> ls
Node Id | Status
-------------------------------------------------
180ce7c2-b113-4450-ab66-16d39cebe620 | true
a37144da-757c-4bf7-aadf-2580dcb26f4c | false
13167b26-08c6-4b17-a751-d7fab8ae0877 | false
8527069d-c1ea-4c67-bdfb-cdb1965bb2fa | false