
Prerequisites

Before we can run the hands-on workshop, a working infrastructure in Confluent Cloud must exist:

  • an environment with Schema Registry enabled
  • a Kafka Cluster
  • 3 topics
  • events generated by our Sample Data Datagen Source connector

I have recorded a short video to show you how the required resources are created:

You now have two options for creating the Hands-On Workshop resources in Confluent Cloud:

  1. Let Terraform create them: if you are comfortable running Terraform, follow this guide.
  2. Create all resources manually.

Confluent Cloud Resources for the Hands-on Workshop: Manual Setup

Please be aware that the cluster and the Flink compute pool need to be in the same cloud provider region.

You can create each Confluent Cloud resource with the Confluent CLI and/or the Confluent Cloud Control Plane GUI. Both use the Confluent Cloud API in the background. If you would like to use the CLI, you need to install it on your desktop. This workshop guide covers the GUI only.
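
If you do want to follow along with the CLI instead, the sections below include short, hedged CLI sketches next to the GUI steps. As a starting point (the commands below are from the Confluent CLI; exact flags can differ between CLI versions):

```bash
# Authenticate against Confluent Cloud.
confluent login

# Verify which environments you can see with the logged-in account.
confluent environment list
```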

Create Environment and Schema Registry

Log in to Confluent Cloud and create an environment with Schema Registry:

  • Click the Add cloud environment button

  • Enter a new environment name, e.g. handson-flink, and click the Create button

  • Choose the Essentials Stream Governance package and click Begin configuration

    • Choose AWS with region eu-central-1 (compare the rule with your last name, see above); currently Flink SQL (Preview) is only available in AWS, but it will soon be available for Azure and Google Cloud regions as well.
    • Click the Enable button

The environment is now ready to work with and includes a Schema Registry in the AWS region.
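
For reference, a rough CLI equivalent of this step could look like the sketch below. The --governance-package flag is an assumption and may not exist in older CLI versions; the GUI flow above is the path this workshop supports.

```bash
# Create the workshop environment; the Essentials Stream Governance package
# enables Schema Registry for the environment.
confluent environment create handson-flink --governance-package essentials

# Select the new environment so subsequent commands run against it
# (replace env-xxxxx with the ID printed by the previous command).
confluent environment use env-xxxxx
```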

Create Kafka Cluster in Environment handson-flink

The next step is to create a Basic cluster in an AWS region based on the rule above (compare the rule with your last name, see above). Click the Create cluster button.

  • Choose Basic and click the Begin configuration button to start the cluster configuration.
  • Choose AWS and a region with Single zone availability (for the region, select one of the supported regions where Flink compute pools are available; review this up-to-date list) and click Continue.
  • Give the cluster a name, e.g. cc_handson_cluster, check the rate card overview and configuration, then press Launch cluster.

The cluster will be up and running in seconds.
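
A rough CLI sketch of the same step (cluster IDs and exact flag names are assumptions; check them against your CLI version):

```bash
# Create a Basic, single-zone cluster in a Flink-supported AWS region.
confluent kafka cluster create cc_handson_cluster \
  --cloud aws \
  --region eu-central-1 \
  --type basic \
  --availability single-zone

# Point the CLI at the new cluster (replace lkc-xxxxx with the printed ID).
confluent kafka cluster use lkc-xxxxx
```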

Create topics in Kafka Cluster cc_handson_cluster

Now, we need three topics to store our events.

  • shoe_products
  • shoe_customers
  • shoe_orders

Via the GUI, topic creation is very simple. Click Topics in the left-hand menu and then click the Create topic button.

  • Topic name: shoe_products, Partitions: 1, then click the Create with defaults button
  • Repeat the same steps for shoe_customers and shoe_orders

Three topics are created.
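
If you prefer the CLI, a minimal sketch for the same three topics (assuming the environment and cluster from above are selected):

```bash
# Create the three workshop topics with a single partition each.
for topic in shoe_products shoe_customers shoe_orders; do
  confluent kafka topic create "$topic" --partitions 1
done
```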

Create Sample Data connectors to fill the topics shoe_products, shoe_customers and shoe_orders

Confluent provides the Datagen connector, which is a test data generator. In Confluent Cloud, a couple of Quickstarts (predefined datasets) are available that generate data in a given format. NOTE: We use Datagen with the templates Shoes, Shoe customers and Shoe orders.

Choose the Connectors menu entry (left side) and search for Sample Data. Click on the Sample Data icon.

  • Under "Additional configuration" choose topic: shoe_products and click Continue
  • Click Global Access (which is already selected by default) and download the API Key. Typically, you will give the connector restrictive access to your resources (what we did in the terraform setup). But for now, it seems to be good enough for hands-on. Click Generate API Key & Download, enter a description Datagen Connector Products abd click continue
  • Select format AVRO, because Flink requires AVRO for now, and a template (Show more Option) Shoes and click Continue
  • Check Summary, we will go with one Task (slider) and click Continue
  • Enter a name DSoC_products and finally click Continue

Now events will flow into the topic shoe_products, generated by the Datagen connector DSoC_products.
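
For orientation, the GUI wizard roughly corresponds to a connector configuration like the sketch below. This is an assumption-laden sketch: the property names follow the fully managed Datagen source connector, but the exact quickstart value and the connect subcommand can differ between connector and CLI versions. Replace the API key placeholders with your own values.

```bash
# Write a connector configuration similar to what the GUI wizard produced.
cat > dsoc_products.json <<'EOF'
{
  "name": "DSoC_products",
  "connector.class": "DatagenSource",
  "kafka.auth.mode": "KAFKA_API_KEY",
  "kafka.api.key": "<YOUR_API_KEY>",
  "kafka.api.secret": "<YOUR_API_SECRET>",
  "kafka.topic": "shoe_products",
  "quickstart": "SHOES",
  "output.data.format": "AVRO",
  "tasks.max": "1"
}
EOF

# Create the connector from the config file.
confluent connect cluster create --config-file dsoc_products.json
```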

Click on Stream Lineage (left side) and you will see your current data pipeline. Click on the topic shoe_products and enter the description Shoe products. This is how you attach metadata to your data product.

Go back to your cluster cc_handson_cluster and create two more Datagen connectors to fill the topics shoe_customers and shoe_orders: go to Connectors and click Add Connector. Pay attention when you select the template for each Datagen connector and ensure that it corresponds to the previously selected topic, as shown below. Deviations in this step will result in invalid queries at later stages of the workshop.

  • Connector plug-in Sample Data, Topic shoe_customers, Global access and Download API key with description Datagen Connector Customers, Format AVRO, template Shoe customers, 1 task, Connector name DSoC_customers
  • Connector plug-in Sample Data, Topic shoe_orders, Global access and Download API key with description Datagen Connector Orders, Format AVRO, template Shoe orders, 1 task, Connector name DSoC_orders

Three connectors are up and running and generating data for us.

What is really cool is that all three connectors generate events in AVRO format and automatically create a schema for each of the three topics. You can have a look at the schemas in the Schema Registry.
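
You can also inspect the generated schemas from the CLI. A small sketch, assuming the default subject naming (TopicNameStrategy, i.e. <topic>-value); flag support may vary between CLI versions:

```bash
# List the subjects the connectors registered in Schema Registry.
confluent schema-registry subject list

# Show the value schema registered for the shoe_products topic
# (use a concrete version number if "latest" is not accepted by your CLI).
confluent schema-registry schema describe \
  --subject shoe_products-value \
  --version latest
```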

Or just use the topic viewer, where you can

  • view the events flowing in
  • see all metadata information
  • check the configs
  • and inspect the schemas as well


The preparation is finished. Well done.

The infrastructure for the Hands-on Workshop is up and running, and we can now start to develop our loyalty program use case in Flink SQL.

End of prerequisites, continue with LAB 1.