
Commit

review for version 0.7
tfarcas committed Nov 13, 2023
1 parent 91388f4 commit 2cfa029
Showing 10 changed files with 86 additions and 46 deletions.
3 changes: 3 additions & 0 deletions .idea/.gitignore


6 changes: 6 additions & 0 deletions .idea/misc.xml


8 changes: 8 additions & 0 deletions .idea/modules.xml


9 changes: 9 additions & 0 deletions .idea/shoe-store.iml


6 changes: 6 additions & 0 deletions .idea/vcs.xml


30 changes: 17 additions & 13 deletions lab1.md
@@ -1,7 +1,7 @@
![image](terraform/img/confluent-logo-300-2.png)
# Lab 1

All required resources must be already created for this lab to work correctly.
All required resources must already be created for this lab to work correctly. If you haven't done so yet, follow the [prerequisites](prereq.md).

## Verify Confluent Cloud Resources
Let's verify that all resources were created correctly so that we can start using them.
@@ -23,11 +23,11 @@ NOTE: Schema Registry is at the Environment level and can be used for multiple Kafka clusters.
### Datagen Connectors
Your Kafka cluster should have three Datagen Source Connectors running. Check if topic and template configurations are correct.

| Connector Name (can be anything)| Topic | Format | Template |
| --------------------------- |:-------------:| -----:|----------------------:|
| **DatagenSourceConnector_0**| shoe_products | AVRO | **Shoes** |
| **DatagenSourceConnector_1**| shoe_customers | AVRO | **Shoes customers** |
| **DatagenSourceConnector_2**| shoe_orders | AVRO | **Shoes orders** |
| Connector Name (can be anything) | Topic | Format | Template |
|--------------------------------------|:---------------:|-------:|---------------------:|
| **DatagenSourceConnector_products** | shoe_products | AVRO | **Shoes** |
| **DatagenSourceConnector_customers** | shoe_customers | AVRO | **Shoes customers** |
| **DatagenSourceConnector_orders** | shoe_orders | AVRO | **Shoes orders** |

### Flink Compute Pool

@@ -52,11 +52,11 @@ Let's start with exploring our Flink tables.
Kafka topics and schemas are always in sync with our Flink cluster. Any topic created in Kafka is visible directly as a table in Flink, and any table created in Flink is visible as a topic in Kafka. Effectively, Flink provides a SQL interface on top of Confluent Cloud.

The following mapping exists:
| Kafka| Flink |
| -------------- |:-------------:|
| Environment | Catalog |
| Cluster | Database |
| Topic + Schema | Table |
| Kafka | Flink |
| ------------ |:---------:|
| Environment | Catalog |
| Cluster | Database |
| Topic + Schema | Table |
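
Once you open a SQL worksheet (next step), you can verify this mapping yourself with standard Flink SQL metadata statements. A minimal sketch, assuming the resource names from the prerequisites:
```
SHOW CATALOGS;   -- environments, e.g. handson-flink
SHOW DATABASES;  -- Kafka clusters in the selected catalog, e.g. cc_handson_cluster
SHOW TABLES;     -- topics (with schemas) in the selected database
```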

We will now work with the SQL Worksheet:
![image](terraform/img/sql_worksheet.png)
@@ -74,6 +74,10 @@ SHOW TABLES;
```
Do you see tables shoe_products, shoe_customers, shoe_orders?

You can add multiple query boxes by clicking the + button to the left of an existing query box.

![image](terraform/img/add-query-box.png)

Understand how the table was created:
```
SHOW CREATE TABLE shoe_products;
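-- The output lists the table's columns plus its WITH options (for example the
-- Avro key/value formats); the exact options depend on your environment setup.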
@@ -141,7 +145,7 @@ GROUP BY brand;
### Time Windows

Let's try Flink time windowing functions for shoe order records.
Column names “window_start” and “window_end” are comminly used in Flink's window operations, especially when dealing with event time windows.
Column names “window_start” and “window_end” are commonly used in Flink's window operations, especially when dealing with event time windows.

Find the number of orders in 1-minute intervals (tumbling window aggregation).
```
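-- A possible shape of such a query (a sketch; the lab's own statement may differ).
-- `$rowtime` is the system column exposed for the record timestamp.
SELECT window_start, window_end, COUNT(*) AS order_count
FROM TABLE(
    TUMBLE(TABLE shoe_orders, DESCRIPTOR(`$rowtime`), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end;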
@@ -239,4 +243,4 @@ SELECT *
Now, you can finally check which jobs are still running, which jobs failed, and which stopped. Go to Flink (Preview) in the environment and choose `Flink Statements`. Check what you can do here.
![image](terraform/img/flink_jobs.png)

End of Lab1 got [lab2](lab2.md)
End of Lab1, continue with [Lab2](lab2.md).
9 changes: 6 additions & 3 deletions lab2.md
@@ -1,14 +1,17 @@
![image](terraform/img/confluent-logo-300-2.png)
# Lab 2
Finishing Lab 1 is required for this lab.
Finishing Lab 1 is required for this lab. If you have not completed it, go to [LAB 1](lab1.md).

## Flink Joins

Flink SQL supports complex and flexible join operations over dynamic tables. There are a number of different types of joins to account for the wide variety of semantics that queries may require.
By default, the order of joins is not optimized. Tables are joined in the order in which they are specified in the FROM clause.
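
As a first impression of the syntax, a regular join between orders and customers could look like the sketch below; the join columns `customer_id` and `id` are assumptions based on the Datagen templates, and the lab's own join statements follow later.
```
SELECT o.order_id, o.product_id, c.first_name, c.last_name, c.email
FROM shoe_orders o
INNER JOIN shoe_customers c
  ON o.customer_id = c.id;
```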

### Understand Timestamps
Let's first look at our data records and their timestamps.
Let's first look at our data records and their timestamps. Open the Flink SQL workspace.

If you left the Flink SQL Workspace or refreshed the page, the Catalog and Database dropdowns are reset. Make sure they are selected again.
![image](terraform/img/catalog-and-database-dropdown.png)

Find all records for one customer and display the timestamps of when the events were ingested into the shoe_customers Kafka topic.
```
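-- A sketch of such a query (the customer id below is a placeholder):
-- `$rowtime` is the system column holding the Kafka ingestion timestamp.
SELECT id, first_name, last_name, email, `$rowtime` AS ingestion_time
FROM shoe_customers
WHERE id = '<some-customer-id>';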
@@ -299,4 +302,4 @@ Check if all promotion notifications are stored correctly
select * from shoe_promotions;
```

End of Lab 2
End of Lab2.
61 changes: 31 additions & 30 deletions prereq.md
@@ -1,18 +1,18 @@
![image](terraform/img/confluent-logo-300-2.png)
# Prerequisites
Before we can run the hands-on workshop, a working infrastructure in Confluent Cloud mus exists:
Before we can run the hands-on workshop, a working infrastructure in Confluent Cloud must exist:
- an environment with Schema Registry enabled,
- a Kafka Cluster,
- 3 topics
- events generated by our datagen testdata generator connector
- events generated by our Sample Data Datagen Source connector
- and finally a Flink SQL compute pool.

And of course to do all of this we need a working account for Confluent Cloud.
And of course, to do all of this we need a working account for Confluent Cloud.
Signing up for Confluent Cloud is very easy, and you will get a $400 budget for our Hands-on Workshop.
If you don't have a working Confluent Clooud account please [Sign-up to Confluent Cloud](https://www.confluent.io/confluent-cloud/tryfree/?utm_campaign=tm.campaigns_cd.Q124_EMEA_Stream-Processing-Essentials&utm_source=marketo&utm_medium=workshop).
If you don't have a working Confluent Cloud account please [Sign-up to Confluent Cloud](https://www.confluent.io/confluent-cloud/tryfree/?utm_campaign=tm.campaigns_cd.Q124_EMEA_Stream-Processing-Essentials&utm_source=marketo&utm_medium=workshop).

Now you have two possibilties to create the Hands-On Workshop Confluent Cloud resources:
1. Let terraform create it: If you are comfortable with running terraform, then follow this [guide](terraform/README.md)
Now you have two possibilities to create the Hands-On Workshop Confluent Cloud resources:
1. Let terraform create it: If you are comfortable running terraform, then follow this [guide](terraform/README.md).
2. Create all resources manually.

## Confluent Cloud Resources for the Hands-on Workshop: Manual Setup
@@ -21,58 +21,59 @@ Both use the Confluent Cloud API in the background.
If you would like to use the CLI, you need to [install the CLI](https://docs.confluent.io/confluent-cli/current/install.html) on your desktop. This workshop guide covers the GUI only.

### Create Environment and Schema Registry
Login into Confluent Cloud and create am environment with Schema Registry:
Log in to Confluent Cloud and create an environment with Schema Registry:
* Click the `Add cloud environment` button
* enter a New environment name e.g. `handson-flink` and push `create` button
* Choose Essentials Stream Governance package and click `begin configuration`
* Choose AWS with region eu-central-1 (Frankfurt), currently flink SQL (Preview) is only available in AWS, but will be soon available for all azure and google regions.
* Enter a new environment name, e.g. `handson-flink`, and push the `create` button
* Choose the Essentials Stream Governance package and click `Begin configuration`
* Choose AWS with region eu-central-1 (Frankfurt); currently Flink SQL (Preview) is only available in AWS, but it will soon be available for all Azure and Google regions.
* Click button `Enable`

Environment is ready to work and includes a schema registray in AWS in region Frankfurt.
The environment is ready to use and includes a Schema Registry in AWS in the Frankfurt region.
![image](terraform/img/environment.png)

### Create Kafka Cluster in Environment `handson-flink`

The next step is to create a Basic Cluster in AWS region eu-central-1.
Click button `create cluster`
* choose BASIC `begin configuration` button to start the cluster creation config.
Click the `Create cluster` button
* choose BASIC and click the `Begin configuration` button to start the cluster creation config.
* Choose AWS and the region eu-central-1 with Single zone and click `Continue`
* Give the cluster a name, e.g. `cc_handson_cluster`, check the rate card overview and configs, then press `Launch cluster`

The cluster will be up and running in seconds.
![image](terraform/img/cluster.png)

### Create topics in Kafka Cluster `cc_handson_cluster`
Now, we need three topics, to store our events.
Now, we need three topics to store our events.
* shoe_products
* shoe_customers
* shoe_orders

Via the GUI the topic creation is very simple.
Create Topic by clicking (left.hand menu) Topics and then click `Create Topic` button
* Topic name : shoe_products, Partitions : 1 and then push `Create with defaults` button
* Do the same for shoe_customers and shoe_orders
Create a topic by clicking Topics (left-hand menu) and then clicking the `Create topic` button.
* Topic name: shoe_products, Partitions: 1, then click the `Create with defaults` button
* Repeat the same steps for shoe_customers and shoe_orders

Three topics are created.
![image](terraform/img/topics.png)

### Create Datagenerator connectors to fill the topics `show_products and shoe_customers and shoe_orders`
Confluent has the datagen connector, which is a testdata generator. In Confluent Cloud a couple Quickstarts (predefinied data) are available and will generate data of a given format.
### Create Sample Data connectors to fill the topics `shoe_products`, `shoe_customers`, and `shoe_orders`
Confluent has the Datagen connector, which is a test data generator. In Confluent Cloud, a couple of Quickstarts (predefined data sets) are available that generate data in a given format.
NOTE: We use Datagen with the following templates:
* Shoe Products https://github.com/confluentinc/kafka-connect-datagen/blob/master/src/main/resources/shoes.avro
* Shoe Customers https://github.com/confluentinc/kafka-connect-datagen/blob/master/src/main/resources/shoe_customers.avro
* Shoe Orders https://github.com/confluentinc/kafka-connect-datagen/blob/master/src/main/resources/shoe_orders.avro

Choose the `Connectors` menu entry (left side) and search for `Sample Data`. Click on the Sample Data icon.
* Choose topic: `show_products` and click `continue`
* Click Global Access (which is already selected by default) and download the API Key. Typically you will give the connector restrictive access to your resources (what we did in the terraform setup). But for now, it seems to be good enough for hands-on. Click `Generate API Key & Download`, enter a description `Datagen Connector Products` abd click `continue`
* Select format `AVRO`, because Flink requires AVRO for now, and a template (Show more Option) `Shoes` and click `continue`
* Check Summary, we will go with one Task (slider) and click `continue`
* Enter a name `DSoC_products` and finally click `continue`
* Choose topic: `shoe_products` and click `Continue`
* Click Global Access (which is already selected by default) and download the API Key. Typically, you would give the connector restrictive access to your resources (as we did in the terraform setup), but for this hands-on it is good enough. Click `Generate API Key & Download`, enter the description `Datagen Connector Products` and click `Continue`
* Select format `AVRO`, because Flink requires AVRO for now, choose the template `Shoes` (under Show more options) and click `Continue`
* Check Summary, we will go with one Task (slider) and click `Continue`
* Enter a name `DSoC_products` and finally click `Continue`

Now, events generated by the Datagen connector `DSoC_products` will flow into the topic `shoe_products`.
![image](terraform/img/shoe_products.png)

If you click on `Stream Lineage` (left side) and will see your current data pipeline. Click on topic `shoe_products` and enter the description `Shoe products`. This is how you place metedata to your data product.
Click on `Stream Lineage` (left side) and you will see your current data pipeline. Click on the topic `shoe_products` and enter the description `Shoe products`. This is how you attach metadata to your data product.
![image](terraform/img/streamlineage.png)

Go back to your cluster `cc_handson_cluster` and create two more Datagen connectors to fill the topics shoe_customers and shoe_orders: go to `Connectors` and click `Add Connector`.
@@ -98,11 +99,11 @@ Or just use the topic viewer, where you can
Go back to the environment `handson-flink` and choose the `Flink (preview)` tab. From there we create a new compute pool:
* choose AWS region eu-central-1, click `continue` and
* enter Pool Name: `cc_flink_compute_pool` with 5 Confluent Flink Units (CFU) and
* click `continue` button and then `finish`.
The pool will be provisioned and ready to work with a couple of moments.
![image](terraform/img/flink_pool.png)
* click `Continue` button and then `Finish`.
The pool will be provisioned and ready to use in a couple of moments.
![image](terraform/img/flinkpool.png)

open the SQLWorksheet of compute pool and set:
Open the SQL Workspace of the compute pool and set:
- the environment name `handson-flink` as catalog
- and the cluster name `cc_handson_cluster` as database
Set both via the dropdown boxes; see the graphic. Alternatively, you can run SQL statements, as sketched below.
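
A minimal sketch of the equivalent SQL statements, assuming the resource names used in this guide (backticks because of the hyphen in the environment name):
```
USE CATALOG `handson-flink`;
USE `cc_handson_cluster`;
```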
@@ -111,4 +112,4 @@
The infrastructure for the Hands-on Workshop is up and running, and we can now start to develop our use case of a loyalty program in Flink SQL.
![image](terraform/img/deployment_diagram.png)

End of prereq, start with [LAB 1](lab1.md)
End of prerequisites, continue with [LAB 1](lab1.md).
Binary file added terraform/img/add-query-box.png
Binary file added terraform/img/catalog-and-database-dropdown.png
