Pac-Man with Confluent Cloud

Pac-Man with Confluent Cloud is the most fun you will ever have while applying the principles of stream processing with Apache Kafka. Built around the famous Pac-Man game, this application allows you to capture and store events from the game in Kafka topics, as well as process them in near real time using ksqlDB. To keep you focused on the fun and interesting part, the application is based on clusters running in Confluent Cloud, a fully managed service that offers Apache Kafka as a serverless offering.

pacman game

To apply the principles of stream processing to the game, you are going to build a scoreboard using KSQL. The scoreboard will be based on a table that holds aggregated metrics for each player, such as their highest score, the highest level achieved, and the number of times the player has lost the game (a.k.a. game over). As events keep coming from the game, this scoreboard gets updated instantly by the continuous queries that process those events as they happen.
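
To make this concrete, the aggregation behind such a scoreboard boils down to statements of the following shape. This is only a sketch with assumed stream and column names (USER_GAME, USER_LOSSES, USER_NAME, SCORE, LEVEL); the real queries ship with the project in the pipeline folder.

    -- Sketch only: assumed names, the actual statements live in pipeline/queries.sql.
    -- Highest score and highest level ever reached, per player.
    CREATE TABLE HIGHEST_STATS AS
        SELECT USER_NAME,
               MAX(SCORE) AS HIGHEST_SCORE,
               MAX(LEVEL) AS HIGHEST_LEVEL
        FROM USER_GAME
        GROUP BY USER_NAME;

    -- Number of times each player has lost the game.
    CREATE TABLE TOTAL_LOSSES AS
        SELECT USER_NAME,
               COUNT(*) AS LOSSES
        FROM USER_LOSSES
        GROUP BY USER_NAME;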

What are you going to need?

  • Confluent Cloud - You need an active Confluent Cloud account to be able to spin up environments with the services required for this application. At a minimum, you will need a Kafka cluster where your topics will be created and a managed Schema Registry. Optionally, you may want to create KSQL applications to implement the scoreboard pipeline.

  • Terraform - The application is created automatically using Terraform. The default cloud provider supported is AWS, but there are implementations for GCP and Azure as well. Besides having Terraform installed locally, you will need to provide your cloud provider credentials so Terraform can create and manage the resources for you.

  • Java and Maven - The UI layer of the application relies on two APIs that are implemented in Java, so you will need to have Java installed to build the source code. The build itself is implemented using Maven, and it is triggered automatically by Terraform.

  • Docker (Optional) - The backend of some of the APIs is based on Redis. In order to keep Redis up to date with the data stored in Kafka, the application uses a sink service implemented as a Docker application. Its Docker image has been published on Docker Hub, so there is no need for you to worry about this. However, if you want to use your own Docker image, you will need to build a new image using this code, push it to a Docker repository, and modify the Terraform variable named redis_sink_image (see the example right after this list).

  • Confluent Cloud CLI (Optional) - During the creation of the pipeline, if you choose to implement it using Confluent Cloud managed KSQL, you will need to have the Confluent Cloud CLI installed locally to set up access permissions to the topics. You can find instructions on how to install it here.

  • Confluent Platform (Optional) - The pipeline implementation has some example events that can be used to test the code before you deploy it. This can be accomplished by using the KSQL Testing Tool that is part of Confluent Platform. You can find instructions on how to install it here.
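
For the optional custom Redis sink image mentioned above, the change is a single Terraform variable; for example, a line like the following in your 'cloud.auto.tfvars' file (the image reference is a placeholder for your own build):

    redis_sink_image = "<YOUR_DOCKER_REPOSITORY>/<YOUR_IMAGE>:<TAG>"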

1) Setting up Confluent Cloud

As mentioned before, the application is based on clusters running in Confluent Cloud. Thus, the very first thing you need to do is create a cluster in Confluent Cloud. You are also going to need access to the Schema Registry service that is available for each environment created in Confluent Cloud.
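
If you prefer the CLI over the web UI, the cluster and its credentials can be created along the lines shown below. The cloud provider and region are examples only, and the exact syntax depends on the version of the Confluent Cloud CLI you have installed.

    # log in and create a cluster (example provider/region)
    ccloud login
    ccloud kafka cluster create pacman --cloud aws --region us-east-1

    # look up the bootstrap endpoint and create an API key for the cluster
    ccloud kafka cluster describe <CLUSTER_ID>
    ccloud api-key create --resource <CLUSTER_ID>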

2) Deploying the application

The application is essentially a set of HTML/CSS/JS files that form a microsite that can be hosted statically anywhere. But for the sake of coolness, we will deploy this microsite in a storage service from the chosen cloud provider. A bucket will be created and the microsite copied there. This bucket will be created in the very same region selected for the Confluent Cloud cluster, to ensure that the application is co-located. The application will emit events that are processed by an event handler API implemented as a serverless application. This event handler API receives the events and writes them into the respective Kafka topics.
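
For illustration, an event sent by the game to the event handler could look something like the JSON below. The field names are hypothetical; the actual payloads are defined by the game code and by the Avro schemas registered by the API.

    {
      "user": "<player-nickname>",
      "score": 1540,
      "level": 2,
      "lives": 3
    }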

pac man arch

Please note that during deployment, the API takes care of creating the required Kafka topics. Therefore, there is no need to manually create them.

Deploying on AWS

  1. Enter the folder that contains the AWS code

    cd terraform/aws
  2. Create a variables file for Confluent Cloud

    mv ccloud.auto.tfvars.example ccloud.auto.tfvars
  3. Provide the data in the 'ccloud.auto.tfvars' file

    bootstrap_server = "<CCLOUD_BOOTSTRAP_SERVER>"
    cluster_api_key = "<CCLOUD_API_KEY>"
    cluster_api_secret = "<CCLOUD_API_SECRET>"
    
    schema_registry_url = "<SCHEMA_REGISTRY_URL>"
    schema_registry_basic_auth = "<SCHEMA_REGISTRY_API_KEY>:<SCHEMA_REGISTRY_SECRET>"
  4. Create a variables file for AWS

    mv cloud.auto.tfvars.example cloud.auto.tfvars
  5. Provide the credentials in the 'cloud.auto.tfvars' file

    aws_access_key = "<AWS_ACCESS_KEY>"
    aws_secret_key = "<AWS_SECRET_KEY>"
  6. Initialize the Terraform plugins

    terraform init
  7. Start the application deployment

    terraform apply -auto-approve
  8. The output with the endpoints will be shown

    Outputs:
    
    KSQL_Server = http://pacman00000-ksql-000000.region.elb.amazonaws.com
    Pacman = http://pacman000000000000000.s3-website-region.amazonaws.com

Note: When you are done with the application, you can automatically destroy all the resources created by Terraform using the command below:

terraform destroy -auto-approve

Deploying on GCP

  1. Enter the folder that contains the GCP code

    cd terraform/gcp
  2. Create a variables file for Confluent Cloud

    mv ccloud.auto.tfvars.example ccloud.auto.tfvars
  3. Provide the data in the 'ccloud.auto.tfvars' file

    bootstrap_server = "<CCLOUD_BOOTSTRAP_SERVER>"
    cluster_api_key = "<CCLOUD_API_KEY>"
    cluster_api_secret = "<CCLOUD_API_SECRET>"
    
    schema_registry_url = "<SCHEMA_REGISTRY_URL>"
    schema_registry_basic_auth = "<SCHEMA_REGISTRY_API_KEY>:<SCHEMA_REGISTRY_SECRET>"
  4. Create a variables file for GCP

    mv cloud.auto.tfvars.example cloud.auto.tfvars
  5. Specify the GCP project name in the 'cloud.auto.tfvars' file

    gcp_credentials = "credentials.json"
    gcp_project = "<YOUR_GCP_PROJECT>"
  6. Create a service account key

    https://cloud.google.com/community/tutorials/getting-started-on-gcp-with-terraform
  7. Copy your service account key

    cp <source>/credentials.json .
  8. Initialize the Terraform plugins

    terraform init
  9. Start the application deployment

    terraform apply -auto-approve
  10. The output with the endpoints will be shown

    Outputs:
    
    KSQL_Server = http://0.0.0.0
    Pacman = http://0.0.0.0

Note: When you are done with the application, you can automatically destroy all the resources created by Terraform using the command below:

terraform destroy -auto-approve

Deploying on Azure

  1. Enter the folder that contains the Azure code

    cd terraform/azr
  2. Create a variables file for Confluent Cloud

    mv ccloud.auto.tfvars.example ccloud.auto.tfvars
  3. Provide the data in the 'ccloud.auto.tfvars' file

    bootstrap_server = "<CCLOUD_BOOTSTRAP_SERVER>"
    cluster_api_key = "<CCLOUD_API_KEY>"
    cluster_api_secret = "<CCLOUD_API_SECRET>"
    
    schema_registry_url = "<SCHEMA_REGISTRY_URL>"
    schema_registry_basic_auth = "<SCHEMA_REGISTRY_API_KEY>:<SCHEMA_REGISTRY_SECRET>"
  4. Create a variables file for Azure

    mv cloud.auto.tfvars.example cloud.auto.tfvars
  5. Provide the credentials in the 'cloud.auto.tfvars' file

    azure_subscription_id = "<AZURE_SUBSCRIPTION_ID>"
    azure_client_id = "<AZURE_CLIENT_ID>"
    azure_client_secret = "<AZURE_CLIENT_SECRET>"
    azure_tenant_id = "<AZURE_TENANT_ID>"
  6. Initialize the Terraform plugins

    terraform init
  7. Start the application deployment

    terraform apply -auto-approve
  8. The output with the endpoints will be shown

    Outputs:
    
    KSQL_Server = http://pacman0000000-ksql.region.cloudapp.azure.com
    Pacman = http://pacman0000000000000000000.z5.web.core.windows.net

Note: When you are done with the application, you can automatically destroy all the resources created by Terraform using the command below:

terraform destroy -auto-approve

3) Creating the pipeline

When users play the Pac-Man game, two types of events will be generated. The first one is called User Game and contains data about the user’s current game, such as the score, the current level, and the number of lives. The second one is called User Losses and, as the name implies, contains data about the number of times the user loses the game. To build a scoreboard out of this, a stream processing pipeline needs to be implemented to perform a series of computations on these two events and derive a table that contains statistics about each user’s game.

pipeline

To implement the pipeline you will be using KSQL. The code for this pipeline has been written for you, and the only thing you need to do is deploy it to a full-fledged ksqlDB server. Therefore, you need to decide which ksqlDB server you are going to use. There are two options:

  1. Using the ksqlDB server created by Terraform

  2. Using Confluent Cloud KSQL (Managed Service)

Whichever option you pick, the ksqlDB server will be pointing to the Kafka cluster running on Confluent Cloud. You can even mix and match options to showcase the fact that all of them handle data coming from the single source of truth, which is Apache Kafka.
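
Regardless of where the queries run, the pipeline starts by declaring streams over the two topics written by the event handler API. As a rough sketch (the actual declarations in queries.sql may differ, for example in how the columns are defined):

    -- Sketch only: with Avro and Schema Registry, the columns can be inferred automatically.
    CREATE STREAM USER_GAME
        WITH (KAFKA_TOPIC = 'USER_GAME', VALUE_FORMAT = 'AVRO');

    CREATE STREAM USER_LOSSES
        WITH (KAFKA_TOPIC = 'USER_LOSSES', VALUE_FORMAT = 'AVRO');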

Option: ksqlDB server created by Terraform

  1. Enter the folder that contains the AWS/GCP/Azure code

    cd terraform/<provider>
  2. Execute the command to print the outputs

    terraform output
  3. Select and copy the ksqlDB Server endpoint

  4. Enter the folder that contains the code

    cd ../../pipeline
  5. Start a new session of the KSQL CLI:

    ksql <ENDPOINT_COPIED_ON_STEP_THREE>
  6. Run the queries in the KSQL CLI session:

    RUN SCRIPT 'queries.sql';

Option: Confluent Cloud KSQL

  1. Access the Kafka cluster on Confluent Cloud

    select cluster
  2. Select the 'KSQL' tab and click on 'Add Application'

    new ksql app
  3. Name the KSQL application and click on 'Continue'

    name ksql app
  4. Confirm the terms and then click on 'Launch cluster'

  5. Log in to Confluent Cloud using the CCloud CLI

    ccloud login
  6. Within your environment, list your Kafka clusters

    ccloud kafka cluster list
  7. Select and copy the cluster id from the list

  8. Make sure your Kafka cluster is selected

    ccloud kafka cluster use <CLUSTER_ID_COPIED_ON_STEP_SEVEN>
  9. Find your KSQL application 'Id' using the CCloud CLI

    ccloud ksql app list
  10. Select and copy the KSQL application id from the list

  11. Set up read/write permissions to the Kafka topics

    ccloud ksql app configure-acls <KSQL_APP_ID_COPIED_ON_STEP_TEN> USER_GAME USER_LOSSES
  12. Within the KSQL application, copy the entire pipeline code into the editor

    create pipeline
  13. Click on 'Run' to create the pipeline

Appendix: Viewing the scoreboard locally

In order to verify that the pipeline is working as expected, you can execute a program written in Go that displays the content of the scoreboard. Because tables in KSQL are ultimately topics, this program subscribes to the SCOREBOARD topic and updates the display as new records arrive. Moreover, this program sorts the data based on each user’s game to simulate a real game scoreboard.

  1. Enter the folder that contains the code

    cd scoreboard
  2. Create a native executable for the program

    go build -o scoreboard scoreboard.go
  3. Execute the program to display the data

    ./scoreboard
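
Alternatively, assuming the pipeline created a table named SCOREBOARD, you can peek at the same data straight from a KSQL CLI session; on recent ksqlDB versions the query needs the EMIT CHANGES clause to keep running, while older KSQL versions omit it.

    SELECT * FROM SCOREBOARD EMIT CHANGES;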

Note: This program can only be executed after the application has been deployed to the cloud provider, because in order to connect to Confluent Cloud it relies on a file called 'ccloud.properties' that is generated by Terraform during deployment.