
# Building a Real-Time Streaming Pipeline

This small project uses Apache Kafka and Python to learn more about event streaming.

Based on the Apache Kafka documentation: https://kafka.apache.org/.

## Overview

In this project, we build a real-time streaming pipeline using Apache Kafka and Python. The pipeline enables the ingestion and processing of data samples, allowing us to perform dynamic analyses such as sentiment analysis.

## Steps followed

Add Kafka to your `PATH` (e.g. in the `.bashrc`):

```bash
export KAFKA_HOME=/path/to/kafka
export PATH=$KAFKA_HOME/bin:$PATH
```


### 1. Start the Kafka environment
```bash
# Start ZooKeeper
apache_kafka_folder/bin/zookeeper-server-start.sh apache_kafka_folder/config/zookeeper.properties

# Start Kafka
apache_kafka_folder/bin/kafka-server-start.sh apache_kafka_folder/config/server.properties

I use the Python library faker to generate fake data.

### 3. Create a Kafka topic to store events

```bash
apache_kafka_folder/bin/kafka-topics.sh --create --topic social_media_stream --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```

### 4. Implement the Kafka Producer and Consumer

- **Producer**: a client application that publishes (writes) events to Kafka.
- **Consumer**: a client application that subscribes to (reads and processes) these events.

The Python library `confluent-kafka` is used to create basic clients.
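A minimal producer sketch using `confluent-kafka`, assuming JSON-encoded message values and a broker on `localhost:9092`; the message schema and helper names are illustrative, not taken from the project:

```python
import json

def make_event(user, text):
    """Serialize one post as a JSON-encoded message value (assumed schema)."""
    return json.dumps({"user": user, "text": text}).encode("utf-8")

def produce_events(events, topic="social_media_stream", bootstrap="localhost:9092"):
    """Publish (user, text) pairs to the topic via confluent-kafka."""
    # Imported lazily so make_event can be used without a broker or the library.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": bootstrap})
    for user, text in events:
        producer.produce(topic, value=make_event(user, text))
    producer.flush()  # block until all buffered messages are delivered
```

`produce()` is asynchronous: messages are buffered locally and sent in batches, which is why the final `flush()` matters before the script exits.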

### 5. Run the Python files

- First, run the script that generates the social media data.
- Simultaneously, run the Kafka producer and consumer scripts to see the data ingestion and processing in real time.
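The consumer side can be sketched as a simple poll loop, again assuming JSON-encoded values on `localhost:9092`; the group id and helper names are assumptions:

```python
import json

def handle_message(raw):
    """Decode one message value back into a post dict (assumed JSON schema)."""
    return json.loads(raw.decode("utf-8"))

def consume(topic="social_media_stream", bootstrap="localhost:9092"):
    """Read posts from the topic and print them as they arrive."""
    # Imported lazily so handle_message can be tested without a broker.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": bootstrap,
        "group.id": "social_media_group",   # assumed group id
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([topic])
    try:
        while True:
            msg = consumer.poll(1.0)  # wait up to 1s for a message
            if msg is None:
                continue
            if msg.error():
                print(msg.error())
                continue
            print(handle_message(msg.value()))
    finally:
        consumer.close()
```

Running the producer and this loop in two terminals shows events flowing through the topic in real time; `handle_message` is where an analysis step such as sentiment scoring would plug in.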
