Twitter and Spark Streaming with Apache Kafka

This project counts, in real time, the tweets that include the #GoTS7 hashtag for each user.
The usernames and their tweet counts are printed as the stream is processed.

Code Explanation

Authentication with the Twitter API was handled by the Tweepy module for Python.
A StreamListener named KafkaPushListener was created for Twitter streaming; it publishes the incoming tweets to Kafka, where the consumer reads them.
The Twitter stream was filtered so that only tweets containing the Game of Thrones hashtag are produced.
A SparkContext was created to connect to the Spark cluster.
A Kafka consumer that reads from the 'twitter' topic was created.
For each user, the number of tweets that include the #GoTS7 hashtag is calculated, and the usernames and counts are printed in real time.
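The consumer-side steps above can be modeled in plain Python for a single micro-batch. This is a minimal sketch, not the code in kafka_twitter_spark_streaming.py. It assumes that each Kafka message value is a raw Twitter API v1.1 status JSON string with `user.screen_name` and `entities.hashtags` fields, and the helper names `has_hashtag` and `count_per_user` are hypothetical. In the Spark job, the same parse, filter, map-to-(user, 1), and sum-by-key steps would correspond to DStream `map`, `filter`, and `reduceByKey` calls.

```python
import json
from functools import reduce

# Assumption: the target hashtag, stored without the leading '#',
# as Twitter's hashtag entities store it.
HASHTAG = "GoTS7"


def has_hashtag(tweet, tag=HASHTAG):
    """Return True if the tweet's hashtag entities contain `tag` (case-insensitive)."""
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return any(h.get("text", "").lower() == tag.lower() for h in hashtags)


def count_per_user(raw_messages, tag=HASHTAG):
    """Model one micro-batch: parse JSON, keep tagged tweets, sum counts per user."""
    # Assumption: each message is a v1.1 status JSON string (hypothetical format).
    tweets = (json.loads(m) for m in raw_messages)
    pairs = ((t["user"]["screen_name"], 1) for t in tweets if has_hashtag(t, tag))

    def add_pair(acc, pair):
        # Plays the role of reduceByKey(lambda a, b: a + b) in Spark Streaming.
        user, n = pair
        acc[user] = acc.get(user, 0) + n
        return acc

    return reduce(add_pair, pairs, {})


if __name__ == "__main__":
    # Hypothetical sample batch; real tweets carry many more fields.
    batch = [
        json.dumps({"user": {"screen_name": "arya"}, "entities": {"hashtags": [{"text": "GoTS7"}]}}),
        json.dumps({"user": {"screen_name": "jon"}, "entities": {"hashtags": [{"text": "gots7"}]}}),
        json.dumps({"user": {"screen_name": "arya"}, "entities": {"hashtags": [{"text": "GoTS7"}]}}),
        json.dumps({"user": {"screen_name": "sansa"}, "entities": {"hashtags": []}}),
    ]
    for user, count in sorted(count_per_user(batch).items()):
        print(user, count)
```

Running the file prints `arya 2` and then `jon 1`. The tweet without the hashtag is dropped, and the hashtag match ignores case.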

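How to Run

Start the Kafka broker (ZooKeeper must already be running):

./kafka/kafka_2.11-0.11.0.0/bin/kafka-server-start.sh ./kafka/kafka_2.11-0.11.0.0/config/server.properties

Start the Twitter listener, which pushes tweets to the 'twitter' topic:

PYSPARK_PYTHON=python3 bin/spark-submit kafka_push_listener.py

Start the Spark Streaming job with Spark's Kafka 0.8 integration package (Scala 2.11, Spark 2.2.0):

PYSPARK_PYTHON=python3 bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 kafka_twitter_spark_streaming.py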
