Twitter-Kafka-Data-pipeline

This project aims to show the impact of Twitter data and how it can play a pivotal role in shaping the future of any organization, irrespective of its nature of work or output. We will live-stream the data using the Twitter API and Apache Kafka, search tweets, and store them in a NoSQL database, MongoDB. We will then draw inferences by plotting graphs in Python with packages such as Matplotlib, Seaborn, and Pandas, among others. We will also use popular visualization tools such as Tableau.

We will build a pipeline that both live-streams and searches Twitter data, covering every stage from extraction to visualization. We will frame user stories and resolve them through the inferences drawn from the data. The project also aims to highlight the challenges we face in reaching this goal and the resolutions we adopt to solve them. We will also document the errors we encounter, their root cause analysis, unexpected challenges, and other aspects of the project.

The live streaming has been implemented using Apache Kafka. Data moves from the Twitter API to a Kafka producer, then to a Kafka consumer, which loads it into MongoDB, a NoSQL database.
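
The flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the topic name `tweets`, the helper names (`encode_tweet`, `decode_tweet`, `publish_tweets`, `store_messages`), and the choice of JSON as the message format are all assumptions. The producer and collection are passed in as plain objects, so the sketch runs without a Kafka broker or a MongoDB server.

```python
import json
from typing import Any, Iterable, Protocol


class Producer(Protocol):
    """Anything with a Kafka-style send(topic, value) method."""
    def send(self, topic: str, value: bytes) -> Any: ...


class Collection(Protocol):
    """Anything with a MongoDB-style insert_one(document) method."""
    def insert_one(self, document: dict) -> Any: ...


TOPIC = "tweets"  # hypothetical topic name, not taken from the project


def encode_tweet(tweet: dict) -> bytes:
    """Serialize a tweet to UTF-8 JSON bytes for use as a Kafka message value."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def decode_tweet(payload: bytes) -> dict:
    """Inverse of encode_tweet: turn a message value back into a dict."""
    return json.loads(payload.decode("utf-8"))


def publish_tweets(tweets: Iterable[dict], producer: Producer, topic: str = TOPIC) -> int:
    """Producer side: send each tweet to the topic and return how many were sent."""
    count = 0
    for tweet in tweets:
        producer.send(topic, value=encode_tweet(tweet))
        count += 1
    return count


def store_messages(messages: Iterable[Any], collection: Collection) -> int:
    """Consumer side: decode each message's .value and insert it as one document."""
    count = 0
    for message in messages:
        collection.insert_one(decode_tweet(message.value))
        count += 1
    return count
```

Assuming the `kafka-python` and `pymongo` client libraries (this README does not say which clients are used), a `KafkaProducer` could be passed as `producer`, a `KafkaConsumer` subscribed to the same topic could be iterated as `messages`, and a `pymongo` collection could be passed as `collection`. Keeping the serialization in two small functions means the producer and consumer agree on one message format.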