Big Data-Apache-Project

A repository for a Big Data team project.

The members of the "ProblemPlanProblems" team

Przemysław Chojecki
Paweł Morgen
Paulina Przybyłek

Project subject

Design and implementation of a tool for storing data on press articles and analyzing their headlines.

The goal of the project

The project focuses on performance, and the solution will be designed with extensibility in mind. The implemented solution will be highly scalable and able to process a high volume of data.

Technology stack

The project is built around a flow of data from the Free News API and the Twitter API. Apache NiFi will acquire and preprocess the data, including fusing the results of the two APIs. Both raw and preprocessed data will be stored in HDFS. Once enough data has been collected, Apache Spark will process it in batches, and the results will be stored in Apache HBase.
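
The fusion step could look roughly like the sketch below. It is written in plain Python for illustration only; in the project this logic lives in Apache NiFi, and the function name `fuse_records`, the `publisher` key, and the shape of the input dictionaries are assumptions, not part of the actual templates.

```python
def fuse_records(article: dict, accounts_by_handle: dict) -> dict:
    """Attach the publisher's Twitter account data to an article record.

    Hypothetical helper: `article` is assumed to carry a `twitter_account`
    handle, and `accounts_by_handle` maps handles to account data.
    """
    handle = article.get("twitter_account")
    account = accounts_by_handle.get(handle) if handle else None

    # Copy the article so the raw record stays unchanged.
    fused = dict(article)
    # `publisher` is None when no matching account was found.
    fused["publisher"] = account
    return fused
```

Copying the article before adding the account data mirrors the idea of keeping raw and preprocessed data side by side.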

Business plan

The project will store data about articles, such as the title, summary, published_date, topic, and twitter_account of the publisher (e.g. @nytimes), as well as data about the publisher's Twitter account, such as its location and number of followers. The number of tweets about the article 24 hours after publishing will also be stored.
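
One possible shape for a stored record is sketched below. The article field names `title`, `summary`, `published_date`, `topic`, and `twitter_account` come from the description above; the class names, `location`, `followers_count`, `tweets_24h`, and all types are assumptions made for illustration, not the actual HBase schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class PublisherAccount:
    handle: str           # e.g. "@nytimes"
    location: str         # assumed field name
    followers_count: int  # assumed field name


@dataclass
class ArticleRecord:
    title: str
    summary: str
    published_date: str   # kept as a string; the API date format is not specified here
    topic: str
    twitter_account: str
    publisher: PublisherAccount
    tweets_24h: int       # tweets about the article 24 hours after publishing (assumed name)


def to_row(record: ArticleRecord) -> dict:
    """Flatten a record into a nested dict, e.g. before serializing it."""
    return asdict(record)
```

Keeping the publisher data in its own class makes it easy to reuse one account record across many articles from the same publisher.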

We will compare the sentiment of a summary with the number of tweets, the topic, the location of the publisher, and/or the publisher's number of Twitter followers.
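
A minimal sketch of two such comparisons follows, assuming each summary already has a numeric sentiment score. How the score is computed is outside this example, and the function names are hypothetical; the project itself runs its analysis in Apache Spark rather than in plain Python.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, e.g. between sentiment scores and tweet counts."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_sentiment_by_topic(rows):
    """Average sentiment per topic from (topic, sentiment) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for topic, sentiment in rows:
        totals[topic] += sentiment
        counts[topic] += 1
    return {topic: totals[topic] / counts[topic] for topic in totals}
```

The correlation suits numeric attributes such as tweet counts or follower counts, while the grouped mean suits categorical ones such as topic or location.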

This could give authors meaningful information about their audience and its preferences.

