Project 2

  • Project 2 is a Spark application that processes COVID data and uses Zeppelin (or Tableau or other visualization software) to show trends and data analysis.

MVP:

  • Create a Spark Application that processes COVID data
  • Analyze the COVID data using 10 queries
  • Produce one or more .jar files for the analysis, then run the application using spark-submit
  • Find a trend
  • Implement logging (with Spark)
  • Use Zeppelin (or Tableau or other visualization software for graphics and visuals) for showing trends and data analysis
  • Implement Agile Scrum methodology for project work
  • Use Jira Software for task management
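
As an illustration of the MVP items above, here is a minimal sketch of a Spark application that reads COVID data, runs one Spark SQL query, and logs its progress. It is not the project's actual code: the object name `CovidAnalysis`, the default path `data/covid.csv`, and the column names `state` and `deaths` are all assumptions made for the example, and logging goes through SLF4J, which Spark ships with.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.slf4j.LoggerFactory

object CovidAnalysis {
  private val log = LoggerFactory.getLogger(getClass)

  // Hypothetical query: total deaths per state, highest first.
  // Assumes the input has `state` and `deaths` columns.
  def deathsByState(spark: SparkSession, covid: DataFrame): DataFrame = {
    covid.createOrReplaceTempView("covid")
    spark.sql(
      """SELECT state, SUM(deaths) AS total_deaths
        |FROM covid
        |GROUP BY state
        |ORDER BY total_deaths DESC""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical default path; pass the real HDFS or S3 path as the first argument.
    val inputPath = args.headOption.getOrElse("data/covid.csv")

    // No master is set here so that spark-submit can supply it (for example YARN).
    val spark = SparkSession.builder().appName("Project2CovidAnalysis").getOrCreate()
    try {
      log.info(s"Reading COVID data from $inputPath")
      val covid = spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(inputPath)

      val result = deathsByState(spark, covid)
      log.info("Query finished; showing the top rows")
      result.show(10, truncate = false)
    } finally {
      spark.stop()
    }
  }
}
```

Once packaged into a jar, a sketch like this could be launched with something along the lines of `spark-submit --class CovidAnalysis --master yarn <path to jar> <input path>`, where the jar and input paths depend on the build and cluster.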

Stretch Goals:

  • Utilize Apache Airflow for workflow scheduling
  • Export results
  • Encrypt passwords
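
The password stretch goal does not say which password is meant or how it should be protected. One possible reading, sketched below under that assumption, is to store a salted PBKDF2 hash instead of the plain text. Note that this is one-way hashing rather than reversible encryption. The object name `PasswordHasher` and the iteration count are hypothetical choices; only JDK classes are used.

```scala
import java.security.{MessageDigest, SecureRandom}
import java.util.Base64
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.PBEKeySpec

object PasswordHasher {
  // Hypothetical parameters chosen for the example.
  private val Iterations = 65536
  private val KeyLengthBits = 256

  // Generate a fresh random 16-byte salt for each password.
  def newSalt(): Array[Byte] = {
    val salt = new Array[Byte](16)
    new SecureRandom().nextBytes(salt)
    salt
  }

  // Derive a Base64-encoded PBKDF2 hash from the password and salt.
  def hash(password: String, salt: Array[Byte]): String = {
    val spec = new PBEKeySpec(password.toCharArray, salt, Iterations, KeyLengthBits)
    val factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
    Base64.getEncoder.encodeToString(factory.generateSecret(spec).getEncoded)
  }

  // Re-hash the candidate password and compare in constant time.
  def verify(password: String, salt: Array[Byte], expected: String): Boolean =
    MessageDigest.isEqual(
      hash(password, salt).getBytes("UTF-8"),
      expected.getBytes("UTF-8"))
}
```

In use, the salt and the hash would be stored together, and `PasswordHasher.verify` would be called with the stored values at login. If the goal is instead to protect a credential that must be recovered later (such as a database password), a reversible scheme would be needed and this sketch would not apply.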

Tech Stack:

  • Apache Spark
  • Spark SQL
  • YARN
  • HDFS and/or S3
  • SBT
  • Scala 2.12 (or 2.13)
  • Git + GitHub
  • Zeppelin (or Tableau or other visualization software)
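
For the SBT and Scala entries in this stack, a `build.sbt` along the following lines would be enough to package a jar with `sbt package`. This is a sketch only: the project name and the exact Scala and Spark versions are assumptions, not values taken from this repository.

```scala
// build.sbt (hypothetical name and versions, for illustration only)
name := "project2"
version := "0.1.0"
scalaVersion := "2.12.15"

// Spark is marked "provided" because spark-submit supplies it on the cluster.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.3.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.3.0" % "provided"
)
```

With the `provided` scope, running the application directly from the IDE requires adding the Spark jars to the run configuration's classpath, or dropping the scope during local development.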

Development Lifecycle

  1. Plan
  2. Design
  3. Code
  4. Test
  5. Deploy

Project Management:

  • IntelliJ IDEA 2022.1.2

Generalizations

  • IDE Used: IntelliJ IDEA 2022.1.2
  • Scala Package Used: Winutils
  • Visualization Programs: Apache Zeppelin

Extra Data Used

  • Provisional COVID-19 Deaths by Sex and Age
  • Average monthly temperature in the U.S. from January 2019 to May 2022 (in Fahrenheit)

Collaborators:

  • Oscar Garcia
  • Jordi Icetch
  • Joseph Kim
  • Thuvarakan Nakarajah
  • Edwin Castano