- Project 2 is a Spark application that processes COVID data and uses Zeppelin (or Tableau or other visualization software) to show trends and data analysis.
- Create a Spark Application that processes COVID data
- Analyze the COVID data using 10 queries
- Produce one or more .jar files for the analysis, then run the application using spark-submit
- Find a trend
- Implement logging (with Spark)
- Use Zeppelin (or Tableau or other visualization software) to visualize trends and data analysis
- Implement Agile Scrum methodology for project work
- Jira Software for task management
- Utilize Apache Airflow for workflow scheduling
- Export results
- Encrypt passwords
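
The password requirement does not say which scheme the project used. As a minimal sketch, assuming passwords only need to be verified and never recovered, the example below stores a salted PBKDF2 hash instead of using reversible encryption. The object name `PasswordHashSketch`, the iteration count, and the key length are illustrative assumptions, not taken from the project.

```scala
import java.security.{MessageDigest, SecureRandom}
import java.util.Base64
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.PBEKeySpec

// Hypothetical helper: salted PBKDF2 hashing using only JDK classes.
object PasswordHashSketch {
  private val Iterations = 65536    // assumed work factor
  private val KeyLengthBits = 256   // assumed derived-key size

  // Generate a fresh random 16-byte salt for each stored password.
  def newSalt(): Array[Byte] = {
    val salt = new Array[Byte](16)
    new SecureRandom().nextBytes(salt)
    salt
  }

  // Derive a Base64-encoded hash from the password and salt.
  def hash(password: String, salt: Array[Byte]): String = {
    val spec = new PBEKeySpec(password.toCharArray, salt, Iterations, KeyLengthBits)
    val factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
    Base64.getEncoder.encodeToString(factory.generateSecret(spec).getEncoded)
  }

  // Re-derive the hash and compare it to the stored value in constant time.
  def verify(password: String, salt: Array[Byte], expected: String): Boolean =
    MessageDigest.isEqual(
      hash(password, salt).getBytes("UTF-8"),
      expected.getBytes("UTF-8"))
}
```

With this approach the salt and the hash are stored together, and a login attempt calls `verify` with the stored pair. If the project needs to recover the original password, a reversible cipher would be required instead.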
- Apache Spark
- Spark SQL
- YARN
- HDFS and/or S3
- SBT
- Scala 2.12 (or 2.13)
- Git + GitHub
- Zeppelin (or Tableau or other visualization software)
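
On this stack, the query, logging, and export requirements listed earlier could be combined roughly as follows. This is only a sketch: the object name `CovidTrendSketch`, the default paths, and the column names `country` and `new_cases` are assumptions about the dataset, not details from the project. Logging goes through SLF4J, which is on Spark's classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

// Hypothetical entry point for one of the analysis queries.
object CovidTrendSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder paths; real ones would be passed as program arguments.
    val inputPath  = if (args.length > 0) args(0) else "covid_data.csv"
    val outputPath = if (args.length > 1) args(1) else "output/cases_by_country"

    val log = LoggerFactory.getLogger(getClass)

    // The master (for example YARN) is expected to come from spark-submit.
    val spark = SparkSession.builder().appName("CovidTrendSketch").getOrCreate()

    log.info(s"Reading COVID data from $inputPath")
    val covid = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(inputPath)
    covid.createOrReplaceTempView("covid")

    // Assumed columns: country, new_cases.
    val totals = spark.sql(
      """SELECT country, SUM(new_cases) AS total_cases
        |FROM covid
        |GROUP BY country
        |ORDER BY total_cases DESC""".stripMargin)

    // Export the result as CSV so it can be loaded into a visualization tool.
    totals.write.mode("overwrite").option("header", "true").csv(outputPath)
    log.info(s"Wrote totals by country to $outputPath")

    spark.stop()
  }
}
```

Once packaged with SBT, a job like this would be launched with `spark-submit --class CovidTrendSketch --master yarn` followed by the jar path and the two path arguments.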
- Plan
- Design
- Code
- Test
- Deploy
- IDE Used: IntelliJ IDEA 2022.1.2
- Scala Package Used: Winutils
- Visualization Programs: Apache Zeppelin
- Oscar Garcia
- Jordi Icetch
- Joseph Kim
- Thuvarakan Nakarajah
- Edwin Castano