
Weekly meeting 23rd Sep


#### Time: 2 pm to 3 pm, 23/09/2015

#### Address: A346, CS building

#### Prepared questions

  1. When should we set up the test environment? The benchmark framework is not ready yet. We could write scripts to install the software automatically, so that once we are ready to benchmark we can launch instances quickly. This means we do not need to launch 10-20 small instances at this moment.
    Answer: Set up the environment on all nodes if possible.

  2. Which real-time stream processing systems will we benchmark, and which versions? For Spark, do we benchmark Spark Standalone, Spark on Mesos, or Spark on YARN?
    Answer: Spark Standalone.

  3. Do we need to optimize the configuration of each stream processing system?
    Answer: Optimize for our experiment cluster, e.g. configure the number of cores accordingly (see the configuration sketch after this list).

  4. How do we design the data model and workload model for our benchmark system?

  5. Which version of each real-time stream system will be benchmarked?
    Answer: The latest release version.
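
As a hypothetical illustration of the tuning mentioned in question 3, the Scala snippet below shows how a Spark standalone application could pin cores and memory via SparkConf. The object name and all values are placeholders, not agreed settings.

```scala
import org.apache.spark.SparkConf

object BenchmarkConf {
  // Placeholder values; the actual numbers depend on our experiment machines.
  def sparkConf(): SparkConf =
    new SparkConf()
      .setAppName("StreamBenchmark")
      .set("spark.cores.max", "8")        // total cores the standalone app may use
      .set("spark.executor.cores", "4")   // cores per executor
      .set("spark.executor.memory", "4g") // memory per executor
}
```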

#### Discussion and decisions

  * Data model and data generator.
  * Install all software on every node; run each real-time stream processing system separately.
  * Use LaTeX (with BibDesk) to write the literature review and the thesis.

#### Follow-up work

  * Collect 30-40 references, including benchmarks (e.g. graph benchmarks) and performance analyses (papers or blog posts).
  * Make a plan with work milestones.
  * Get familiar with LaTeX and BibDesk.
  * Set up the experiment cluster and install the software.
  * Run a sample application with Kafka + a real-time stream system (Spark, Storm, Flink); a minimal sketch follows below.
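
As a rough starting point for the last item, here is a minimal sketch of a WordCount that reads from Kafka using the Spark Streaming receiver-based Kafka API (Spark 1.x). The ZooKeeper address, consumer group, and topic name are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaWordCount")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // Receiver-based Kafka stream: Map(topic -> number of receiver threads)
    val lines = KafkaUtils.createStream(
      ssc, "zookeeper-host:2181", "benchmark-group", Map("benchmark-topic" -> 1)
    ).map(_._2)

    // Classic WordCount over each one-second micro-batch
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```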

#### Open questions

  * What kind of data model should be used in the benchmark system? What kind of workload should be run in the benchmark?
  * Which common operations of real-time stream systems should be covered, e.g. WordCount, Join, Window? Anything more? (A brief sketch of these operations follows below.)
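
To make the WordCount/Join/Window question more concrete, the sketch below shows the corresponding DStream operations in Spark Streaming; the two keyed input streams (`clicks`, `views`) are hypothetical.

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

object CommonOps {
  // `clicks` and `views` are hypothetical keyed streams, e.g. (userId, payload).
  def windowAndJoin(clicks: DStream[(String, String)],
                    views:  DStream[(String, String)]): Unit = {
    // Window: count events per key over a 30 s window, sliding every 10 s
    val windowedCounts = clicks
      .map { case (k, _) => (k, 1) }
      .reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))

    // Join: match clicks and views on their key within each micro-batch
    val joined = clicks.join(views)

    windowedCounts.print()
    joined.print()
  }
}
```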
Attachments: Meeting graph 1, Meeting graph 2