Youngwoo Kim edited this page Apr 14, 2021 · 37 revisions

Spark & Java


https://github.com/opencore/kafka-spark-avro-example

https://github.com/jleetutorial/sparkTutorial

spark-submit + YARN:

https://venkateshiyer.net/production-ready-spark-streaming-a8d85b7d66be

https://medium.com/@pmatpadi/spark-streaming-dynamic-scaling-and-backpressure-in-action-6ebdbc782a69

Spark best practices and tuning - https://legacy.gitbook.com/book/umbertogriffo/apache-spark-best-practices-and-tuning/details

spark.streaming.concurrentJobs
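
One way to set this property is with `--conf` at submit time. A minimal sketch, assuming a hypothetical main class (`com.example.StreamingApp`) and jar (`streaming-app.jar`); the value 2 is arbitrary, and the default is 1:

```shell
# Hypothetical class and jar names; replace with your own application.
# The value 2 is only an illustration, not a recommendation.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.StreamingApp \
  --conf spark.streaming.concurrentJobs=2 \
  streaming-app.jar
```

Raising the value lets more than one batch job run at a time, so it is probably only safe when batches do not depend on each other's output order.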

Spark structured streaming, Kafka and Avro:

Spark Streaming:

spark streaming logging configuration:

spark-shell:

$ spark-shell
scala> :load src/WordCount.scala

Maven Archetype (for Spark):

Unit test

Spark SQL

Spark 2.2 cost-based optimizer (CBO) - https://databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-spark-2-2.html

STS (Spark Thrift Server):

https://github.com/yaooqinn/kyuubi

Spark Streaming

https://www.sigmoid.com/spark-streaming-internals/

Structured Streaming:

http://events.linuxfoundation.org/sites/events/files/slides/ApacheCon%20-%20Starting%20with%20Apache%20Spark%2C%20Best%20Practices.pdf

Examples

https://sparkour.urizone.net/recipes

Refs

https://spark-summit.org/east-2017/schedule/

https://developer.ibm.com/hadoop/2016/07/18/troubleshooting-and-tuning-spark-for-heavy-workloads/

https://www.slideshare.net/jcmia1/apache-spark-20-tuning-guide

https://github.com/pavloff-de/spark4knime
