Spark

Youngwoo Kim edited this page Apr 14, 2021 · 37 revisions

Spark & Java

https://dzone.com/articles/the-magic-of-apache-spark-in-java-1

https://github.com/opencore/kafka-spark-avro-example

https://github.com/jleetutorial/sparkTutorial

spark-submit + YARN:

https://gist.github.com/bernhardschaefer/4309f728f66879c0a8c062be0801057b

https://venkateshiyer.net/production-ready-spark-streaming-a8d85b7d66be

https://medium.com/@pmatpadi/spark-streaming-dynamic-scaling-and-backpressure-in-action-6ebdbc782a69

Spark best practices and tuning - https://legacy.gitbook.com/book/umbertogriffo/apache-spark-best-practices-and-tuning/details

spark.streaming.concurrentJobs
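
The spark-submit + YARN notes and the `spark.streaming.concurrentJobs` setting above can be combined in one submit command. This is a minimal sketch, not a tuned production setup. The jar name `my-streaming-app.jar` and the class `com.example.StreamingApp` are hypothetical placeholders. `spark.streaming.concurrentJobs` does not appear in the official configuration reference; as far as I know it defaults to 1, and values above 1 let batches run concurrently, which may break ordering assumptions between batches. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: submit a Spark Streaming application to YARN.
# APP_JAR and MAIN_CLASS are hypothetical placeholders, not taken from this page.
APP_JAR="${APP_JAR:-my-streaming-app.jar}"
MAIN_CLASS="${MAIN_CLASS:-com.example.StreamingApp}"

# Number of streaming batch jobs allowed to run at once.
# Assumption: the default is 1; raise it only if batches are independent.
CONCURRENT_JOBS="${CONCURRENT_JOBS:-1}"

# Collect the arguments as positional parameters so each one stays a
# separate word when passed to spark-submit.
#   backpressure.enabled      adapts the ingestion rate to the processing rate
#   stopGracefullyOnShutdown  finishes in-flight batches before exiting
set -- \
  --master yarn \
  --deploy-mode cluster \
  --class "$MAIN_CLASS" \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.stopGracefullyOnShutdown=true \
  --conf "spark.streaming.concurrentJobs=$CONCURRENT_JOBS" \
  "$APP_JAR"

# Always show the command that would be run.
SUBMIT_CMD="spark-submit $*"
echo "$SUBMIT_CMD"

# Only submit when explicitly requested with DRY_RUN=0.
if [ "${DRY_RUN:-1}" = "0" ]; then
  spark-submit "$@"
fi
```

To actually submit, run the script with `DRY_RUN=0` and, if needed, override `APP_JAR`, `MAIN_CLASS`, or `CONCURRENT_JOBS` in the environment.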

Spark structured streaming, Kafka and Avro:

Avro SerDe for Apache Spark structured APIs, https://github.com/AbsaOSS/ABRiS

Spark Streaming:

Spark Streaming logging configuration:

http://shzhangji.com/blog/2015/05/31/spark-streaming-logging-configuration/

spark-shell:

$ spark-shell
scala> :load src/WordCount.scala

Maven Archetype (for Spark):

Unit test

Spark SQL

Spark 2.2 CBO, https://databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-spark-2-2.html

STS (Spark Thrift Server):

https://developer.ibm.com/hadoop/2016/08/22/how-to-run-queries-on-spark-sql-using-jdbc-via-thrift-server/

https://github.com/yaooqinn/kyuubi

Spark Streaming

https://www.sigmoid.com/spark-streaming-internals/

Structured Streaming:

http://events.linuxfoundation.org/sites/events/files/slides/ApacheCon%20-%20Starting%20with%20Apache%20Spark%2C%20Best%20Practices.pdf

Examples

https://sparkour.urizone.net/recipes

Refs

https://spark-summit.org/east-2017/schedule/

https://developer.ibm.com/hadoop/2016/07/18/troubleshooting-and-tuning-spark-for-heavy-workloads/

https://www.slideshare.net/jcmia1/apache-spark-20-tuning-guide

https://github.com/pavloff-de/spark4knime