Spark
Spark & Java
https://github.com/opencore/kafka-spark-avro-example
https://github.com/jleetutorial/sparkTutorial
spark-submit + YARN:
https://venkateshiyer.net/production-ready-spark-streaming-a8d85b7d66be
https://medium.com/@pmatpadi/spark-streaming-dynamic-scaling-and-backpressure-in-action-6ebdbc782a69
Spark best practices and tuning - https://legacy.gitbook.com/book/umbertogriffo/apache-spark-best-practices-and-tuning/details
spark.streaming.concurrentJobs
- http://why-not-learn-something.blogspot.com/2016/06/spark-streaming-performance-tuning-on.html
- https://stackoverflow.com/questions/23528006/how-jobs-are-assigned-to-executors-in-spark-streaming
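A minimal sketch of setting this property programmatically, assuming a standalone application launched with spark-submit. The object name, app name and the value "2" are illustrative only. `spark.streaming.concurrentJobs` does not appear in the official configuration reference and defaults to 1; raising it lets jobs from more than one batch run at the same time, so output ordering across batches may no longer hold.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical application object; the name is illustrative.
object ConcurrentJobsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("concurrent-jobs-sketch")       // illustrative app name
      .set("spark.streaming.concurrentJobs", "2") // assumed value; the default is 1

    // 10-second batch interval, chosen arbitrarily for the sketch.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Show the value the streaming scheduler will pick up.
    println(ssc.sparkContext.getConf.get("spark.streaming.concurrentJobs"))

    // Define input streams and output operations here, then call
    // ssc.start() and ssc.awaitTermination().
    ssc.stop(stopSparkContext = true)
  }
}
```

The same setting can also be passed on the command line as `--conf spark.streaming.concurrentJobs=2`.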
Spark structured streaming, Kafka and Avro:
- Avro SerDe for Apache Spark structured APIs: https://github.com/AbsaOSS/ABRiS
Spark Streaming:
- https://www.slideshare.net/JoanViladrosaRiera/spark-streaming-kafka-010
- https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/
- http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/
- https://www.stratio.com/blog/optimizing-spark-streaming-applications-apache-kafka/
Spark Streaming logging configuration:
spark-shell:
$ spark-shell
scala> :load src/WordCount.scala
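The notes do not include the contents of `src/WordCount.scala`, so the following is only a sketch of what such a script could look like. It assumes it is loaded inside spark-shell, where `sc` (the SparkContext) is already defined, and it uses a hypothetical input path. Each statement is kept on a single line because `:load` feeds the file to the REPL line by line.

```scala
// src/WordCount.scala: a sketch, meant to be loaded from spark-shell via :load.
// `sc` is the SparkContext that spark-shell creates automatically.

// Hypothetical input path; replace it with a real file.
val inputPath = "input.txt"

// Read the file as an RDD of lines.
val lines = sc.textFile(inputPath)

// Split each line on whitespace and drop empty tokens.
val words = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)

// Pair every word with 1 and sum the counts per word.
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

// Print the ten most frequent words.
counts.sortBy(_._2, ascending = false).take(10).foreach { case (word, n) => println(s"$word\t$n") }
```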
Maven Archetype (for Spark):
- https://github.com/spark-in-action/scala-archetype-sparkinaction
- https://www.luckyryan.com/2013/02/15/create-maven-archetype-from-existing-project/
- http://www.avajava.com/tutorials/lessons/how-do-i-create-an-archetype-that-can-run-on-an-existing-project.html
Unit testing:
- http://www.jesse-anderson.com/2016/04/unit-testing-spark-with-java/
- https://github.com/bosea/spark-unit-testing
- https://dzone.com/articles/testing-spark-code
- https://stackoverflow.com/questions/44536150/how-do-i-use-spark-testing-base-with-maven
Spark 2.2 cost-based optimizer (CBO): https://databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-spark-2-2.html
STS (Spark Thrift Server):
https://github.com/yaooqinn/kyuubi
https://www.sigmoid.com/spark-streaming-internals/
Structured Streaming:
- https://databricks.com/blog/2017/01/19/real-time-streaming-etl-structured-streaming-apache-spark-2-1.html
- https://docs.databricks.com/spark/latest/structured-streaming/production.html
https://sparkour.urizone.net/recipes
https://spark-summit.org/east-2017/schedule/
https://developer.ibm.com/hadoop/2016/07/18/troubleshooting-and-tuning-spark-for-heavy-workloads/