Simple Spark & Ignite Integration

Two simple Spark applications that demonstrate integration with Apache Ignite.

Prerequisites

1. Install Apache Ignite - https://apacheignite.readme.io/docs/getting-started#installation
2. Install Apache Spark - http://spark.apache.org/docs/latest/
3. Grab a data file - for example, download biographies.list.gz from ftp://ftp.fu-berlin.de/pub/misc/movies/database/ and extract it
4. Start Ignite - `YOUR_IGNITE_LOCATION/bin/ignite.sh`
5. Start the Spark cluster - `YOUR_SPARK_LOCATION/sbin/start-all.sh`
6. Build the application - `sbt package`
7. Run Spark+Ignite - `start.sh`

Scripted Runnable Examples

SparkWordCount

The first example counts the occurrences of each word in a corpus and then counts the occurrences of each character in the most popular words.

To run Spark+Ignite: `./start.sh com.gridgain.examples.sparkwordcount.SparkWordCount`

To run with plain Spark RDDs only, for comparison: `./start-rdd.sh com.gridgain.examples.sparkwordcount.SparkWordCount`
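The two-stage counting described above can be sketched with plain Scala collections. This is a minimal local sketch, not the repository's code: it assumes whitespace tokenization and treats "most popular" as words occurring at least `threshold` times (a hypothetical parameter). The `groupBy`/`size` steps stand in for the `flatMap`, `map`, and `reduceByKey` stages a Spark RDD version would use.

```scala
object WordCountSketch {
  // Stage 1: count occurrences of each word (splitting on whitespace is an assumption).
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, hits) => word -> hits.size }

  // Keep words seen at least `threshold` times (hypothetical cut-off),
  // ordered by descending count, then alphabetically for stable output.
  def popularWords(counts: Map[String, Int], threshold: Int): Seq[String] =
    counts.toSeq
      .filter { case (_, n) => n >= threshold }
      .sortBy { case (word, n) => (-n, word) }
      .map(_._1)

  // Stage 2: count occurrences of each character across the popular words.
  def charCounts(words: Seq[String]): Map[Char, Int] =
    words
      .flatMap(_.toList)
      .groupBy(identity)
      .map { case (ch, hits) => ch -> hits.size }

  def main(args: Array[String]): Unit = {
    val corpus  = Seq("the cat sat", "the dog sat", "the end")
    val counts  = wordCounts(corpus)
    val popular = popularWords(counts, threshold = 2)
    println(s"popular words: $popular")
    println(s"character counts: ${charCounts(popular)}")
  }
}
```

In the Spark+Ignite version the intermediate word counts would live in an RDD or an Ignite cache instead of a local `Map`, but the shape of the pipeline is the same.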

SparkSQLJoin

This example demonstrates joining two RDDs using DataFrames. The first RDD is created from a file and the second is an IgniteRDD.

To run Spark+Ignite: `./start.sh com.gridgain.examples.sparkwordcount.SparkSQLJoin`

To run with plain Spark RDDs only, for comparison: `./start-rdd.sh com.gridgain.examples.sparkwordcount.SparkSQLJoin`
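The inner join that the DataFrame step performs can be sketched locally as a hash join: index one side by key, then probe it with each row of the other side. The `Person` and `Score` record types and the `id` key are hypothetical, since the example's actual schema is not described here. With Spark DataFrames the equivalent would be along the lines of `left.join(right, "id")`.

```scala
object JoinSketch {
  // Hypothetical record types standing in for the file-backed and Ignite-backed data.
  final case class Person(id: Int, name: String)
  final case class Score(id: Int, score: Double)

  // Inner join on `id`: build a lookup table from the right side, then probe it
  // with each left row. Left rows without a match are dropped, as in an inner join.
  def innerJoin(left: Seq[Person], right: Seq[Score]): Seq[(Int, String, Double)] = {
    val scoresById: Map[Int, Seq[Score]] = right.groupBy(_.id)
    for {
      person <- left
      score  <- scoresById.getOrElse(person.id, Seq.empty)
    } yield (person.id, person.name, score.score)
  }

  def main(args: Array[String]): Unit = {
    val people = Seq(Person(1, "Ada"), Person(2, "Bob"), Person(3, "Cy"))
    val scores = Seq(Score(1, 9.5), Score(3, 7.0), Score(4, 1.0))
    innerJoin(people, scores).foreach(println)
  }
}
```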

IgniteRDD

To run from a master node in a Spark cluster:

`bin/spark-submit --class com.gridgain.examples.sparkwordcount.SparkWordCount --master local --jars <IGNITE_HOME>/libs/ignite-core-1.5.11.jar,<IGNITE_HOME>/libs/optional/ignite-spark_2.10/ignite-spark_2.10-1.5.11.jar,<IGNITE_HOME>/libs/cache-api-1.0.0.jar spark-ignite-example-1.0.jar`

This will run the application in a single local process. If the cluster is running a Spark standalone cluster manager, you can replace "--master local" with "--master spark://<master host>:<master port>".

If the cluster is running YARN, you can replace "--master local" with "--master yarn".
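The three `--master` variants and the comma-separated `--jars` list can be made explicit with a small helper that assembles the `spark-submit` argument vector. `SubmitArgs` and its types are hypothetical and not part of the repository, and the jar names and host in `main` are placeholders. The sketch joins jar paths with commas and leaves no trailing comma, so the application jar stays a separate positional argument.

```scala
object SubmitArgs {
  // The three cluster-manager choices described above.
  sealed trait Master
  case object Local extends Master
  final case class Standalone(host: String, port: Int) extends Master
  case object Yarn extends Master

  def masterUrl(master: Master): String = master match {
    case Local                  => "local"
    case Standalone(host, port) => s"spark://$host:$port"
    case Yarn                   => "yarn"
  }

  // Assemble arguments for bin/spark-submit. Jar paths are joined with commas
  // (no trailing comma); the application jar is the final positional argument.
  def build(mainClass: String, master: Master, jars: Seq[String], appJar: String): Seq[String] = {
    val jarArgs = if (jars.isEmpty) Seq.empty[String] else Seq("--jars", jars.mkString(","))
    (Seq("--class", mainClass, "--master", masterUrl(master)) ++ jarArgs) :+ appJar
  }

  def main(args: Array[String]): Unit = {
    // Placeholder host, port, and jar names, for illustration only.
    val cmd = "bin/spark-submit" +: build(
      "com.gridgain.examples.sparkwordcount.SparkWordCount",
      Standalone("master-host", 7077),
      Seq("ignite-core.jar", "ignite-spark.jar", "cache-api.jar"),
      "spark-ignite-example-1.0.jar")
    println(cmd.mkString(" "))
  }
}
```

Switching clusters then only changes the `Master` value passed to `build`; the `--jars` list and application jar stay the same.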

magriggs/SparkIgniteSimpleExample