This project is designed for the instructor-led O'Reilly Training - In-Memory Computing Essentials. The code samples demonstrate essential capabilities of in-memory computing platforms such as Apache Ignite.
You can study the samples by following the instructor during the training or completing this guide on your own.
- Java Development Kit (JDK), version 8 or later
- Apache Maven 3.0 or later
- Your favorite IDE, such as IntelliJ IDEA or Eclipse, or a simple text editor
Start a two-node Ignite cluster:
- Open a terminal window and navigate to the root directory of this project.
- Use Maven to create a core executable JAR with all the dependencies:
  mvn clean package -P core
- Start the first cluster node:
  java -cp libs/core.jar training.ServerStartup
- Open another terminal window and start the second node:
  java -cp libs/core.jar training.ServerStartup
The nodes discover each other automatically, giving you a two-node cluster that is ready for the exercises.
Now you need to create a Media Store schema and load the cluster with sample data. Use the SQLLine tool to do that:
- Launch a SQLLine process:
  java -cp libs/core.jar sqlline.SqlLine
- Connect to the cluster:
  !connect jdbc:ignite:thin://127.0.0.1/ ignite ignite
- Load the Media Store database:
  !run config/media_store.sql
- List all the tables of the database:
  !tables
Keep the connection open; you'll use it in the following exercises.
In this section, you'll learn how to use key-value APIs for data processing and how to print the partition distribution across the cluster nodes:
- Check the source code of training.KeyValueApp to see how key-value APIs are used to get Artists' records from the cluster.
- Build an executable JAR with the applications' classes:
  mvn clean package -P apps
- Run the application to see what result it produces:
  java -cp libs/apps.jar training.KeyValueApp
- Improve the application by implementing the logic that prints the current partition distribution (refer to the TODO item for details).
Optionally, scale out the cluster by adding a third node and run the application again. You'll see that some partitions have moved to the new node.
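To build intuition for why only some partitions move when a node joins, here is a minimal, self-contained sketch of rendezvous (highest-random-weight) hashing. It is a toy model, not Ignite's implementation: the node names, the hash mixing, and the partition count of 1024 are assumptions made for illustration. In the real application, Ignite's `Affinity` API (obtained through `ignite.affinity(cacheName)`) reports the actual assignment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of partition ownership; NOT Ignite's actual affinity implementation. */
public class PartitionDistributionSketch {
    /** Bit-mixing finalizer from MurmurHash3, used only to scramble the inputs. */
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** Rendezvous hashing: the node with the highest weight for a partition owns it. */
    static String ownerOf(int partition, List<String> nodes) {
        String owner = null;
        long bestWeight = 0;
        for (String node : nodes) {
            long weight = mix(mix(node.hashCode()) ^ partition);
            if (owner == null || weight > bestWeight) {
                owner = node;
                bestWeight = weight;
            }
        }
        return owner;
    }

    /** Groups partition numbers by their owning node. */
    static Map<String, List<Integer>> distribution(int partitions, List<String> nodes) {
        Map<String, List<Integer>> byNode = new TreeMap<>();
        for (String node : nodes) {
            byNode.put(node, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            byNode.get(ownerOf(p, nodes)).add(p);
        }
        return byNode;
    }

    public static void main(String[] args) {
        int partitions = 1024; // assumed partition count
        List<String> two = Arrays.asList("node-1", "node-2"); // hypothetical node names
        List<String> three = Arrays.asList("node-1", "node-2", "node-3");

        distribution(partitions, two).forEach((node, parts) ->
            System.out.println("2 nodes: " + node + " owns " + parts.size() + " partitions"));
        distribution(partitions, three).forEach((node, parts) ->
            System.out.println("3 nodes: " + node + " owns " + parts.size() + " partitions"));

        // Count the partitions whose owner changed after the third node joined.
        int moved = 0;
        for (int p = 0; p < partitions; p++) {
            if (!ownerOf(p, two).equals(ownerOf(p, three))) {
                moved++;
            }
        }
        System.out.println(moved + " partitions moved after the third node joined");
    }
}
```

In this model a partition changes owner only when the new node wins the weight comparison, so every moved partition lands on the new node and the rest stay where they were.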
Ignite supports SQL for data processing including distributed joins, grouping and sorting. In this section, you're going to run basic SQL operations as well as more advanced ones.
Run the following query to find the top-20 longest tracks:
SELECT trackid, name, MAX(milliseconds / (1000 * 60)) as duration FROM track
WHERE genreId < 17
GROUP BY trackid, name ORDER BY duration DESC LIMIT 20;
The next query extends the first one with the musical genre of each of those top-20 longest tracks:
SELECT track.trackid, track.name, genre.name, MAX(milliseconds / (1000 * 60)) as duration FROM track
JOIN genre ON track.genreId = genre.genreId
WHERE track.genreId < 17
GROUP BY track.trackid, track.name, genre.name ORDER BY duration DESC LIMIT 20;
Since the Genre table is fully replicated across all the cluster nodes, the join between the two tables is safe and Ignite always returns a correct result set.
Modify the last query to include the artist of each of the top-20 longest tracks. Do this by adding a LEFT JOIN with the Artist table:
SELECT track.trackId, track.name as track_name, genre.name as genre, artist.name as artist,
MAX(milliseconds / (1000 * 60)) as duration FROM track
LEFT JOIN artist ON track.artistId = artist.artistId
JOIN genre ON track.genreId = genre.genreId
WHERE track.genreId < 17
GROUP BY track.trackId, track.name, genre.name, artist.name ORDER BY duration DESC LIMIT 20;
You can see that the artist column is blank for some records. That's because the Track and Artist tables are not co-located, so the nodes don't have all the data available locally during the join phase.
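The effect can be reproduced with a small, self-contained simulation. This is only a sketch under stated assumptions, not Ignite code: the modulo placement rule, the two-node cluster, and the sample rows are all made up for illustration. Each simulated node joins the tracks it stores against the artists it stores, so a track whose artist lives on the other node ends up with an empty artist value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a join that only sees node-local data; not Ignite code. */
public class LocalJoinSketch {
    static final class Track {
        final int trackId;
        final String name;
        final int artistId;

        Track(int trackId, String name, int artistId) {
            this.trackId = trackId;
            this.name = name;
            this.artistId = artistId;
        }
    }

    /** Hypothetical placement rule: a row lives on node (key mod nodeCount). */
    static int nodeFor(int key, int nodeCount) {
        return Math.floorMod(key, nodeCount);
    }

    /** LEFT JOIN track -> artist, executed per node against that node's artists only. */
    static List<String> localLeftJoin(List<Track> tracks, Map<Integer, String> artists, int nodeCount) {
        List<String> rows = new ArrayList<>();
        for (int node = 0; node < nodeCount; node++) {
            // Artists are placed by ArtistId.
            Map<Integer, String> localArtists = new HashMap<>();
            for (Map.Entry<Integer, String> e : artists.entrySet()) {
                if (nodeFor(e.getKey(), nodeCount) == node) {
                    localArtists.put(e.getKey(), e.getValue());
                }
            }
            // Tracks are placed by TrackId, independently of their artist.
            for (Track t : tracks) {
                if (nodeFor(t.trackId, nodeCount) == node) {
                    String artist = localArtists.get(t.artistId); // null when the artist is remote
                    rows.add(t.name + " | " + artist);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> artists = new HashMap<>();
        artists.put(1, "Artist A"); // made-up sample rows
        artists.put(2, "Artist B");
        List<Track> tracks = Arrays.asList(
            new Track(1, "Track 1", 1), new Track(2, "Track 2", 1),
            new Track(3, "Track 3", 2), new Track(4, "Track 4", 2));
        for (String row : localLeftJoin(tracks, artists, 2)) {
            System.out.println(row);
        }
    }
}
```

With this sample data, two of the four tracks are stored on a different node than their artist, so their artist value stays empty even though the artist exists in the cluster.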
Open a SQLLine connection with non-colocated joins enabled:
!connect jdbc:ignite:thin://127.0.0.1?distributedJoins=true ignite ignite
Re-execute the query to see all the columns filled in.
The non-colocated joins used above come with a performance penalty: if the nodes shuffle large data sets during the join phase, performance will suffer. However, you can co-locate the Track and Artist tables and avoid non-colocated joins:
- Search for the CREATE TABLE Track command in the media_store.sql file
- Replace PRIMARY KEY (TrackId) with PRIMARY KEY (TrackId, ArtistId)
- Co-locate tracks with artists by adding affinityKey=ArtistId to the parameters list of the WITH ... operator
- Clean the Ignite work directory ${project}/ignite/work
- Restart the cluster nodes
- Reconnect with SQLLine:
  !connect jdbc:ignite:thin://127.0.0.1 ignite ignite
- Run the query once again. All the artist columns are now filled in because each track is stored on the same cluster node as its artist.
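The idea behind the affinity key can be sketched in a few lines. This is an illustrative model, not Ignite's affinity function: the `TrackKey` class, the modulo mapping, and the partition count of 1024 are assumptions. The point it shows is that once placement depends only on the ArtistId part of the composite key, a track always maps to the same partition as its artist, whatever its TrackId.

```java
/** Toy model of affinity-key placement; not Ignite's actual affinity function. */
public class AffinityKeySketch {
    static final int PARTITIONS = 1024; // assumed partition count

    /** Composite track key (TrackId, ArtistId) with ArtistId acting as the affinity key. */
    static final class TrackKey {
        final int trackId;
        final int artistId;

        TrackKey(int trackId, int artistId) {
            this.trackId = trackId;
            this.artistId = artistId;
        }
    }

    /** Hypothetical mapping from a single integer value to a partition number. */
    static int partitionOf(int value) {
        return Math.floorMod(value, PARTITIONS);
    }

    /** Artists are placed by their primary key, ArtistId. */
    static int artistPartition(int artistId) {
        return partitionOf(artistId);
    }

    /** Only the affinity key part of the composite key is used; TrackId is ignored. */
    static int trackPartition(TrackKey key) {
        return partitionOf(key.artistId);
    }

    public static void main(String[] args) {
        TrackKey[] keys = {
            new TrackKey(1, 7), new TrackKey(2, 7), new TrackKey(3, 2050) // made-up ids
        };
        for (TrackKey key : keys) {
            System.out.println("track " + key.trackId
                + " -> partition " + trackPartition(key)
                + ", artist " + key.artistId
                + " -> partition " + artistPartition(key.artistId));
        }
    }
}
```

Because the affinity key has to be available inside the key itself, ArtistId is added to the primary key before it is named as the affinity key.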
Run training.ComputeApp, which uses Apache Ignite compute capabilities to calculate the top-5 paying customers. The compute task executes on every cluster node, iterates through local records, and responds to the application, which merges the partial results.
Run the app to see how it works:
java -cp libs/apps.jar training.ComputeApp
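The compute-then-merge pattern described above can be sketched without a cluster. The following is a simplified model, not the code of training.ComputeApp: the `Purchase` class, the customer names, and the amounts are invented for illustration. Each simulated node sums the amounts of its local records per customer, and the application merges those partial totals before ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "compute on every node, merge in the application"; not Ignite code. */
public class TopCustomersSketch {
    static final class Purchase {
        final String customer;
        final long cents;

        Purchase(String customer, long cents) {
            this.customer = customer;
            this.cents = cents;
        }
    }

    /** Node side: sums the amounts of the locally stored records per customer. */
    static Map<String, Long> localTotals(List<Purchase> localRecords) {
        Map<String, Long> totals = new HashMap<>();
        for (Purchase p : localRecords) {
            totals.merge(p.customer, p.cents, Long::sum);
        }
        return totals;
    }

    /** Application side: merges the partial totals and returns the top-n customers. */
    static List<String> mergeTop(List<Map<String, Long>> partials, int n) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((customer, cents) -> merged.merge(customer, cents, Long::sum));
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(merged.entrySet());
        entries.sort((a, b) -> {
            int byTotal = Long.compare(b.getValue(), a.getValue()); // highest total first
            return byTotal != 0 ? byTotal : a.getKey().compareTo(b.getKey()); // ties by name
        });
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries.subList(0, Math.min(n, entries.size()))) {
            top.add(e.getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up records held by two simulated nodes.
        List<Purchase> node1 = Arrays.asList(
            new Purchase("alice", 500), new Purchase("bob", 300), new Purchase("alice", 200));
        List<Purchase> node2 = Arrays.asList(
            new Purchase("bob", 600), new Purchase("carol", 400));

        List<Map<String, Long>> partials = Arrays.asList(localTotals(node1), localTotals(node2));
        System.out.println("Top 2 customers: " + mergeTop(partials, 2));
    }
}
```

The sketch merges per-customer totals rather than each node's local top list. That choice matters only if one customer's records can sit on more than one node, which is an assumption here and depends on how the data is co-located in the real cluster.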
Modify the computation logic:
- Update the logic to return top-10 paying customers.
- Build an executable JAR with the applications' classes:
  mvn clean package -P apps
- Run the app again:
  java -cp libs/apps.jar training.ComputeApp