Here's a short program that connects to Cassandra and executes a query:
Cluster cluster = null;
try {
    cluster = Cluster.builder()                                                      // (1)
            .addContactPoint("127.0.0.1")
            .build();
    Session session = cluster.connect();                                             // (2)
    ResultSet rs = session.execute("select release_version from system.local");     // (3)
    Row row = rs.one();
    System.out.println(row.getString("release_version"));                            // (4)
} finally {
    if (cluster != null) cluster.close();                                            // (5)
}
- the Cluster object is the main entry point of the driver. It holds the known state of the actual Cassandra cluster (notably the Metadata). This class is thread-safe: you should create a single instance (per target Cassandra cluster) and share it throughout your application;
- the Session is what you use to execute queries. Likewise, it is thread-safe and should be reused;
- we use execute to send a query to Cassandra. This returns a ResultSet, which is essentially a collection of Row objects. On the next line, we extract the first row (which is the only one in this case);
- we extract the value of the first (and only) column from the row;
- finally, we close the cluster after we're done with it. This will also close any session that was created from this cluster. This step is important because it frees underlying resources (TCP connections, thread pools...). In a real application, you would typically do this at shutdown (for example, when undeploying your webapp).
Note: this example uses the synchronous API. Most methods have asynchronous equivalents.
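The advice above (one shared Cluster per target Cassandra cluster, closed once at shutdown) can be sketched as a small holder. This is only an illustration under assumptions: SharedResource and its methods are hypothetical names, not part of the driver, and the demo wraps a dummy AutoCloseable so that it runs without a Cassandra node. In an application, the factory would typically wrap the Cluster.builder() call shown earlier (this assumes Cluster can be treated as an AutoCloseable, which you should check for your driver version).

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of the driver): creates a resource lazily,
// hands the same instance to every caller, and closes it once.
class SharedResource<T extends AutoCloseable> implements AutoCloseable {
    private final Supplier<T> factory;
    private T instance;
    private boolean closed;

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the single shared instance, creating it on first use.
    synchronized T get() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    // Closes the underlying resource if it was ever created; safe to call twice.
    @Override
    public synchronized void close() throws Exception {
        closed = true;
        if (instance != null) {
            instance.close();
            instance = null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Dummy resource standing in for a real Cluster.
        SharedResource<AutoCloseable> shared =
                new SharedResource<AutoCloseable>(() -> () -> System.out.println("closed"));
        System.out.println(shared.get() == shared.get()); // prints "true"
        shared.close();                                   // prints "closed"
    }
}
```

In a webapp, close() would be called from whatever shutdown callback your container provides, which matches step (5) above.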
The simplest way to configure a Cluster is programmatically, with Cluster.Builder, which provides a fluent API:
Cluster cluster = Cluster.builder()
        .withClusterName("myCluster")
        .addContactPoint("127.0.0.1")
        .build();
Alternatively, you might want to retrieve the settings from an external source (like a properties file or a web service). You'll need to provide an implementation of Initializer that loads these settings:
Initializer myInitializer = ... // your implementation
Cluster cluster = Cluster.buildFrom(myInitializer);
The only required option is the list of contact points, i.e. the hosts that the driver will initially contact to discover the cluster topology. You can provide a single contact point, but it is usually a good idea to provide more, so that the driver can fall back to another one if the first is down.
The other aspects that you can configure on the Cluster are:
- address translation;
- authentication;
- compression;
- load balancing;
- metrics;
- low-level Netty configuration;
- query options;
- reconnections;
- retries;
- socket options;
- SSL;
- speculative executions;
- query timestamps.
In addition, you can register various types of listeners to be notified of cluster events; see Host.StateListener, LatencyTracker, and SchemaChangeListener.
A freshly-built Cluster instance does not initialize automatically; that will be triggered by one of the following actions:
- an explicit call to cluster.init();
- a call to cluster.getMetadata();
- creating a session with cluster.connect() or one of its variants;
- calling session.init() on a session that was created with cluster.newSession().
The initialization sequence is the following:
- initialize internal state (thread pools, utility components, etc.);
- try to connect to each of the contact points in sequence. The order is not deterministic (in fact, the driver shuffles the list to avoid hotspots if a large number of clients share the same contact points). If no contact point replies, a NoHostAvailableException is thrown and the process stops here;
- otherwise, the successful contact point is elected as the control host. The driver negotiates the native protocol version with it, and queries its system tables to discover the addresses of the other hosts.
Note that, at this stage, only the control connection has been established. Connections to other hosts will only be opened when a session gets created.
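The contact-point phase of the sequence above can be sketched as follows. This is a simplified model written for illustration, not the driver's code: ControlHostElection, electControlHost, the canConnect probe, and NoContactPointAvailableException (a stand-in for NoHostAvailableException) are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Simplified model of how a control host is picked among the contact points.
class ControlHostElection {

    // Hypothetical stand-in for the driver's NoHostAvailableException.
    static class NoContactPointAvailableException extends RuntimeException {
        NoContactPointAvailableException(String message) {
            super(message);
        }
    }

    // Shuffles a copy of the contact points, tries them one by one, and returns
    // the first that replies. canConnect is a hypothetical probe, for example
    // "can a TCP connection be opened to this address?".
    static String electControlHost(List<String> contactPoints, Predicate<String> canConnect) {
        List<String> candidates = new ArrayList<>(contactPoints);
        Collections.shuffle(candidates); // spreads clients across contact points
        for (String candidate : candidates) {
            if (canConnect.test(candidate)) {
                return candidate;
            }
        }
        throw new NoContactPointAvailableException("No contact point replied: " + contactPoints);
    }

    public static void main(String[] args) {
        List<String> contactPoints = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // Pretend that only 10.0.0.2 is up.
        String controlHost = electControlHost(contactPoints, "10.0.0.2"::equals);
        System.out.println("Control host: " + controlHost); // prints "Control host: 10.0.0.2"
    }
}
```

The model also shows why several contact points help: with a single address, one unreachable host is enough to end in the exception.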
By default, a session isn't tied to any specific keyspace. You'll need to prefix table names in your queries:
Session session = cluster.connect();
session.execute("select * from myKeyspace.myTable where id = 1");
You can also specify a keyspace name at construction time; it will be used as the default when table names are not qualified:
Session session = cluster.connect("myKeyspace");
session.execute("select * from myTable where id = 1");
session.execute("select * from otherKeyspace.otherTable where id = 1");
You might be tempted to open a separate session for each keyspace used in your application; however, note that connection pools are created at the session level, so each new session will consume additional system resources:
// Warning: creating two sessions doubles the number of TCP connections opened by the driver
Session session1 = cluster.connect("ks1");
Session session2 = cluster.connect("ks2");
Also, there is currently a known limitation with named sessions that causes the driver to unexpectedly block the calling thread in certain circumstances; if you use a fully asynchronous model, you should use a session with no keyspace.
Finally, if you issue a USE statement, it will change the default keyspace on that session:
Session session = cluster.connect();
// No default keyspace set, need to prefix:
session.execute("select * from myKeyspace.myTable where id = 1");
session.execute("USE myKeyspace");
// Now the keyspace is set, unqualified query works:
session.execute("select * from myTable where id = 1");
Be very careful though: if the session is shared by multiple threads, switching the keyspace at runtime could easily cause unexpected query failures.
Generally, the recommended approach is to use a single session with no keyspace, and prefix all your queries.
You run queries with the session's execute method:
ResultSet rs = session.execute("select release_version from system.local");
As shown here, the simplest form is to pass a query string directly. You can also pass an instance of Statement.
Executing a query produces a ResultSet, which is an iterable of Row. The basic way to process all rows is to use Java's for-each loop:
for (Row row : rs) {
    // process the row
}
Note that this will return all results without limit (even though the driver might use multiple queries in the background). To handle large result sets, you might want to use a LIMIT clause in your CQL query, or use one of the techniques described in the paging documentation.
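To see why an unbounded for-each loop can translate into several fetches, here is a toy model of lazy paging. It is not the driver's implementation: PagedRows, fetchPage, and pagesFetched are hypothetical names, and a "page" is just an in-memory list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Toy model of a paged result: rows are fetched page by page, only when needed.
class PagedRows<T> implements Iterable<T> {
    // Hypothetical page source: returns page n, or an empty list when there are no more pages.
    private final IntFunction<List<T>> fetchPage;
    int pagesFetched = 0;

    PagedRows(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int nextPage = 0;
            private Iterator<T> current = Collections.emptyIterator();
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                // Fetch the next page only when the current one is used up.
                while (!current.hasNext() && !exhausted) {
                    List<T> page = fetchPage.apply(nextPage++);
                    pagesFetched++;
                    if (page.isEmpty()) {
                        exhausted = true;
                    } else {
                        current = page.iterator();
                    }
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<List<Integer>> pages = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        PagedRows<Integer> rows = new PagedRows<Integer>(
                n -> n < pages.size() ? pages.get(n) : Collections.<Integer>emptyList());
        int seen = 0;
        for (int value : rows) {
            seen++;
            if (seen == 2) break; // stop early: later pages are never requested
        }
        System.out.println("pages fetched: " + rows.pagesFetched); // prints "pages fetched: 1"
    }
}
```

Iterating to the end would request every page, which is the behavior to keep in mind before looping over a large table.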
When you know that there is only one row (or are only interested in the first one), the driver provides a convenience method:
Row row = rs.one();
Row provides getters to extract column values; they can be either positional or named:
Row row = session.execute("select first_name, last_name from users where id = 1").one();
// The two getters are equivalent:
String firstNameByIndex = row.getString(0);
String firstNameByName = row.getString("first_name");
CQL3 data type | Getter name | Java type |
ascii | getString | java.lang.String |
bigint | getLong | long |
blob | getBytes | java.nio.ByteBuffer |
boolean | getBool | boolean |
counter | getLong | long |
date | getDate | LocalDate |
decimal | getDecimal | java.math.BigDecimal |
double | getDouble | double |
float | getFloat | float |
inet | getInet | java.net.InetAddress |
int | getInt | int |
list | getList | java.util.List |
map | getMap | java.util.Map |
set | getSet | java.util.Set |
smallint | getShort | short |
text | getString | java.lang.String |
time | getTime | long |
timestamp | getTimestamp | java.util.Date |
timeuuid | getUUID | java.util.UUID |
tinyint | getByte | byte |
tuple | getTupleValue | TupleValue |
user-defined types | getUDTValue | UDTValue |
uuid | getUUID | java.util.UUID |
varchar | getString | java.lang.String |
varint | getVarint | java.math.BigInteger |
In addition to these default mappings, you can register your own types with custom codecs.
For performance reasons, the driver uses primitive Java types wherever possible (boolean, int...); the CQL value NULL is encoded as the type's default value (false, 0...), which can be ambiguous. To distinguish NULL from actual values, use isNull:
Integer age = row.isNull("age") ? null : row.getInt("age");
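The ambiguity is easy to reproduce with a toy model. FakeRow below is a hypothetical stand-in written for this example (the real Row comes from the driver); it only mimics the behavior described above, where a primitive getter returns 0 for NULL.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model illustrating why isNull is needed with primitive getters.
class NullableIntDemo {

    // Hypothetical stand-in for a row with int columns; a null map value models CQL NULL.
    static class FakeRow {
        private final Map<String, Integer> columns = new HashMap<>();

        FakeRow set(String name, Integer value) {
            columns.put(name, value);
            return this;
        }

        boolean isNull(String name) {
            return columns.get(name) == null;
        }

        // Mimics a primitive getter: NULL collapses to the type's default value.
        int getInt(String name) {
            Integer value = columns.get(name);
            return value == null ? 0 : value;
        }
    }

    // Same pattern as the isNull check shown above, wrapped in a helper.
    static Integer getNullableInt(FakeRow row, String name) {
        return row.isNull(name) ? null : row.getInt(name);
    }

    public static void main(String[] args) {
        FakeRow newborn = new FakeRow().set("age", 0);
        FakeRow unknown = new FakeRow().set("age", null);

        // The primitive getter cannot tell the two rows apart.
        System.out.println(newborn.getInt("age") == unknown.getInt("age")); // prints "true"

        // The isNull-based helper can.
        System.out.println(getNullableInt(newborn, "age")); // prints "0"
        System.out.println(getNullableInt(unknown, "age")); // prints "null"
    }
}
```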
To ensure type safety, collection getters are generic. You need to provide type parameters matching your CQL type when calling the methods:
// Assuming given_names is a list<text>:
List<String> givenNames = row.getList("given_names", String.class);
For nested collections, element types are generic and cannot be expressed as Java Class instances. We use Guava's TypeToken instead:
// Assuming teams is a set<list<text>>:
TypeToken<List<String>> listOfStrings = new TypeToken<List<String>>() {};
Set<List<String>> teams = row.getSet("teams", listOfStrings);
Since type tokens are anonymous inner classes, it's recommended to store them as constants in a utility class instead of re-creating them each time.
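To illustrate both points (why an anonymous subclass is involved, and what "constants in a utility class" can look like), here is a sketch that runs without Guava. TypeRef is a hypothetical, minimal stand-in for TypeToken written for this example; it is not Guava's implementation and cannot be passed to the driver.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Set;

// Utility class holding type references as constants, created once and reused.
class TypeTokenConstantsDemo {

    // Minimal stand-in for a type token: the anonymous subclass created with "{}"
    // records T in its generic superclass, where reflection can read it back.
    abstract static class TypeRef<T> {
        private final Type type;

        protected TypeRef() {
            ParameterizedType superclass =
                    (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superclass.getActualTypeArguments()[0];
        }

        Type getType() {
            return type;
        }
    }

    // Each "{}" defines a new anonymous class, hence the advice to declare these once.
    static final TypeRef<List<String>> LIST_OF_STRINGS =
            new TypeRef<List<String>>() {};
    static final TypeRef<Set<List<String>>> SET_OF_LISTS_OF_STRINGS =
            new TypeRef<Set<List<String>>>() {};

    public static void main(String[] args) {
        // The full generic types survive erasure and can be inspected at runtime.
        System.out.println(LIST_OF_STRINGS.getType());
        System.out.println(SET_OF_LISTS_OF_STRINGS.getType());
    }
}
```

With Guava on the classpath, the same layout would apply to TypeToken constants such as the listOfStrings token used above.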
Row exposes an API to explore the column metadata at runtime:
for (ColumnDefinitions.Definition definition : row.getColumnDefinitions()) {
    System.out.printf("Column %s has type %s%n",
            definition.getName(),
            definition.getType());
}
Besides working explicitly with queries and rows, you can also use the Object Mapper to simplify retrieving and storing your data.