Custom codecs support transparent, user-configurable mapping of CQL types to arbitrary Java objects.
Practical use cases that justify such a feature are numerous:
- Ability to map CQL timestamp, date and time columns to Java 8 or Joda Time classes (rather than the built-in mappings for Date, LocalDate and Long);
- Ability to map CQL varchar columns directly to JSON or XML mapped objects (i.e. using frameworks such as Jackson);
- Ability to map CQL user-defined types to Java objects;
- Ability to map CQL lists directly to Java arrays;
- Ability to map CQL collections directly to Scala collections;
- etc.
This page explains the implementation and how to write your own custom codecs. Note that the driver also provides a set of optional codecs that might fit your needs.
The central piece in the serialization mechanism is TypeCodec.
Each TypeCodec
supports a bidirectional mapping between a Java type and a CQL type. A TypeCodec
is thus capable of 4 basic operations:
- Serialize a Java object into a CQL value;
- Deserialize a CQL value into a Java object;
- Format a Java object into a CQL literal;
- Parse a CQL literal into a Java object.
Additionally, it implements inspection methods that verify if the codec is suitable for a specific CQL type and/or Java object.
TypeCodec
can be freely subclassed by users willing to implement their own serialization logic.
However, users are only required to do so when they want to extend
the driver's default set of codecs by registering additional codecs that handle
Java types that the driver is otherwise unable to process. For all other users,
the built-in set of codecs that is bundled with the driver provides
exactly the same functionality that the driver used to offer so far,
with out-of-the-box conversion from all CQL types to standard Java equivalents.
Refer to the javadocs of TypeCodec
for more information.
TypeCodec
s are centralized in a CodecRegistry instance.
Whenever the driver needs to perform one of the aforementioned operations,
it will lookup for an appropriate codec in a CodecRegistry
.
CodecRegistry
is also used to manually register user-created codecs (see below for detailed instructions).
It also caches codec lookup results for faster execution whenever possible.
Refer to the javadocs of CodecRegistry
for more information.
Let's consider the following scenario: a user has JSON documents stored in a varchar column, and wants the driver to automatically map that column to a Java object using the Jackson library, instead of returning the raw JSON string.
The (admittedly simplistic) table structure is as follows:
CREATE TABLE t (id int PRIMARY KEY, json VARCHAR);
The first step is to implement a suitable codec. Using Jackson, here is what a Json codec could look like:
/**
* A simple Json codec.
*/
public class JsonCodec<T> extends TypeCodec<T> {
private final ObjectMapper objectMapper = new ObjectMapper();
public JsonCodec(Class<T> javaType) {
super(DataType.varchar(), javaType);
}
@Override
public ByteBuffer serialize(T value, ProtocolVersion protocolVersion) throws InvalidTypeException {
if (value == null)
return null;
try {
return ByteBuffer.wrap(objectMapper.writeValueAsBytes(value));
} catch (JsonProcessingException e) {
throw new InvalidTypeException(e.getMessage(), e);
}
}
@Override
@SuppressWarnings("unchecked")
public T deserialize(ByteBuffer bytes, ProtocolVersion protocolVersion) throws InvalidTypeException {
if (bytes == null)
return null;
try {
byte[] b = new byte[bytes.remaining()];
// always duplicate the ByteBuffer instance before consuming it!
bytes.duplicate().get(b);
return (T) objectMapper.readValue(b, toJacksonJavaType());
} catch (IOException e) {
throw new InvalidTypeException(e.getMessage(), e);
}
}
@Override
public String format(T value) throws InvalidTypeException {
if (value == null)
return "NULL";
String json;
try {
json = objectMapper.writeValueAsString(value);
} catch (IOException e) {
throw new InvalidTypeException(e.getMessage(), e);
}
return '\'' + json.replace("\'", "''") + '\'';
}
@Override
@SuppressWarnings("unchecked")
public T parse(String value) throws InvalidTypeException {
if (value == null || value.isEmpty() || value.equalsIgnoreCase("NULL"))
return null;
if (value.charAt(0) != '\'' || value.charAt(value.length() - 1) != '\'')
throw new InvalidTypeException("JSON strings must be enclosed by single quotes");
String json = value.substring(1, value.length() - 1).replace("''", "'");
try {
return (T) objectMapper.readValue(json, toJacksonJavaType());
} catch (IOException e) {
throw new InvalidTypeException(e.getMessage(), e);
}
}
protected JavaType toJacksonJavaType() {
return TypeFactory.defaultInstance().constructType(getJavaType().getType());
}
}
A few implementation guidelines:
- Your codecs should be thread-safe, or better yet, immutable;
- Your codecs should be fast: do not forget that codecs are executed often and are usually very "hot" pieces of code;
- Your codecs should never block.
The second step is to register your codec with a CodecRegistry
instance:
JsonCodec<MyPojo> myJsonCodec = new JsonCodec<MyPojo>(MyPojo.class);
CodecRegistry myCodecRegistry = cluster.getConfiguration().getCodecRegistry();
myCodecRegistry.register(myJsonCodec);
As you can see, the easiest way to do so is to access the Cluster
's CodecRegistry
.
By default, Cluster
instances will use CodecRegistry.DEFAULT_INSTANCE
,
which should be adequate for most users.
It is however possible to create a Cluster
using a different CodecRegistry
:
CodecRegistry myCodecRegistry = new CodecRegistry();
Cluster cluster = new Cluster.builder().withCodecRegistry(myCodecRegistry).build();
Note: when you instantiate a new CodecRegistry
, it automatically registers all the default codecs used by the driver.
This ensures that the new registry will not lack of an essential codec. You cannot deregister nor override
an already-registered codec, only register new ones. If you try to register a codec that would override
an existing one, the driver will log a warning and ignore it.
From now on, your custom codec is fully operational. It will be used every time the driver encounters
a MyPojo
instance when executing queries, or when you ask it to retrieve a MyPojo
instance from a ResultSet
.
For example, here is how to save a MyPojo
object:
Cluster cluster = ...
Session session = ...
MyPojo myPojo = ...
// Using SimpleStatement
Statement stmt = new SimpleStatement("INSERT INTO t (id, json) VALUES (?, ?)", 42, myPojo));
// Using the Query Builder
BuiltStatement insertStmt = QueryBuilder.insertInto("t")
.value("id", 42)
.value("json", myPojo);
// Using BoundStatements
PreparedStatement ps = session.prepare("INSERT INTO t (id, json) VALUES (?, ?)");
BoundStatement bs1 = ps.bind(42, myPojo); // or alternatively...
BoundStatement bs2 = ps.bind()
.setInt(0, 42)
.set(1, myPojo, MyPojo.class);
And here is how to retrieve a MyPojo
object converted from a JSON document:
ResultSet rs = session.execute(...);
Row row = rs.one();
// Let the driver convert the string for you...
MyPojo myPojo = row.get(1, MyPojo.class);
// ... or retrieve the raw string if you need it
String json = row.get(1, String.class); // row.getString(1) would have worked too
Tip: Row
and BoundStatement
have several methods to set and retrieve values.
But if you plan to use custom codecs,
make sure that you use preferably the get()
and set()
methods: they
avoid any ambiguity by requiring the user to explicitly specify the desired Java type,
thus forcing the driver to pick the right codec for the right task.
Custom codecs also work with the driver's object mapper.
CodecRegistry
instances not only store default codecs and user-defined codecs;
they can also create new codecs on the fly, based on the set of codecs they currently hold.
They can manage the following mappings:
- Collections (list, sets and maps) of known types. For example, if you registered a
TypeCodec<A>
, you getList<A>>
handled for free. This works recursively for nested collections; - UserType instances are automatically mapped to UDTValue objects. All registered codecs are available recursively to the UDT's fields;
- TupleType instances are automatically mapped to TupleValue objects (with the same rules for nested fields);
- CustomType instances are automatically mapped to ByteBuffer instances.
This way, the user does not have to manually register all derived codecs for a given "base" codec. However, other combinations of Java and CQL types not listed above cannot have their codecs created on the fly; such codecs must be manually registered.
If the codec registry encounters a mapping that it can't handle automatically, a CodecNotFoundException is thrown.
Thanks to on-the-fly codec generation, it is possible to leverage the JsonCodec
from our previous
example to retrieve lists of MyPojo
objects stored in a CQL column of type list<text>
, without
the need to explicitly declare a codec for this specific CQL type:
JsonCodec<MyPojo> myJsonCodec = new JsonCodec<MyPojo>(MyPojo.class);
myCodecRegistry.register(myJsonCodec);
Row row = ...
List<MyPojo> myPojos = row.getList("jsonList", MyPojo.class); // works out of the box!
By default, the driver maps user-defined type values to UDTValue instances.
If you want to register a custom codec for a specific user-defined type, follow these steps:
Let's suppose you have a keyspace containing the following type and table:
CREATE TYPE address (street VARCHAR, zipcode INT);
CREATE TABLE users (id UUID PRIMARY KEY, name VARCHAR, address frozen<address>);
And let's suppose that you want to map the type address
to an Address
class:
public class Address {
private final String street;
private final int zipcode;
public Address(String street, int zipcode) {
this.street = street;
this.zipcode = zipcode;
}
// getters, setters, equals() and hashcode() omitted
}
First of all, create a codec that knows how to serialize this class; one possible solution
(but not necessarily the most efficient one) would be to convert Address
instances into UDTValue
instances, then serialize them with another codec:
public class AddressCodec extends TypeCodec<Address> {
private final TypeCodec<UDTValue> innerCodec;
private final UserType userType;
public AddressCodec(TypeCodec<UDTValue> innerCodec, Class<Address> javaType) {
super(innerCodec.getCqlType(), javaType);
this.innerCodec = innerCodec;
this.userType = (UserType)innerCodec.getCqlType();
}
@Override
public ByteBuffer serialize(Address value, ProtocolVersion protocolVersion) throws InvalidTypeException {
return innerCodec.serialize(toUDTValue(value), protocolVersion);
}
@Override
public Address deserialize(ByteBuffer bytes, ProtocolVersion protocolVersion) throws InvalidTypeException {
return toAddress(innerCodec.deserialize(bytes, protocolVersion));
}
@Override
public Address parse(String value) throws InvalidTypeException {
return value == null || value.isEmpty() || value.equalsIgnoreCase("NULL") ?
null : toAddress(innerCodec.parse(value));
}
@Override
public String format(Address value) throws InvalidTypeException {
return value == null ? "NULL" : innerCodec.format(toUDTValue(value));
}
protected Address toAddress(UDTValue value) {
return value == null ? null : new Address(
value.getString("street"),
value.getInt("zipcode")
);
}
protected UDTValue toUDTValue(Address value) {
return value == null ? null : userType.newValue()
.setString("street", value.getStreet())
.setInt("zipcode", value.getZipcode());
}
}
Now (and only if you intend to use your own CodecRegistry
),
create it and register it on your Cluster
instance:
// only if you do not intend to use CodecRegistry.DEFAULT_INSTANCE
CodecRegistry codecRegistry = new CodecRegistry();
Cluster cluster = Cluster.builder()
.addContactPoints(...)
.withCodecRegistry(codecRegistry)
.build();
Now, retrieve the metadata for type address
:
UserType addressType = cluster.getMetadata().getKeyspace(...).getUserType("address");
TypeCodec<UDTValue> addressTypeCodec = codecRegistry.codecFor(addressType);
And finally, register your AddressCodec
with the CodecRegistry
. Note that
your codec simply delegates the hard work to an inner codec that knows how to deal with
user type values:
AddressCodec addressCodec = new AddressCodec(addressTypeCodec, Address.class);
codecRegistry.register(addressCodec);
And voilà! You can now retrieve address
values as Address
instances:
ResultSet rows = session.execute(...);
Row row = rows.one();
// let the driver convert it for you...
Address address = row.get("address", Address.class);
// ...or access the low-level UDTValue, if you need
UDTValue addressUdt = row.get("address", UDTValue.class); // row.getUDTValue("address") would have worked too
Note that this solution creates an intermediary UDTValue
object when
serializing and deserializing. It's possible to bypass this step with a
lower-level implementation that manipulates the binary stream directly.
That's also how the object mapper handles UDTs, and you can rely on the
mapper to generate UDT codecs for you; see
this page for more
information.
Java generic (parameterized) types are fully supported, through Guava's TypeToken API.
Be sure to only use get()
and set()
methods that take a TypeToken
argument:
// this works
Foo<Bar> foo = row.get(0, new TypeToken<Foo<Bar>>(){})
// this does not work due to type erasure
Foo<Bar> foo = row.get(0, Foo.class);
Suppose the following class hierarchy:
class Animal {}
class Cat extends Animal {}
By default, a codec will accept to serialize any object that extends or
implements its declared Java type: a codec such as
AnimalCodec extends TypeCodec<Animal>
will accept Cat
instances as well.
This allows a codec to handle interfaces and superclasses
in a generic way, regardless of the actual implementation being
used by client code; for example, the driver has a built-in codec
that handles List
instances, and this codec is capable of
serializing any concrete List
implementation.
But this has one caveat: when setting or retrieving values
with get()
and set()
, it is vital to pass the exact
Java type the codec handles:
codecRegistry.register(new AnimalCodec());
BoundStatement bs = ...
bs.set(0, new Cat(), Animal.class); // works
bs.set(0, new Cat(), Cat.class); // throws CodecNotFoundException
The same is valid when retrieving values:
codecRegistry.register(new AnimalCodec());
Row row = ...
Animal animal = row.get(0, Animal.class); // works
Cat cat = row.get(0, Cat.class); // throws CodecNotFoundException
A codec lookup operation may be costly; to mitigate this, the CodecRegistry
caches lookup results whenever possible.
When the cache can be used and the registry looks up a codec, the following rules apply:
- if a result was previously cached for that mapping, it is returned;
- otherwise, the registry checks the list of "basic" codecs: the default ones, and the ones that were explicitly registered (in the order that they were registered). It calls each codec's accepts methods to determine if it can handle the mapping, and if so, adds it to the cache and returns it;
- otherwise, the registry tries to generate a codec on the fly, according to the rules outlined above;
- if the creation succeeds, the registry adds the created codec to the cache and returns it;
- otherwise, the registry throws CodecNotFoundException.
The cache can be used as long as at least the CQL type is known. The following situations are exceptions where it is not possible to cache lookup results:
- With SimpleStatement instances;
- With BuiltStatement instances (created via the Query Builder).
In these places, the driver has no way to determine the right CQL type to use, so it performs a best-effort heuristic to guess which codec to use. The result of this heuristic is never cached.
Beware that in these cases, the lookup performs in average 10x worse. If performance is a key factor for your application, consider using prepared statements all the time.