An experimental VoltDB to Hive export conduit that takes advantage of Hive's HCatalog streaming API. It allows export table writers to push data directly into corresponding Hive tables.
- Install Gradle

  On a Mac, if you have Homebrew set up, simply install the Gradle bottle

  brew install gradle

  On Linux, set up GVM and install Gradle as follows

  gvm install gradle
- Create a gradle.properties file and set the voltdbhome property to the base directory where your VoltDB is installed

  echo voltdbhome=/voltdb/home/dirname > gradle.properties
- Invoke gradle to compile artifacts
gradle shadowJar
- To set up an Eclipse project, run gradle as follows

  gradle cleanEclipse eclipse

  then import it into your Eclipse workspace using the File->Import projects menu option
- Copy the built jar from build/libs to lib/extension under your VoltDB installation directory
- Edit your deployment file and use the following export XML stanza as a template (a sketch for applying the updated configuration follows the stanza)
<?xml version="1.0"?>
<deployment>
    <cluster hostcount="1" sitesperhost="4" kfactor="0" />
    <httpd enabled="true">
        <jsonapi enabled="true" />
    </httpd>
    <export>
        <configuration stream="hive" enabled="true" type="custom"
                       exportconnectorclass="org.voltdb.exportclient.hive.HiveExportClient">
            <property name="hive.uri">thrift://hive-host:9083</property>
            <property name="hive.db">meco</property>
            <property name="hive.table">alerts</property>
            <property name="hive.partition.columns">ALERTS:CONTINENT|COUNTRY</property>
        </configuration>
    </export>
</deployment>
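Once the deployment file is edited, the configuration has to be loaded into VoltDB. A minimal sketch, assuming VoltDB 6.8 or later tooling, a single-node cluster, and that the stanza above is saved as deployment.xml (an illustrative file name):

```bash
# Initialize the database root with this configuration, then start the node;
# rows written to the export stream are pushed to Hive once the database is up.
voltdb init --config=deployment.xml
voltdb start
# Older releases use "voltdb create --deployment=deployment.xml" instead of init/start.
```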
The export stanza above tells VoltDB to write to the alerts table in Hive, via the VoltDB export stream of the same name, using the CONTINENT and COUNTRY columns as the values for the Hive table partitions. For example, the alerts table is defined in Hive as:
create table alerts ( id int, msg string )
partitioned by (continent string, country string)
clustered by (id) into 5 buckets
stored as orc; -- currently ORC is required for streaming
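Hive streaming ingest only writes to ACID tables, so besides the ORC storage format the table must be bucketed and the Hive side must have transactions enabled; depending on the Hive version the table may also need TBLPROPERTIES ('transactional'='true'). A quick, hedged way to check the relevant settings from the Hive CLI (all standard Hive properties, configured in hive-site.xml on the metastore/HiveServer2 side):

```bash
# "set <property>;" with no value prints the current setting in the Hive CLI.
hive -e "set hive.txn.manager;             -- expect org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
         set hive.support.concurrency;     -- expect true
         set hive.compactor.initiator.on;  -- expect true
         set hive.compactor.worker.threads;"
```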
The corresponding VoltDB export stream is defined as:
FILE -inlinebatch END_OF_EXPORT
create stream alerts partition on column id export to target hive (
  id integer not null,
  msg varchar(128),
  continent varchar(64),
  country varchar(64)
);
END_OF_EXPORT
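The inline batch above is meant to be fed to sqlcmd. A minimal sketch, assuming the DDL is saved as alerts-ddl.sql (an illustrative file name) and a VoltDB node is reachable on localhost:

```bash
# Load the export stream DDL into the running database; --servers defaults to
# localhost and is shown only for clarity.
sqlcmd --servers=localhost < alerts-ddl.sql
```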
When a row is inserted into the export stream

INSERT INTO ALERTS (ID, MSG, CONTINENT, COUNTRY) VALUES (1, 'fab-02 inoperable', 'EU', 'IT');

the continent ('EU') and country ('IT') column values are used to select the Hive table partition.
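If everything is wired up, the row lands under a matching Hive partition. A sketch for verifying this; the warehouse path shown is the Hive default and will differ if hive.metastore.warehouse.dir was changed:

```bash
# List the partitions of the target table; expect continent=EU/country=IT after the insert.
hive -e "SHOW PARTITIONS meco.alerts;"
# Optionally inspect the backing ORC files (default warehouse layout assumed).
hdfs dfs -ls /user/hive/warehouse/meco.db/alerts/continent=EU/country=IT
```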
The connector recognizes the following configuration properties:

- hive.uri (mandatory) thrift URI of the Hive host
- hive.db (mandatory) Hive database name
- hive.table (mandatory) Hive table name
- hive.partition.columns (mandatory if the Hive table is partitioned) format: table-1:column-1|column-2|...|column-n,table-2:column-1|column-2|...|column-n,...,table-n:column-1|column-2|...|column-n
- timezone (optional, default: local timezone) timezone used to format timestamp values
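For example, to drive partitions for two export streams one would set hive.partition.columns to a value like ALERTS:CONTINENT|COUNTRY,ORDERS:REGION, where ORDERS and its REGION column are a hypothetical second stream and partition column.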
Partition columns must be of type VARCHAR. Any empty or null partition column values are converted to __VoltDB_unspecified__.
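To illustrate the null-handling rule, a hedged sketch (the sample row, host defaults, and database name are assumptions):

```bash
# Insert a row that leaves COUNTRY unset; the connector substitutes __VoltDB_unspecified__
# for the missing partition value.
echo "INSERT INTO ALERTS (ID, MSG, CONTINENT) VALUES (2, 'fab-03 offline', 'EU');" | sqlcmd
hive -e "SHOW PARTITIONS meco.alerts;"   # expect continent=EU/country=__VoltDB_unspecified__
```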