
Getting Started with Developing HBASE Toolkit

markheger edited this page Sep 14, 2020 · 2 revisions

The HBase toolkit provides support for interacting with Apache HBase from IBM Streams.

HBase is a Hadoop database, a distributed, scalable, big data store. Tables are partitioned by rows across clusters. A value in an HBase table is accessed by its row, columnFamily, columnQualifier, and timestamp. Usually the timestamp is left out, and only the latest value is returned. The HBase toolkit currently provides no support related to timestamps.

The columnFamily and columnQualifier can collectively be thought of as a column and are sometimes called that in the APIs. The separation of the column into two parts allows for some extra flexibility: the columnFamilies must be defined when the table is established and might be limited, but new columnQualifiers can be added at run time and there is no limit to their number.
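The addressing scheme described above can be pictured with a small in-memory model. This is only an illustrative Python sketch, not the toolkit or the HBase client API: the ToyTable class and its put and get methods are hypothetical names, and timestamps are left out, as they are in the toolkit.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase table (illustration only, hypothetical)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = frozenset(column_families)
        # Layout: row -> (columnFamily, columnQualifier) -> value
        self.rows = defaultdict(dict)

    def put(self, row, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers need no declaration; any new one is accepted at run time.
        self.rows[row][(family, qualifier)] = value

    def get(self, row, family, qualifier):
        # Returns None when the cell does not exist.
        return self.rows.get(row, {}).get((family, qualifier))


table = ToyTable(["info"])
table.put("user1", "info", "email", "a@example.com")
table.put("user1", "info", "nickname", "al")  # a new qualifier, added on the fly
```

In this sketch, a put to an undeclared family is rejected, while a put to a new qualifier inside a declared family succeeds, which mirrors the flexibility described above.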

Tuples can be added to an HBase table by using the HBASEPut operator (which includes a checkAndPut condition) or incremented with the HBASEIncrement operator.

Tuples can be retrieved with the HBASEGet operator from an HBase table.

The HBASEScan operator can output all tuples from an HBase table, or only the tuples in a particular row range.
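To make the idea of a row range concrete, here is a minimal Python sketch of a scan over an in-memory table. It assumes the usual HBase convention that rows are ordered by key, the start row is inclusive, and the stop row is exclusive. The scan function and its start_row and stop_row arguments are hypothetical names for illustration and are not the HBASEScan operator's parameters.

```python
def scan(table, start_row=None, stop_row=None):
    """Yield (row, cells) pairs in row-key order.

    table is a dict: row -> {(columnFamily, columnQualifier): value}.
    start_row is inclusive and stop_row is exclusive (assumed convention).
    """
    for row in sorted(table):
        if start_row is not None and row < start_row:
            continue
        if stop_row is not None and row >= stop_row:
            break
        yield row, table[row]


table = {
    "row1": {("cf", "q"): "a"},
    "row2": {("cf", "q"): "b"},
    "row3": {("cf", "q"): "c"},
}
all_rows = [row for row, _ in scan(table)]
some_rows = [row for row, _ in scan(table, "row2", "row3")]
```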

The HBASEDelete operator enables tuples to be deleted from an HBase table.

For some operators, such as HBASEPut, the row, columnFamily, columnQualifier, and value must all be specified. For other operators, such as HBASEGet and HBASEDelete, the behavior depends on which of those items are specified. The HBASEDelete operator, for example, deletes the whole row if columnFamily and columnQualifier are not specified, but it can also be used to delete only a single value.
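The way a delete narrows as more items are specified can be sketched in Python against an in-memory table. The delete function below is a hypothetical helper, not the operator itself. The family-only case (removing every value in one columnFamily of the row) is an assumption based on how HBase deletes generally behave; the text above only describes the whole-row and single-value cases.

```python
def delete(table, row, family=None, qualifier=None):
    """Delete from table, a dict: row -> {(columnFamily, columnQualifier): value}.

    - row only:                 remove the whole row
    - row + family:             remove every value in that family (assumption)
    - row + family + qualifier: remove a single value
    """
    if family is None and qualifier is not None:
        raise ValueError("a columnQualifier needs a columnFamily")
    if row not in table:
        return
    if family is None:
        del table[row]
        return
    cells = table[row]
    doomed = [
        key for key in cells
        if key[0] == family and (qualifier is None or key[1] == qualifier)
    ]
    for key in doomed:
        del cells[key]


table = {
    "r1": {("cf", "a"): "1", ("cf", "b"): "2"},
    "r2": {("cf", "a"): "3"},
}
delete(table, "r1", "cf", "a")  # removes a single value
delete(table, "r2")             # removes the whole row
```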

The columnFamily and columnQualifier (when relevant) can either be specified as an attribute of the input tuple (columnFamilyAttrName, columnQualifierAttrName), or specified as a single string that is used for all tuples (staticColumnFamily, staticColumnQualifier). The row and the value (when needed) come from the input tuple.

HBase supports locking by using a check-and-update mechanism for delete and put. This only locks within a single row, but it allows you to specify either:

a full entry (row, columnFamily, columnQualifier, value). If this entry exists with the given value, HBase makes the put or delete.

a partial entry (row, columnFamily, columnQualifier). If this entry has no value, HBase makes the put or delete.

Note that the row of the put or delete and the row of the check must be the same. These scenarios are supported by the HBASEPut and HBASEDelete operators by specifying a checkAttrName as a parameter. The named attribute on the input stream must be of type tuple and must have columnFamily and columnQualifier attributes (plus a value attribute if you are doing the first type of check). In this mode, the operator can have an output port with a success attribute to indicate whether the put or delete happened.
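The two kinds of check can be sketched in Python as follows. This is an illustration of the semantics only, under the assumption that the check and the put act on the same row. The check_and_put function is a hypothetical helper, and unlike HBase it makes no attempt to be atomic across concurrent callers.

```python
def check_and_put(table, row, check, new_cell):
    """Apply new_cell to row only if check passes; return True on success.

    table:    dict row -> {(columnFamily, columnQualifier): value}
    check:    (columnFamily, columnQualifier, expected)
              expected=None models the partial entry: the cell must be absent.
    new_cell: (columnFamily, columnQualifier, value) to write on success
    """
    family, qualifier, expected = check
    current = table.get(row, {}).get((family, qualifier))
    if current != expected:
        return False  # check failed, nothing is written
    put_family, put_qualifier, value = new_cell
    table.setdefault(row, {})[(put_family, put_qualifier)] = value
    return True


table = {}
# Partial entry: succeeds because ("cf", "lock") has no value yet.
first = check_and_put(table, "r1", ("cf", "lock", None), ("cf", "lock", "held"))
# Same check again: fails because the cell now has a value.
second = check_and_put(table, "r1", ("cf", "lock", None), ("cf", "lock", "other"))
# Full entry: succeeds because the cell holds the expected value.
third = check_and_put(table, "r1", ("cf", "lock", "held"), ("cf", "owner", "me"))
```

The boolean result plays the role of the success attribute on the optional output port.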

Except for HBASEIncrement and HBASEGet, the only data types that are currently supported are rstrings. HBASEGet supports getting a value of type long.

The com.ibm.streamsx.hbase toolkit uses the same configuration information from the hbase-site.xml file that HBase does. For more information about HBase, see http://hbase.apache.org/.

For more details, see the samples in:

https://github.com/IBMStreams/streamsx.hbase/tree/develop/samples