This repository has been archived by the owner on Jul 15, 2019. It is now read-only.

Getting Started

dgomezferro edited this page Sep 14, 2012 · 17 revisions

This page explains the steps needed to get transactional HBase working, assuming you already have a (pseudo-)distributed HBase cluster running.

Installation

To install Omid, download and unpack the repository. From the main directory, run:

$ mvn install

You can now add a dependency on Omid-0.0.1-SNAPSHOT to your project if you use Maven. Otherwise, add the jar created under target to your application's classpath.

Running the Status Oracle

You must run the Status Oracle on one server of your cluster. You can either use the script provided in the repository at bin/omid.sh or run the TSOServer class directly:

$ java -cp ./target/Omid-0.0.1-SNAPSHOT.jar com.yahoo.omid.tso.TSOServer -port <port>

Configuration

Omid clients use the usual hbase-site.xml for configuration. Two additional parameters are needed, tso.host and tso.port:

<configuration>
 <!-- ... your existing HBase configuration properties ... -->
 <property>
  <name>tso.host</name>
  <value>TSO_HOST</value>
 </property>
 <property>
   <name>tso.port</name>
   <value>TSO_PORT</value>
 </property>
</configuration>
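
As a quick sanity check before starting a client, the two extra properties can be verified with plain JDK classes. The sketch below is only an illustration: the class and method names (TsoSettingsCheck, readProperties, hasTsoSettings) are hypothetical and not part of Omid, and the host and port values in main are placeholders. It assumes the file follows the name/value layout shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Omid: checks hbase-site.xml-style content
// for the tso.host and tso.port properties described above.
final class TsoSettingsCheck {

    /** Parses hbase-site.xml-style content into a name -> value map. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                // Skip malformed entries that lack a name or a value.
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(),
                              values.item(0).getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    /** True when tso.host is non-empty and tso.port is a valid TCP port number. */
    static boolean hasTsoSettings(Map<String, String> props) {
        String host = props.get("tso.host");
        String port = props.get("tso.port");
        if (host == null || host.isEmpty() || port == null) {
            return false;
        }
        try {
            int p = Integer.parseInt(port);
            return p > 0 && p <= 65535;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own Status Oracle host and port.
        String xml = "<configuration>"
                + "<property><name>tso.host</name><value>localhost</value></property>"
                + "<property><name>tso.port</name><value>1234</value></property>"
                + "</configuration>";
        System.out.println(hasTsoSettings(readProperties(xml)));
    }
}
```

Note that the unreplaced placeholder TSO_PORT fails this check because it is not a number, which makes a forgotten substitution easy to spot.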

Usage

To use HBase transactional support, the relevant interfaces are TransactionManager and TransactionalTable, both in com.yahoo.omid.client. The common use case is to start a new transaction with TransactionState txn1 = transactionManager.beginTransaction(), use this transaction to perform HBase operations such as transactionalTable.put(txn1, putOperation), and finally commit it with transactionManager.tryCommit(txn1).

Example

     // Loads hbase-site.xml, including tso.host and tso.port
     Configuration conf = HBaseConfiguration.create();
     TransactionManager tm = new TransactionManager(conf);
     TransactionalTable tt = new TransactionalTable(conf, TEST_TABLE);
     
     // Start a transaction
     TransactionState t1 = tm.beginTransaction();

     // Write within the transaction
     Put put = new Put(row);
     put.add(fam, col, data);
     tt.put(t1, put);
     
     // Scan within the same transaction
     ResultScanner rs = tt.getScanner(t1, new Scan().setStartRow(startrow).setStopRow(stoprow));
     Result r = rs.next();
     while (r != null) {
        ...
        r = rs.next();
     }
     rs.close();

     // Commit the transaction
     tm.tryCommit(t1);

Garbage Collection

If you know the maximum number of concurrent transactions that could modify a data item at the same time, you can set the maximum number of versions HBase stores to that number (plus some margin). Omid needs access to that many versions to guarantee Snapshot Isolation.

If this number is not bounded, or you don't want to take risks, we provide a custom compacter that uses HBase's Coprocessors. In that case, set the maximum number of versions to infinity (Long.MAX_VALUE); the compacter takes care of deleting the versions that are no longer needed.

Configuration

Make sure the Omid jar is available to all region servers, and add this to the hbase-site.xml used by the HBase servers:

<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.yahoo.omid.client.regionserver.Compacter</value>
</property>

High Availability

The Status Oracle writes to a Write Ahead Log (WAL) so that it can recover from a failure. Currently, only BookKeeper is supported for this.

Setting up BookKeeper

Please refer to the BookKeeper documentation on how to set it up.

Running a HA Status Oracle

To run a highly available Status Oracle, pass the -ha flag and specify the ZooKeeper server in which the Bookies are registered, like this:

$ java -cp ./target/Omid-0.0.1-SNAPSHOT.jar com.yahoo.omid.tso.TSOServer -port <port> -ha -zk <zk connection string>

There are a couple of parameters you can change, like the ensemble size (-ensemble) and the quorum (-quorum).

  • Ensemble: number of Bookies the Status Oracle is going to use. It has to be <= the total number of Bookies started.
  • Quorum: number of replicas for each entry. It has to be <= the ensemble size.

For example, you could run 5 Bookies with an ensemble size of 5 and a quorum of 3, so each write to the WAL is replicated to 3 Bookies (which is faster than replicating to all of them).
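
The two constraints above (quorum <= ensemble <= Bookies started) can be checked before launching the Status Oracle. The following is a minimal sketch using only the JDK. The class and method names (WalReplicationCheck, validate, isValid) are hypothetical and not part of Omid or BookKeeper, and the additional requirement that the quorum be at least 1 is an assumption made for the sake of the example.

```java
// Hypothetical helper, not part of Omid or BookKeeper: validates the
// -ensemble and -quorum values against the number of Bookies started.
final class WalReplicationCheck {

    /** Throws IllegalArgumentException if the combination violates the constraints. */
    static void validate(int bookies, int ensemble, int quorum) {
        if (quorum < 1) {
            // Assumption: at least one replica per WAL entry is required.
            throw new IllegalArgumentException("quorum must be at least 1");
        }
        if (ensemble > bookies) {
            throw new IllegalArgumentException(
                    "ensemble (" + ensemble + ") exceeds Bookies started (" + bookies + ")");
        }
        if (quorum > ensemble) {
            throw new IllegalArgumentException(
                    "quorum (" + quorum + ") exceeds ensemble (" + ensemble + ")");
        }
    }

    /** Convenience wrapper that reports validity as a boolean. */
    static boolean isValid(int bookies, int ensemble, int quorum) {
        try {
            validate(bookies, ensemble, quorum);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The example from the text: 5 Bookies, ensemble of 5, quorum of 3.
        System.out.println(isValid(5, 5, 3));
        // An ensemble larger than the number of Bookies started is rejected.
        System.out.println(isValid(5, 6, 3));
    }
}
```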