-
Notifications
You must be signed in to change notification settings - Fork 106
Home
Omid, which stands for Optimistically transaction Management In Data stores, is an open-source project started at Yahoo! that aims to provide transactions to datastores in the NoSQL ecosystem (e.g. HBase)
Here, we walk you through the architecture of Omid. If you want to skip to a hands on approach to Omid, please read Getting Started.
The following sections present Omid and its high-level architecture, and detail the motivation and the problems behind Omid.
Omid adds lock-free transactional support on top of HBase. Omid benefits from a centralized scheme in which a single server, called The Status Oracle (TSO,) monitors the modified rows by transactions and use that to detect write-write conflicts. HBase clients in Omid maintain a read-only snapshot (copy) of transaction commit times to reduce the load on the TSO, making it scalable up to 50,000 transactions per second (TPS.)
The following figure shows a high level view of the Omid's architecture
The core architecture of the software is described in more detail in the Technical Details.
Some experimental results could also be found here: Experimental Results.
A transaction comprises a unit of work against a database, which must either entirely complete (i.e., commit) or have no effect (i.e., abort). In other words, partial executions of the transaction are not defined. Without the support for transactions, the developers are burdened with ensuring atomic execution of a multi-row transaction despite failures as well as concurrent accesses to the database by other transactions. Data stores such as HBase, BigTable, PNUTS, and Cassandra, usually lack this precious feature.
Snapshot isolation (SI) guarantees that all reads of a transaction are performed on a snapshot of the database that corresponds to a valid database state with no concurrent transaction. To implement SI, the database maintains multiple versions of the data in some data servers, and transactions, running by clients, observe different versions of the data depending on their start time. For example, transaction txn_n_ reads the modifications made by the transaction txn_o_, but not the ones made by the concurrent transaction txn_c_ because txn_c_ is not committed when txn_n_ starts. Implementations of SI have the advantage that writes of a transaction do not block the reads of others. Two concurrent transactions still conflict if they write into the same data item, say a database row.
The serializablity is provided in the readWrite branch. Check out the details here.
- Omid is lock-free. In lock-based approaches, the locks that are held by the incomplete transactions of a failed client prevent others from making progress. In Omid, if a client is slow or faulty, it does not slow down the other clients.
- Omid does not require any modification into HBase code. All the transactional logic is implemented in the TSO) and the Transactional Clients (TCs.)
- Omid does not require any change into HBase table schema. The only change into the data is that the version of an inserted value is assigned to the transaction start timestamp, to enable the transactions to read from a snapshot.
- Contrary to previous approaches the commit timestamps are not persisted into data servers. Omid, therefore, brings transaction support to distributed data stores with a negligible overhead on the HBase servers. Therefore, Omid enables transactions for many applications running on top of HBase with no perceptible impact on performance.
In the centralized implementation of SI, a single server -The Status Oracle or TSO- receives the commit requests accompanied by the set of the identifiers (id) of modified rows, R. Since the TSO has observed the modified rows by the previous commit requests, it has enough information to check if there is temporal overlap for each modified row. The timestamps are obtained from a timestamp oracle integrated into the status oracle and the uncommitted data of transactions are stored on the same data tables.
For each transaction, the TSO server sends/receives the following main messages:
- Timestamp Request/Timestamp Response,
- isCommitted Query/isCommitted Response,
- Commit Request/Commit Response,
- Abort Cleaned-up.
Since the timestamp oracle is integrated into the TSO, the transactional clients (TCs) can obtain the start timestamp from the TSO. The following list details the steps of transactions:
- Transaction start Before performing any read or write, the TCs obtain a start timestamp from the TSO.
- Single-row write. A write operation by transaction txn_w_ is performed by simply writing the new data with a version equal to the transaction start timestamp, Ts(txn_w_).
- Transaction commit. After a TC has written its values on the rows, it tries to commit them by submitting to the TSO a commit request, which consists of the start timestamp Ts(txn_w_) as well as the list of all the modified rows, R. If the TSO aborts the transaction, the client must clean up the modified rows.
- Single-row cleanup. After a transaction aborts, it should clean up all the modified rows. To clean up each row after an abort, the transaction deletes its written versions. This is an extra overhead on data servers, which occurs rarely, only after aborts.
- Single-row Read. Each read in transaction txn_r_ must observe the last committed data before Ts(txn_r_). To do so, starting with the latest version (assuming that the versions are sorted by timestamp in ascending order), it looks for the first value with commit timestamp δ, where δ < Ts(txn_r_). To verify, the transaction inquires the txn_w_ commit time of the TSO or its local, read-only replica on the client side.
hbase-trx is an ongoing project that attempts to extend HBase with transactional support. Similar to Percolator, hbase-trx runs a 2PC algorithm to detect write-write conflicts. In contrast to Percolator, hbase-trx generates a transaction id locally (by generating a random integer) rather than acquiring one from a global oracle. During the commit preparation phase, hbase-trx detects write-write conflicts and caches the write operations in a server-side state object. On commit, the data server (i.e., RegionServer) applies the write operations to its regions. Each data server considers the commit preparation and applies the commit in isolation. There is no global knowledge of the commit status of transactions.
In the case of a client failure after a commit preparation has been sent to a data server, the transaction will eventually be applied optimistically after a timeout, regardless of the correct status of the transaction. This could lead to inconsistency in the database. To resolve this issue, hbase-trx would require a global transaction status oracle similar to that presented in this project. hbase-trx does not use the timestamp attribute of HBase fields; transaction ids are randomly generated integers. Consequently, hbase-trx is unable to offer snapshot isolation, as there is no fixed order in which transactions are written to the database.
It can be achieved by partitioning the load on multiple TSO. A proof-of-concept implementation is at MegaOmid branch. Check out the details here.
If you want to contribute, clone the repository and start hacking. If you want yours changes accepted, please refer to How to contribute section.
- Daniel Gómez Ferro ([email protected])
- Francisco Perez-Sorrosal (fperez at yahoo-inc.com)
- Flavio Junqueira ([email protected])
- Ivan B. Kelly (... at yahoo-inc.com)
- Benjamin Reed (... at yahoo-inc.com)
- Maysam Yabandeh (firstname at yahoo-inc.com)
- [email protected] Users and developers mailing list
Omid
Copyright 2011-2015 Yahoo Inc. Licensed under the Apache License, Version 2.0