Running Shark with Tachyon
Haoyuan Li edited this page Jun 27, 2014 · 24 revisions
Shark 0.7 adds a new storage format that supports efficiently reading data from Tachyon, which enables data sharing and isolation across Shark instances. Our meetup slides give a good overview of the benefits of using Tachyon to cache Shark's tables. In summary, the four major benefits are:
- In-memory data sharing across multiple Shark instances (i.e. stronger isolation)
- Instant recovery of in-memory tables
- A smaller JVM heap, and therefore faster garbage collection in Shark
- If a table is larger than the available memory, only its hot columns are cached in memory
Shark / Tachyon compatibility: Shark 0.7.x works with Tachyon 0.2.1, Shark 0.8.1 with Tachyon 0.3.0, and Shark 0.9.0 with Tachyon 0.4.0. For more information about Tachyon, please visit Tachyon's website.
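The version pairings above can be encoded in a small shell lookup, for example in a deployment script that has to pick the Tachyon release matching a given Shark release. This is a minimal sketch: the function name tachyon_version_for_shark is hypothetical, and it knows only the three pairings stated on this page, failing for any other version.

```shell
# Hypothetical helper: map a Shark release to the Tachyon release it works with.
# Only the pairings documented on this page are known; anything else is an error.
tachyon_version_for_shark() {
  case "$1" in
    0.7|0.7.*) echo "0.2.1" ;;   # Shark 0.7.x -> Tachyon 0.2.1
    0.8.1)     echo "0.3.0" ;;   # Shark 0.8.1 -> Tachyon 0.3.0
    0.9.0)     echo "0.4.0" ;;   # Shark 0.9.0 -> Tachyon 0.4.0
    *)
      echo "unknown Shark version: $1" >&2
      return 1
      ;;
  esac
}
```

For example, tachyon_version_for_shark 0.8.1 prints 0.3.0, while an undocumented version such as 0.8.0 prints an error to stderr and returns a non-zero status.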
To run Shark on Tachyon, you first need to set up Tachyon, in either Local Mode or Cluster Mode.
Then edit shark-env.sh and add the following lines:
```shell
export TACHYON_MASTER="tachyon://TachyonMasterHost:TachyonMasterPort"
export TACHYON_WAREHOUSE_PATH=/sharktables
```
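A malformed value in these two variables is easy to miss, so it can help to check them before starting Shark. The sketch below is one way to do that in POSIX sh; the function name check_tachyon_env is hypothetical and not part of Shark, and it only checks the shape of the values (a tachyon://host:port URI with a numeric port, and an absolute warehouse path), not whether a Tachyon master is actually reachable.

```shell
# Hypothetical sanity check for the Tachyon settings in shark-env.sh.
# It checks only the format of the values, not whether a Tachyon master is running.
# Note: POSIX sh has no 'local', so 'master' and 'port' are ordinary shell variables.
check_tachyon_env() {
  master="${TACHYON_MASTER:-}"
  # TACHYON_MASTER should have the form tachyon://<host>:<port>
  case "$master" in
    tachyon://?*:?*) ;;
    *)
      echo "TACHYON_MASTER should look like tachyon://host:port (got '$master')" >&2
      return 1
      ;;
  esac
  # The port is everything after the last ':' and must consist only of digits
  port="${master##*:}"
  case "$port" in
    ''|*[!0-9]*)
      echo "TACHYON_MASTER has a non-numeric port: '$port'" >&2
      return 1
      ;;
  esac
  # TACHYON_WAREHOUSE_PATH should be an absolute path such as /sharktables
  case "${TACHYON_WAREHOUSE_PATH:-}" in
    /*) ;;
    *)
      echo "TACHYON_WAREHOUSE_PATH should be an absolute path (got '${TACHYON_WAREHOUSE_PATH:-}')" >&2
      return 1
      ;;
  esac
}
```

Calling check_tachyon_env right after the two export lines prints an error if either value is malformed. Note that the placeholder value tachyon://TachyonMasterHost:TachyonMasterPort is rejected on purpose, because its port is not numeric, which catches the case where the placeholders were never replaced.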
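Once this is configured, you can cache a table in Tachyon by setting the shark.cache table property to tachyon when you create it:

```sql
CREATE TABLE data TBLPROPERTIES("shark.cache" = "tachyon") AS SELECT a, b, c FROM data_on_disk WHERE month = "May";
```

Alternatively, give the new table a name ending in _tachyon:

```sql
CREATE TABLE orders_tachyon AS SELECT * FROM orders;
```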
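When statements like these are generated from a script rather than typed into the Shark CLI, the double quotes inside TBLPROPERTIES make shell quoting easy to get wrong. One way around that, sketched below, is to build the statement with printf and a single-quoted format string. The helper name tachyon_ctas is hypothetical, and it does no validation or escaping of the table name or query it is given.

```shell
# Hypothetical helper: print a CREATE TABLE ... AS SELECT statement that asks
# Shark to cache the new table in Tachyon via the shark.cache table property.
# The single-quoted format string keeps the inner double quotes intact.
tachyon_ctas() {
  table="$1"   # name of the table to create
  query="$2"   # SELECT statement that produces its contents
  printf 'CREATE TABLE %s TBLPROPERTIES("shark.cache" = "tachyon") AS %s;\n' "$table" "$query"
}
```

For example, tachyon_ctas orders_copy 'SELECT * FROM orders' prints CREATE TABLE orders_copy TBLPROPERTIES("shark.cache" = "tachyon") AS SELECT * FROM orders; which you can then submit to Shark however you normally run queries.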
Once the table has been created in Tachyon, you can query it just like any other table.