rxin edited this page Oct 8, 2012 · 51 revisions

Data To Play With

create table src(key int, value string);
LOAD DATA LOCAL INPATH '${env:HIVE_DEV_HOME}/data/files/kv1.txt' INTO TABLE src;

create table src1(key int, value string);
LOAD DATA LOCAL INPATH '${env:HIVE_DEV_HOME}/data/files/kv3.txt' INTO TABLE src1;

Note that you may have to create the /user/hive/warehouse/src directory before executing these commands.
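If the warehouse directory does not exist yet, it can be created in HDFS first. This is a sketch of the usual commands; the exact path depends on your hive.metastore.warehouse.dir setting, so adjust as needed:

```shell
# Create the default Hive warehouse directory in HDFS and make it
# group-writable (path may differ if hive.metastore.warehouse.dir
# is customized in your configuration).
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/warehouse
```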

Setup

Developing Shark (e.g. running tests or using Eclipse) requires our patched development package of Hive. Clone it from GitHub and package it:

$ git clone https://github.com/amplab/hive.git -b shark-0.9
$ cd hive
$ ant package

Then set $HIVE_HOME and $HIVE_DEV_HOME in conf/shark-env.sh. $HIVE_DEV_HOME should point to the path of the git repository, e.g. ~/shark-dev/hive, and $HIVE_HOME should be set to $HIVE_DEV_HOME/build/dist.
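For example, conf/shark-env.sh might contain lines like the following (the checkout path is illustrative; adjust it to wherever you cloned the repository):

```shell
# Illustrative settings in conf/shark-env.sh; adjust HIVE_DEV_HOME
# to the location of your Hive git checkout.
export HIVE_DEV_HOME=~/shark-dev/hive
export HIVE_HOME=$HIVE_DEV_HOME/build/dist
```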

ant package builds all the Hive jars and puts them into the build/dist directory.

Eclipse

We use a combination of vim/emacs and Eclipse to develop Shark. It is often handy to use Eclipse when you need to cross-reference a lot to understand the code. Since Shark is written in Scala, you will need the Scala IDE plugin for Eclipse.

  1. Download Eclipse Indigo 3.7 (Eclipse IDE for Java Developers) from http://www.eclipse.org/downloads/
  2. Install the Scala IDE for Eclipse plugin. See http://scala-ide.org/download/current.html

To generate the Eclipse project files, do

$ sbt/sbt eclipse

Once you run the above command, you will be able to open the Scala project in Eclipse. Note that Eclipse can be buggy: its compiler and parser may crash while you are editing a file.

We recommend turning Eclipse's automatic build off and using sbt's continuous compilation mode to build the project:

$ sbt/sbt
> ~ products

To set up the Hive project for Eclipse, follow https://cwiki.apache.org/confluence/display/Hive/GettingStarted+EclipseSetup

Testing

To run Hive's test suite, first generate Hive's TestCliDriver script:

$ ant package
$ ant test -Dtestcase=TestCliDriver

The above command generates the Hive test Java files from Velocity templates and then starts executing the tests. Since only the generated files are needed here, you can stop the run once the tests start executing.

Then compile our tests:

$ sbt/sbt test:compile

Then run the tests with:

$ TEST=regex_pattern ./bin/test

You can control which tests to run with the TEST environment variable. If it is set, only tests matching the TEST regex will run. You can also specify a whitelist of test suites to run using TEST_FILE. For example, to run our regression tests, you can do

$ TEST_FILE=src/test/tests_pass.txt ./bin/test

You can also combine both TEST and TEST_FILE, in which case only tests that satisfy both filters will be executed.

An example:

# Run only tests that begin with "union" or "input"
$ TEST="testCliDriver_(union|input)" TEST_FILE=src/test/tests_pass.txt ./bin/test 2>&1 | tee test.log
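Conceptually, the TEST variable acts as a regex filter over test names. The sketch below illustrates that filtering as a plain grep; the test names are invented for illustration and this is not the actual runner logic:

```shell
# Sketch of TEST-style regex filtering over test names.
# The test names below are made up for illustration only.
TEST="testCliDriver_(union|input)"
for t in testCliDriver_union1 testCliDriver_join2 testCliDriver_input4; do
  if echo "$t" | grep -qE "$TEST"; then
    echo "would run: $t"
  fi
done
```

With this pattern, the union and input entries match and the join entry is skipped.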

Hive and Hadoop Resources

Hive Developer Guide