hdinsight-drill

Run an Apache Drill cluster on Azure HDInsight

Install and use Apache Drill on HDInsight Hadoop clusters

Install Apache Drill on Azure HDInsight using a Script Action. A script to install Apache Drill (1.10) is available at:

https://raw.githubusercontent.com/yaron2/hdinsight-drill/master/setup.sh

The script will work on both existing and new HDInsight clusters. To install Apache Drill on a new cluster, perform the following steps:

From the Cluster summary blade, select Advanced settings, then Script actions. Click Submit new and choose Custom. Use the following to populate the form:
- NAME: Enter a friendly name for the script action.
- SCRIPT URI: https://raw.githubusercontent.com/yaron2/hdinsight-drill/master/setup.sh
- HEAD: Don't check this option
- WORKER: Check this option
- ZOOKEEPER: Don't check this option
- PARAMETERS: Leave this field blank
At the bottom of the Script actions blade, use the Select button to save the configuration. Finally, use the Next button to return to the Cluster summary.
From the Cluster summary blade, select Create to create the cluster.
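
For an existing cluster, the same script action can also be submitted from the command line instead of the portal. The following is a minimal sketch, assuming the Azure CLI is installed and you are logged in with `az login`. The resource group, cluster name, script action name, and the helper function `install_drill_script_action` are placeholders introduced here for illustration, not part of the original script.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): replace with your own resource group and cluster.
RESOURCE_GROUP="my-resource-group"
CLUSTER_NAME="my-hdinsight-cluster"
SCRIPT_URI="https://raw.githubusercontent.com/yaron2/hdinsight-drill/master/setup.sh"

# Hypothetical helper: runs the Drill install script on worker nodes only,
# mirroring the portal form (HEAD and ZOOKEEPER unchecked, no parameters).
install_drill_script_action() {
  az hdinsight script-action execute \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "install-drill" \
    --script-uri "$SCRIPT_URI" \
    --roles workernode
}

# Call the helper once the placeholders are filled in:
# install_drill_script_action
```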

Using Drill on HDInsight

Drill is installed on all HDInsight worker nodes. To connect to Drill, you need the IP address of any worker node in the HDInsight cluster.

One way to do so is to open the Ambari Cluster Dashboard at https://CLUSTERNAME.azurehdinsight.net, where CLUSTERNAME is the name of your cluster. Once logged in, click Hosts and note the IP address of any worker node. Worker node host names begin with wn.
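
The same host information can probably be read from the command line through the Ambari REST API. This is a sketch under assumptions: `curl` and `jq` are installed, the cluster login user is passed as a placeholder, and the helper names `filter_worker_hosts`, `list_worker_hosts`, and `host_ip` are hypothetical. `curl` prompts for the cluster login password.

```shell
#!/usr/bin/env bash
# Keep only host names that start with "wn" (worker nodes); one name per line on stdin.
filter_worker_hosts() {
  grep '^wn'
}

# Hypothetical helper: list worker host names known to Ambari.
list_worker_hosts() {
  cluster="$1"   # cluster name (placeholder)
  user="$2"      # cluster login user (placeholder)
  curl -sS -u "$user" \
    "https://${cluster}.azurehdinsight.net/api/v1/clusters/${cluster}/hosts" \
    | jq -r '.items[].Hosts.host_name' \
    | filter_worker_hosts
}

# Hypothetical helper: print the internal IP address Ambari reports for one host.
host_ip() {
  cluster="$1"; user="$2"; host="$3"
  curl -sS -u "$user" \
    "https://${cluster}.azurehdinsight.net/api/v1/clusters/${cluster}/hosts/${host}" \
    | jq -r '.Hosts.ip'
}

# Example (placeholders):
# list_worker_hosts CLUSTERNAME admin
```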

To connect to the internal worker nodes, set up SSH tunneling and configure your browser as outlined here: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-linux-ambari-ssh-tunnel
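
As a rough illustration of what the tunnel setup looks like, the sketch below prints an `ssh` command that opens a local SOCKS proxy through the cluster's SSH endpoint. The local port 9876 and the helper name `ssh_tunnel_command` are assumptions chosen for this example. Follow the linked guide for the authoritative steps and browser configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ssh command that opens a SOCKS tunnel.
# -D opens a local SOCKS proxy on the given port, -N runs no remote command,
# -f sends ssh to the background, -C compresses traffic, -q suppresses warnings,
# -T skips tty allocation, -n prevents reading from stdin.
ssh_tunnel_command() {
  user="$1"; cluster="$2"; port="${3:-9876}"
  echo "ssh -C -q -T -n -N -f -D ${port} ${user}@${cluster}-ssh.azurehdinsight.net"
}

# Print the command for inspection, then run it yourself:
ssh_tunnel_command USERNAME CLUSTERNAME
```

With the tunnel running, a browser configured to use `localhost:9876` as a SOCKS v5 proxy can reach the internal worker node addresses.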

Connect with drill shell

Connect to the HDInsight cluster using SSH:

ssh USERNAME@CLUSTERNAME-ssh.azurehdinsight.net

Connect to one of the Worker nodes using SSH:
```
ssh USERNAME@WORKER-NODE-IP
```

Start the Drill Shell:

 sudo /var/lib/drill/apache-drill-1.10.0/bin/drill-conf

Verify that everything is working with a simple SELECT that lists the Drillbits:
```
SELECT * FROM sys.drillbits;
```

Using the Drill UI

After establishing the SSH tunnel, obtain the IP address of any worker node as described above and go to http://node-ip:8047, replacing node-ip with that address.
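
The same port also serves Drill's REST API, so a query can be submitted without the browser. The sketch below assumes a SOCKS tunnel is listening on `localhost:9876` (an assumed port) and that `curl` is installed. The helper names `drill_query_payload` and `run_drill_query` are hypothetical.

```shell
#!/usr/bin/env bash
# Build the JSON body for Drill's REST query endpoint.
# Note: this simple helper does not escape double quotes inside the query.
drill_query_payload() {
  printf '{"queryType":"SQL","query":"%s"}' "$1"
}

# Hypothetical helper: POST a query to a worker node through the SOCKS tunnel.
run_drill_query() {
  node_ip="$1"; query="$2"; socks_port="${3:-9876}"
  curl -sS --socks5-hostname "localhost:${socks_port}" \
    -H "Content-Type: application/json" \
    -d "$(drill_query_payload "$query")" \
    "http://${node_ip}:8047/query.json"
}

# Example (placeholder IP):
# run_drill_query WORKER-NODE-IP 'SELECT * FROM sys.drillbits'
```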

Using Azure Blob Storage

To query data in Azure Blob Storage, you first need to add it as a storage plugin. To do so, connect to the Drill UI and navigate to the Storage page.

Then do the following:

Click on the Update button for the dfs plugin, and copy its contents.
Enter a name for your Azure account at the bottom of the page and click the Create button.
Delete the null value and paste the contents you copied earlier.
Change "file:///" to "wasb://mycontainer@mydatafiles.blob.core.windows.net/" and change the container and account name accordingly.
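
The last step is a plain text substitution, so it can be sketched as a small filter over a saved copy of the dfs configuration. The helper name `dfs_to_wasb` is hypothetical, and the one-line JSON in the example is a trimmed stand-in for the real dfs configuration, which also contains workspaces and formats.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the dfs connection to point at a blob container.
# Reads the copied dfs configuration on stdin and writes the edited one to stdout.
dfs_to_wasb() {
  container="$1"; account="$2"
  sed "s|\"file:///\"|\"wasb://${container}@${account}.blob.core.windows.net/\"|"
}

# Example with a trimmed stand-in for the dfs configuration:
printf '{"type": "file", "enabled": true, "connection": "file:///"}\n' \
  | dfs_to_wasb mycontainer mydatafiles
# prints {"type": "file", "enabled": true, "connection": "wasb://mycontainer@mydatafiles.blob.core.windows.net/"}
```

The output can then be pasted into the new plugin's configuration box in the Drill UI.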

See here for more info on Drill and Azure Blob Storage.

What cluster types are valid?

The script supports only Hadoop clusters. Other cluster types (Spark, Kafka, Storm, Secure Hadoop, etc.) are not supported.
