The Apache Drill beta release is near! This guide will show you how to set up and use Drill in embedded mode.

Apache Drill is a low-latency distributed query engine capable of querying large datasets. It stands out from other execution engines in that it supports nested schemas, multiple data sources, and late-binding schemas. (Find more information on the Drill architecture here.)

You can either build Drill from the GitHub repository (apache/incubator-drill) or use the nightly build from the Apache website (http://incubator.apache.org/drill/). To build Drill from GitHub, you will need JDK 1.7, Maven 3, and Git. First, clone the master branch of the project with git clone.
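As a rough sketch of the clone step: the URL below is an assumption inferred from the repository name apache/incubator-drill, and the helper name clone_drill is made up for illustration. The snippet first reports whether the three prerequisites are on the PATH, and leaves the clone itself wrapped in a function so nothing is downloaded until you call it.

```shell
# Assumption: clone URL inferred from the repository name apache/incubator-drill.
DRILL_REPO_URL="https://github.com/apache/incubator-drill.git"

# Report which of the required tools (JDK, Maven, Git) are not on PATH.
missing=""
for tool in java mvn git; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
echo "missing tools:${missing:- none}"

# Hypothetical helper: clone the master branch into ./incubator-drill.
clone_drill() {
  git clone --branch master "$DRILL_REPO_URL" incubator-drill
}

# Uncomment to perform the clone:
# clone_drill
```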

Change into the incubator-drill directory and build:
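A typical Maven invocation for this step might look like the sketch below. The goals shown (clean install) and the -DskipTests property are standard Maven usage rather than something the project text prescribes, and build_drill is a hypothetical helper name.

```shell
# Hypothetical helper: build Drill from a fresh clone.
# -DskipTests is a standard Maven property that compiles the tests but
# does not run them; remove it to run the full test suite during the build.
build_drill() {
  cd incubator-drill || return 1
  mvn clean install -DskipTests
}

# Uncomment to run the build (this can take a while):
# build_drill
```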

This will create a tar file in the incubator-drill/distribution/target directory. If you downloaded the nightly build rather than building from GitHub, you already have the tar file and can start from this step.

First create a new directory to extract the contents into, then extract the tar file into that directory:
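One way to do this is sketched below. The install location ($HOME/drill) and the helper name extract_drill are assumptions, since no directory is named here. The --strip-components=1 option assumes the archive wraps everything in a single top-level directory; drop it if your tar file does not.

```shell
# Assumed install location; override by setting DRILL_HOME beforehand.
DRILL_HOME="${DRILL_HOME:-$HOME/drill}"

# Hypothetical helper: extract a Drill tar file into DRILL_HOME.
extract_drill() {
  tarball="$1"
  mkdir -p "$DRILL_HOME" || return 1
  # Drop the archive's top-level directory so bin/, conf/, etc. land
  # directly under DRILL_HOME.
  tar -xzf "$tarball" --strip-components=1 -C "$DRILL_HOME"
}

# Example, using the build output directory mentioned above:
# extract_drill incubator-drill/distribution/target/*.tar.gz
```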

We will query Drill using a tool called SQLLine, which comes with the Drill distribution. To start using Drill, go to the directory into which you extracted the tar file and run SQLLine:
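Launching SQLLine in embedded mode might look like the following sketch. It assumes the launcher script lives at bin/sqlline inside the extracted directory and that the distribution was extracted to $HOME/drill (a hypothetical location); zk=local in the JDBC URL asks Drill to run locally without an external ZooKeeper. Some releases may also expect a user name and password to be passed with -n and -p.

```shell
# Embedded-mode connection string: zk=local means no external ZooKeeper.
DRILL_JDBC_URL="jdbc:drill:zk=local"

# Hypothetical helper: start SQLLine from the extracted distribution.
start_sqlline() {
  cd "${DRILL_HOME:-$HOME/drill}" || return 1
  bin/sqlline -u "$DRILL_JDBC_URL"
}

# Uncomment to open the interactive prompt:
# start_sqlline
```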

You can now run SQL queries against Drill. To exit SQLLine, run the command !quit.

You can view the available schemas using this query:
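A query along these lines lists the schemas. It relies on the standard INFORMATION_SCHEMA.SCHEMATA view; the file name list_schemas.sql is only for illustration. The snippet saves the query to a file and prints it so you can paste it at the SQLLine prompt.

```shell
# Query that lists every schema Drill currently knows about.
SCHEMA_QUERY='SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;'

# Save the query to a file (hypothetical name) and show it for copy/paste.
printf '%s\n' "$SCHEMA_QUERY" > list_schemas.sql
cat list_schemas.sql
```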

By default, DFS is the only configured storage plugin, so you can query CSV, TSV, Parquet, and JSON files if they exist in the file system. To query Hive or HBase, you will need to edit the storage plugin configuration, which is available through the web interface at localhost:8047. From there, click the Storage tab. A list of registered storage plugins is shown; by default, only cp and dfs are enabled.

The HBase configuration will look something like this:
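As an illustration, the snippet below writes a sample HBase plugin definition to a local file whose contents can then be pasted into the Storage tab. The ZooKeeper host and port are placeholder assumptions for a single-node setup, the file name is hypothetical, and the exact set of keys may differ between Drill versions.

```shell
# Write a sample HBase storage plugin definition to a scratch file.
# Assumed values: ZooKeeper running on localhost, client port 2181.
cat > hbase-storage-plugin.json <<'EOF'
{
  "type": "hbase",
  "config": {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181"
  },
  "enabled": true
}
EOF

cat hbase-storage-plugin.json
```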

The Hive configuration will look like this:
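A sample Hive definition can be prepared the same way. The metastore URI below (thrift://localhost:9083) is a placeholder assumption for a local metastore, the file name is hypothetical, and the keys shown may vary with the Drill version you are running.

```shell
# Write a sample Hive storage plugin definition to a scratch file.
# Assumed values: a Hive metastore on localhost:9083 without SASL.
cat > hive-storage-plugin.json <<'EOF'
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://localhost:9083",
    "hive.metastore.sasl.enabled": "false"
  }
}
EOF

cat hive-storage-plugin.json
```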

You can now query your Hive and HBase tables through SQLLine. Drill supports multiple configurations of the same type, so you can set up multiple Hive or HBase data sources.

And if you’d like to connect Drill to Tableau or your favorite BI tool, stay tuned!