Hive grows up: New HiveServer2 API proposed

April 12, 2012

Over the Easter weekend, Hive took a big step forward with a new API:

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API

At this point, the new API is just a proposal. What is good to see is that it addresses crucial gaps in today's API that hinder ODBC/JDBC, including:

1. explicit support for sessions;
2. asynchronous query execution;
3. the ability to cancel running queries.
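
To make these three capabilities concrete, here is a toy in-memory sketch in Python. It is not the proposed Thrift API: the class `ToyServer`, its method names, and the state strings are all hypothetical, chosen only to illustrate what sessions, asynchronous execution and cancellation look like from a client's point of view.

```python
import threading
import uuid


class ToyServer:
    """In-memory stand-in for a query server. Hypothetical; NOT the HiveServer2 API."""

    def __init__(self):
        self.sessions = {}    # session id -> set of operation ids
        self.operations = {}  # operation id -> operation record

    def open_session(self):
        # 1. Explicit sessions: the client receives a handle it must pass back.
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = set()
        return session_id

    def execute_async(self, session_id, statement, work_seconds=0.5):
        # 2. Asynchronous execution: return a handle at once, run in the background.
        op_id = str(uuid.uuid4())
        op = {"statement": statement, "state": "RUNNING", "cancel": threading.Event()}

        def run():
            # Simulated work: Event.wait returns True if cancel was requested
            # before the timeout elapsed, False if the "query" ran to completion.
            if op["cancel"].wait(timeout=work_seconds):
                op["state"] = "CANCELED"
            else:
                op["state"] = "FINISHED"

        op["thread"] = threading.Thread(target=run, daemon=True)
        self.operations[op_id] = op
        self.sessions[session_id].add(op_id)
        op["thread"].start()
        return op_id

    def get_status(self, op_id):
        return self.operations[op_id]["state"]

    def cancel(self, op_id):
        # 3. Cancellation: signal the worker and wait for it to stop.
        op = self.operations[op_id]
        op["cancel"].set()
        op["thread"].join()

    def close_session(self, session_id):
        # Closing a session cancels whatever it still has running.
        for op_id in self.sessions.pop(session_id):
            self.cancel(op_id)


server = ToyServer()
session = server.open_session()
query = server.execute_async(session, "SELECT * FROM big_table", work_seconds=5)
print(server.get_status(query))  # the call returned before the query finished
server.cancel(query)
print(server.get_status(query))
server.close_session(session)
```

An ODBC or JDBC driver needs exactly this shape: a connection maps to a session, statement execution can return before results are ready, and a statement-level cancel request has something on the server side to call.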

Reading between the lines, ODBC and JDBC access is important.

Of course, this doesn't mean that there aren't ODBC drivers for Hive. Indeed, Alteryx, Cloudera, MapR and Microsoft all have their offerings. But this new API will improve the next generation of ODBC drivers.

The other gap that still exists is the difference between HiveQL and SQL. Most tools that generate SQL are not HiveQL-aware. The simplest example that comes to mind is an equi-join: standard SQL syntax uses the keywords "INNER JOIN", whereas Hive accepts only "JOIN". This is just one of the small changes Hive needs in order to "play nicely" with pre-existing apps.
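
In the meantime, a driver or middleware layer could paper over the join keyword difference by rewriting generated SQL before it reaches Hive. The following is a minimal sketch in Python, not taken from any existing driver; `to_hive_join` is a hypothetical helper name, and the regular expression is deliberately naive: it does not skip string literals or comments, so a real translation layer would need a proper SQL tokenizer.

```python
import re

# Matches "INNER JOIN" in any letter case, with any whitespace between the words.
_INNER_JOIN = re.compile(r"\bINNER\s+JOIN\b", re.IGNORECASE)


def to_hive_join(sql):
    """Replace the standard-SQL keywords INNER JOIN with the plain JOIN form."""
    return _INNER_JOIN.sub("JOIN", sql)


generated = "SELECT o.id, c.name FROM orders o INNER JOIN customers c ON o.cust_id = c.id"
print(to_hive_join(generated))
```

Outer joins are left untouched because the pattern only matches the word INNER followed by JOIN.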