Over the Easter weekend, Hive took a big step forward with a new API:


At this point, this new API is just a proposal. What is good to see is that it addresses crucial gaps in today’s API that hinder ODBC/JDBC, including:

1. explicit support for sessions;
2. asynchronous query execution;
3. the ability to cancel running queries.
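
To make those three gaps concrete, here is a minimal toy sketch in Python of what a client-facing interface with sessions, asynchronous execution and cancellation might look like. None of these names come from the Hive proposal: `Session`, `QueryHandle`, `execute_async` and `cancel` are hypothetical, and the "query" is only a loop that simulates work.

```python
import threading
import uuid


class QueryHandle:
    """Hypothetical handle returned immediately by an asynchronous submit."""

    def __init__(self):
        self.id = uuid.uuid4().hex
        self.state = "RUNNING"
        self._cancel = threading.Event()
        self._done = threading.Event()


class Session:
    """Hypothetical explicit session: per-user state lives here."""

    def __init__(self, user):
        self.id = uuid.uuid4().hex
        self.user = user
        self.handles = {}

    def execute_async(self, sql, steps=50):
        """Start the (simulated) query in the background and return at once."""
        handle = QueryHandle()
        self.handles[handle.id] = handle

        def run():
            for _ in range(steps):
                # wait() returns True as soon as a cancel has been requested.
                if handle._cancel.wait(0.01):
                    handle.state = "CANCELED"
                    break
            else:
                handle.state = "FINISHED"
            handle._done.set()

        threading.Thread(target=run, daemon=True).start()
        return handle

    def cancel(self, handle):
        """Request cancellation, wait for the worker to stop, report the state."""
        handle._cancel.set()
        handle._done.wait()
        return handle.state


session = Session("alice")
handle = session.execute_async("SELECT COUNT(*) FROM logs")
print(session.cancel(handle))
```

The point of the sketch is the shape of the calls: the client gets a handle back before the query finishes, and that handle is what makes a later cancel possible. A driver built on a blocking, session-less call has nothing to hang either feature on.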

Reading between the lines, ODBC and JDBC access is important.

Of course, this doesn’t mean that there aren’t ODBC drivers for Hive. Indeed, Alteryx, Cloudera, MapR and Microsoft all have their own offerings. But if the proposal is adopted, this new API should improve the next generation of ODBC drivers.

The other gap that still exists is the difference between HiveQL and SQL. The majority of tools that generate SQL are not HiveQL-aware. The simplest example that comes to mind is an equi-join: standard SQL allows the keywords “INNER JOIN”, which many tools emit, whereas Hive only accepts “JOIN”. This is just one example of the simple changes that need to be made for Hive to “play nicely” with pre-existing apps.
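
Until Hive accepts the longer form, a driver or middleware layer could paper over this particular difference by rewriting the generated SQL. The following is a naive sketch in Python, not something any of the drivers above is known to do; the function name `to_hive_join` and the sample table names are made up for illustration, and the regex does not skip string literals or comments.

```python
import re

# Matches the keyword pair INNER JOIN in any case, with any whitespace between.
_INNER_JOIN = re.compile(r"\bINNER\s+JOIN\b", re.IGNORECASE)


def to_hive_join(sql: str) -> str:
    """Rewrite INNER JOIN to the bare JOIN keyword."""
    return _INNER_JOIN.sub("JOIN", sql)


query = (
    "SELECT o.id, c.name FROM orders o "
    "INNER JOIN customers c ON o.cust_id = c.id"
)
print(to_hive_join(query))
# SELECT o.id, c.name FROM orders o JOIN customers c ON o.cust_id = c.id
```

A text-level rewrite like this only goes so far. Each further difference between HiveQL and SQL needs its own rule, which is why fixing the gap in Hive itself is the better long-term answer.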