It began when Hortonworks released HDP 1.1 earlier in September. Buried in the documentation was a little-appreciated detail:
MapReduce, Pig, and Hive jobs are placed in queue by Templeton and can be monitored for progress or stopped as required.
Templeton is led by Alan Gates, who elaborated on it at the previous Hadoop Summit. A Hive job, in this context, is simply a HiveQL command. Since Templeton also includes authentication, it is a bona fide successor to today’s HiveServer(1).
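To make the "queue, monitor, stop" idea concrete, here is a minimal sketch of what those three REST calls might look like from Python. It only builds the requests and does not send them. The host, port 50111, the `/templeton/v1/hive` and `/templeton/v1/queue/<jobid>` paths, and the `user.name`, `execute` and `statusdir` parameters reflect my reading of the Templeton v1 documentation and should be checked against your release. The function names and the job id are hypothetical, and passing `user.name` as a plain parameter assumes a cluster without Kerberos.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Templeton listens on its documented default port on localhost.
TEMPLETON_BASE = "http://localhost:50111/templeton/v1"


def build_hive_submit(query, user, statusdir, base=TEMPLETON_BASE):
    """Build (but do not send) a POST that queues one HiveQL command."""
    body = urlencode({
        "user.name": user,       # identity in non-Kerberos mode (assumption)
        "execute": query,        # the HiveQL command to run
        "statusdir": statusdir,  # HDFS directory for job status/output
    }).encode("ascii")
    return Request(base + "/hive", data=body, method="POST")


def build_job_status(job_id, user, base=TEMPLETON_BASE):
    """Build a GET that asks for the progress of a queued job."""
    url = "{}/queue/{}?{}".format(base, job_id, urlencode({"user.name": user}))
    return Request(url, method="GET")


def build_job_stop(job_id, user, base=TEMPLETON_BASE):
    """Build a DELETE that asks Templeton to stop a queued job."""
    url = "{}/queue/{}?{}".format(base, job_id, urlencode({"user.name": user}))
    return Request(url, method="DELETE")
```

Passing any of these to `urllib.request.urlopen` would send it. The submit call is expected to return a small JSON document containing the job id, which is then used for the status and stop calls.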
Fast forward a couple of weeks to Cloudera’s release of CDH4.1. You may recall that a few months ago Carl Steinbach proposed HiveServer2. (Refer to my post outlining the motivation for HiveServer2 here.) HiveServer2 arrived in CDH4.1. Here’s how the press release describes it:
Hive security and concurrency – we’ve fixed some long standing issues with running Hive. With CDH4.1, it is now possible to run a shared Hive instance where users submit queries using Kerberos authentication. In addition this new Hive server supports multiple users submitting queries at the same time.
Many enterprise customers have been unable to deploy Hive widely because it lacked an authenticated SQL-like interface to their HDFS data. We now have a choice of two alternatives for an industrial-strength Hive server. There are some differences between the two: REST vs. Thrift; WebHDFS vs. row processing. But both promise authentication and concurrency.
All that’s missing are corresponding ODBC drivers to bring each interface to everyone’s desktop.
It will be exciting to see how the wider Big Data/Hadoop community responds to the choices suddenly available for HiveQL.