The run-up to any conference can be rather feverish. This year’s Strata is turning out to be one such.


Hortonworks finally announced their Stinger initiative last Friday (after Sanjay had leaked it earlier at a TheHive event). As I heard it from Alan Gates, it is a program of enhancements to bring Hive’s query performance into the “human use” realm.

Then, just before my flight down to San Francisco, EMC Greenplum announced their Pivotal HD (“Hawq”) distribution. As usual, El Reg has a good summary of the details while Gigaom has some backstory. In short, the SQLization of NoSQL/BigData is real. The trend that Cloudera very publicly kicked off at last October’s Strata with Impala is now in full bloom.

Running down the list of vendors, just about everyone now has something on the burner:

| Contributor/Author | Technology | Note |
|---|---|---|
| Google | F1 | “Spanner”. SQL. Transactions. |
| Cloudera | Impala | HiveQL; will be open sourced. Currently in beta release as part of CDH4; GA expected in CQ1. |
| MapR | Drill | SQL-2003; in progress; in development as an Apache incubator project. |
| Hortonworks | Stinger initiative | Program of enhancements to existing projects (including Hive) and also proposes two new projects: Tez and Knox. Expected in March. |
| EMC Greenplum | Pivotal HD “Hawq” | SQL-20XX. Source status unclear currently. |
| | Phoenix | Contribution to Apache. No defined release vehicle yet. |

In a nice bit of synchronicity, there’s a session Wednesday afternoon by Tom O’Brien aptly titled “The Future of Relational (or Why You Can’t Escape SQL)”. I’ll be checking that out to take the pulse of this trend.

On a broader note, Intel has just thrown their hat into the Hadoop distro ring by announcing their IDH. (You can read more about the extensive Asian connections here.) I’ll be giving a tech talk Thursday at 2PM at Intel’s booth (#101) about Hadoop’s query language scene. Stop by to say hi if you are interested in this fast-moving scene.

But on to the day 0 events: the tutorials.
Jonathan Hsieh did a great job with his morning tutorial on HBase for app developers. He gave great examples and tips for anyone pondering the question of “Whither HBase”. With O’Reilly’s move to putting all real content behind the paywall, the session won’t be readily available, but Jonathan has promised to post his slides. I’ll post the link when that shows up.

For those of us who enjoyed Ryan Boyd & Michael Manoochehri’s shows on the Google Developers channel, the dynamic duo joined up with Julia Ferraioli to present a tutorial that takes one through the end-to-end process of collecting, persisting, processing, and visualizing data using the collective might of Google App Engine, Compute Engine, and BigQuery. Aside from the rough edges of getting all the relevant tools downloaded, installed, and configured, the process went more smoothly than expected. Having worked with BigQuery previously, I’m familiar with that piece of the picture. I still have to finish the assignment (thanks Julia for keeping the group up!) due to a Cygwin problem during class, but it brings out the kid in me to see everything run from soup to nuts.

And that’s all for day 0! The formal Strata hasn’t even started.