Last week’s combined Strata/Hadoop World in New York City is a wrap. It was a high-energy two days with major announcements from the top Hadoop distributors: Cloudera with Impala, MapR with M7, and Hortonworks with a Teradata/Aster partnership.

We had a booth, and there was a good stream of attendees stopping by to learn about Simba and our offerings. You can read about what we’ve been doing with Tableau at their blog. 2012 is not over yet, and I’m hoping that we’ll soon be able to talk about some of what was not announced.

The size of the event is definitely indicative of where Big Data is heading. The NY Hilton was packed to the rafters; I do hope that they find a larger facility next year. Speaking of which, dates for the spring event in Santa Clara (Feb 25-27, 2013) and fall event in New York (October 28-30, 2013) have been announced.

This fall’s event was a forward step in Hadoop’s maturation. Last year’s event primarily focused on what Hadoop was about. This year moved on to Hadoop’s future. Notably, the consensus is that Hadoop and high-latency jobs are not synonymous. Hadoop and SQL/OLTP are reconcilable. The consistent theme between Cloudera’s Impala and MapR’s M7 is that Hadoop can provide low-latency–dare I say it, RDBMS-like–access.

Another way to measure this is to look at where the product focus is; Pig and Hive are so 2012. 2013 is about HBase and its evolution. You can see this with startups such as Splice Machine and Drawn To Scale.

Even so, Hive is still a relevant topic. Jason Dai from Intel Shanghai presented Panthera, a project to enhance Hive’s query language, HiveQL. Dean Wampler’s Hive book is finally out.

Lesser themes that might be overlooked are Drill and BigQuery. The Drill project is making good progress, and Google showed off BigQuery’s progress in the open marketplace.

Because O’Reilly has put most of the session recordings behind its paywall, I can’t provide clickable links. You will need to look for them yourself on either O’Reilly or Safari. I will also be focusing on real-world experiences and/or emerging trends.

Without further ado:

  • Big Data for the Masses: How We Opened Up the Doors to Google’s Dremel
  • Drill into Big Data
    • These two sessions provide two perspectives (closed and open source) on an emerging post-Hadoop trend: high-speed in-memory table scans.
  • How Draw Something Absorbed 50 Million New Users, in 50 Days, With Zero App Downtime
    • This session relates the experience of the Draw Something team as they dealt with the explosive growth of their iPhone app. You may recall that Zynga acquired OMGPOP, the maker of Draw Something, for $200M this past spring. The challenge of scaling from zero to 50 million users in 50 days is incredible.
  • High Availability for the HDFS NameNode: Phase 2
    • The NameNode is an area of intense focus as everyone wants to remove it as a single point of failure.
  • Building the Next Platform for Analytic Apps in the Cloud
    • George Mathew presented Alteryx’s experience building their new platform using JSON/REST, choosing scalable technologies, choosing deployable technologies, and designing for fault tolerance.