Last week, Simba Technologies joined with several thousand close friends at the Javits Convention Center in New York City for the Strata + Hadoop World conference. The theme of this year’s NYC event was “Make Data Work,” seemingly all the more important given the event’s urban locale.

Some highlights:

How soon until we can take Hadoop for granted?
What’s the definition of Hadoop success? According to Cloudera Chief Strategy Officer Mike Olson, it’s when Hadoop “disappears.” In context, Olson argued that Hadoop is reaching critical mass, an operational tipping point, if you will, and that the immediate future of Hadoop is a shift from implementation focus to analytical focus. In other words, now that we’ve all figured out how to set up Hadoop, it’s time to get to work. (In a sense, that priority is analogous to a commonly cited Simba success factor: connectivity so ubiquitous our customers take it for granted.) See Olson’s keynote (“Open Standards and the Modern Data Center”) here, and read Datanami editor Alex Woodie’s comments here.

Hey, look, dancers!
After a successful “impromptu” flash mob at June’s Hadoop Summit in San Jose, several of the Actian dancers made the trip to New York, where they performed (at what seemed, I confess, like) every hour on the hour. I’m probably exaggerating a bit on the performance frequency, but the dancers hit the Actian “stage” (a 10×20 exhibit space!) several times each day, driving home Actian’s message to “Cut Hadoop Loose” in support of the launch of the Actian Analytics Platform – Express Hadoop SQL Edition, their community version of the Actian flagship analytics technology. (Here’s a clip with bystanders joining in the fun. I am not that guy at 1:31!) I can’t help but think that even one week later the exhibitors next to the Actian booth can still hear that thumpin’ bass, and can probably perform those dance moves from memory.

Spark, Spark, Spark
Spark and SparkSQL-related technologies were everywhere at Strata + HW. Industry innovators like Platfora and Databricks talked up a Sparkstorm, but perhaps the most telling example of Spark’s prevalence was how many times I was asked about Spark at the Simba booth. (Many, many.) I even met two analysts from a big software company that may or may not be HQ’ed in Cupertino, California, who were sent to Strata just to learn about Spark.

BI tools are driving Hadoop investment
Based on attendee input, it’s clear that BI technology investments are pulling Hadoop (at least compatible Hadoop) into the mainstream enterprise. (See Mike Olson’s point above.) As a result, Hadoop evangelists are talking a lot less like IT leads and a lot more like data analysts. And that made BI providers like Tableau, MicroStrategy, Qlik, and SAP/Lumira quite popular stops on the exhibit-hall tour.

Speaking of SAP…
SAP Solution Marketing Manager John Schitka delivered his vision for Big Data in the year 2020. One fun fact from his presentation: In 2013, 90% of the Internet’s data had been created in just the previous two years. (HT: SINTEF.) I’m still calculating the exponential trending analysis estimate for 2014. See John’s keynote presentation here.

Speaking of fun facts…
Sneaker-clad New Yorker Cartoon Editor Bob Mankoff gave an inspired (and comic) keynote (“Crowdsourcing Humor: The New Yorker Caption Contest”) while wearing a pair of Google Glass(es). Some interesting tidbits:

  • The New Yorker’s back-page caption contest is roughly eight-and-a-half years old, and generates on average 5372 entries each week, with three finalists and one winner for each cartoon. (My entries have yet to make print.)
  • 84% of entries are from men. But 23% of winners are women.
  • The late Roger Ebert won on his 108th entry.
  • “The dirty little secret of humor?” “No algorithm.”
  • Bisociation, a technical term for “uniting two disparate frames of reference,” is essential to good visual humor, at least as applied to the caption contest. (Like illustrating a parrot on a business person’s shoulder.)
  • By sampling 500 contest entries, you get about 70% of the related semantic content in the 5000+ entries.

Mankoff eventually got around to the crowdsourcing component of his work. An editorial assistant culls each week’s entries, then Mankoff uses SurveyMonkey to democratize the selection of the three finalists, polling the entire New Yorker editorial staff. Turns out crowdsourced humor can be pretty darn funny. (Hooray for the collective judgment of the masses.) See his keynote here.