It must have been a busier-than-usual week in San Francisco last week. (At least, when you check in to your hotel and the attendant asks which of three conferences you’re in town for, I figure it’s an atypical week.) The 2014 Cassandra Summit was at the Westin St. Francis on Union Square. I applaud the organizers for picking interesting venues. Last year’s event at Fort Mason coincided with Oracle’s America’s Cup practices out in the bay; it was memorable enough that some attendees this year were still comparing 2013 and 2014.

But venues aside, the passage of a year really shows in the progress of Cassandra and DataStax. Last year, CQL3 was an idea. This year, CQL3 was a full-day course by Patrick McFadin and a good, dense session by John Berryman. If I can summarize Patrick’s full-day course, it boiled down to four principles that apply not just to Cassandra but to all NoSQL databases:

  1. Know your data
  2. Know your query
  3. Nest data
  4. Duplicate data
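
To make those four principles a little more concrete, here is a rough sketch in plain Python rather than CQL3. It is not taken from Patrick’s course: the video data, the two "tables", and the `insert_video` helper are all hypothetical names I made up for illustration. The idea is simply that each query gets its own lookup structure, tags are nested inside the row, and the same row is written more than once.

```python
from collections import defaultdict

# Hypothetical sample data (know your data): videos uploaded by users.
videos = [
    {"video_id": "v1", "user": "alice", "title": "Intro to CQL3", "tags": ["cassandra", "cql"]},
    {"video_id": "v2", "user": "bob", "title": "Data modeling", "tags": ["cassandra"]},
    {"video_id": "v3", "user": "alice", "title": "Time series", "tags": ["cql"]},
]

# Know your query: one in-memory "table" per query, keyed by what that query looks up.
videos_by_user = defaultdict(list)  # query 1: all videos for a given user
videos_by_tag = defaultdict(list)   # query 2: all videos for a given tag


def insert_video(video):
    """Write one video into every query table (hypothetical helper)."""
    # Nest data: the list of tags travels inside the row itself.
    row = {
        "video_id": video["video_id"],
        "title": video["title"],
        "tags": list(video["tags"]),
    }
    videos_by_user[video["user"]].append(row)
    # Duplicate data: the same row is copied under each tag it carries.
    for tag in video["tags"]:
        videos_by_tag[tag].append(dict(row))


for v in videos:
    insert_video(v)

# Each query is now a single key lookup, with no join at read time.
alice_titles = [row["title"] for row in videos_by_user["alice"]]
cassandra_ids = [row["video_id"] for row in videos_by_tag["cassandra"]]
```

The trade-off is the usual one: writes do more work and storage grows, in exchange for reads that touch exactly one key.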

The key practice that Cassandra has now embraced is data modeling. The full-day course was titled “Cassandra Data Modeling,” which you wouldn’t have expected a few short years ago (“Modeling? You mean schema? Why do you need that?”). But real-world data truly has schema. The schema can be baked into the database explicitly (as with a SQL RDBMS) or live within the application (as with most NoSQL databases). The latter is not a good idea, given how hard it is to extract the schema from application logic and to keep it in sync as applications multiply. So Cassandra is rather bold to go ‘back to the future’ by embracing schema fully with CQL3.

Patrick gave a good, if hurried, rundown of the data modeling stages (conceptual to logical to physical) and presented a new Chebotko diagram. Definitely something to check out if you’re planning to get into Cassandra today.

The video of John Berryman’s Thursday afternoon session on CQL3 isn’t up yet on Planet Cassandra. Watch for it.