Last Thursday we sent the Impala user group a demonstration of some Big Data analytic technology in early development here at Simba. You can watch the video here: Microsoft Excel PivotTables on Cloudera Impala via Simba MDX Provider. In this video, I give a short demonstration of how to use core PivotTable functionality in Excel 2010. The video shows how to establish connectivity, build a pivot, sort, format, and slice.
What the video doesn’t show or describe is the technical implementation under the covers. Conceptually, this implementation is very similar to the Teradata OLAP Connector. The architecture is described in Simba’s case study on Teradata’s implementation, the architectural diagram is on page two. Simba’s MDX Provider is an ODBO provider installed on the same machine as Excel. Simba also has a tool for building cube definitions, which we call schemas. These schemas are saved in XML. Simba’s schema maps MDX metadata constructs to Impala table structures. When an ODBO compliant tool such as Excel issues an MDX query, Simba’s MDX Provider maps the MDX query to HiveQL, sends the HiveQL to Impala, collects the results and returns them to the end user.
The most important technical concept is that there is no intermediate server or cube structure that caches data, all queries go direct from Simba’s MDX Provider to the Cloudera Impala server in real time.
Because Simba already has an MDX engine that translates MDX to SQL, it was not overly difficult to adapt the engine to issue HiveQL. HiveQL supports almost all SQL constructs. For this reason, this demonstration provider supports the breadth of MDX functionality, including calculated members and measures in the cube (not shown). For those of you in the know about MDX, the breadth of MDX query support was shown when using slicers. The slicer issues some pretty complicated MDX queries under the covers, so it’s a pretty good representation of what you can do.
As the technology is in early development, it is not generally available for early testing. We put this demo out there so that we could get some feedback and validation of the design. You can leave your comments here or in the Impala user group thread. Specifically, we’d love to hear feedback on:
- The types of data sets you’d like to analyze and the types of analysis you’d like to perform
- The MDX functionality you’d like to use
- Excel functionality you’d like to use
- Other BI clients you’d like to see supported by such a provider (Tableau?)
We are pretty excited about putting the world’s most popular BI tool on top of Big Data and using this as our first step towards a rich Big Data analytics story. Simba has a booth at Strata Santa Clara so if you’re at the conference, you can stop by and talk with us about Excel, Big Data, BI, and analytics and how you plan to put all these technologies into place in your organizations. Your stories and input are vital so that we can build out tools you can use.