Why MDX?

November 1, 2010 facebooktwittergoogle_plusredditpinterestlinkedinmail by in Business Intelligence   Data Warehouses   General   Interoperability   MDX   Multi-Dimensional Data Connectivity   OLAP   OLE DB for OLAP   Relational Database Connectivity   SQL  

In my last post, I mentioned “The wonderful thing about standards is that there are so many to choose from.” Usually this is meant by the speaker in a pejorative sense, indicating perhaps a perceived unwillingness of some vendors to work within existing standards, and instead wishing to define their own standard for any number of reasons, such as avoiding patent and IP issues, wishing to establish market leadership, locking out competitors, etc.

One good reason for introducing a new standard is efficiency. Most modern relational databases management systems are transactional databases. SQL is inherently transactional, making it ideal for OLTP databases. So, does that necessarily mean that if you have a database, you should access the data using SQL? Since you have a hammer, does everything look like a nail?

OLTP databases often need to be able to write lots of data, quickly. But businesses later need to deeply analyze all that transactional data. For that, a copy is extracted from the OLTP database, transformed into a more optimal format, and loaded into a data warehouse or an OLAP database. Yes, you can still use SQL to look inside a data warehouse, but it’s harder to use SQL to find patterns, quickly aggregate across the whole dataset, or do comparisons such as a time series. You can grab a coffee while you wait for it to finish.

In the mid-1990s, Microsoft introduced a new language for Multi-Dimensional eXpressions against an OLAP database, called MDX. It is a powerful but compact query language that knows and understand multi-dimensional databases/cubes. For example, if you want to know the sales of souvenir mittens at your retail outlet in Vancouver, BC during the month of February when the 2010 Olympics were held, you would write an MDX query that this:

SELECT
[Store].[Store Country].[Canada].[Vancouver] ON COLUMNS,
[Product].[All Products].[Clothing].[Mittens] ON ROWS
FROM [Sales]
WHERE ([Measures].[Unit Sales], [Date].[2010].[February])

However, if you wanted to write this in SQL, you’d need this cumbersome SQL query:

SELECT SUM(Sales.[Unit Sales])
FROM (Sales INNER JOIN Stores
ON Sales.StoreID = Stores.StoreID)
INNER JOIN Products
ON Sales.ProductID = Products.ProductID
WHERE Stores.StoreCity = ‘Vancouver’
AND Products.ProductName = ‘Mittens’
AND Sales.SaleDate BETWEEN ’01-02-2010′ AND ’28-02-2010′

It becomes even more verbose to write an SQL expression that compares Feb 2010 sales to that of the previous month, or the previous year. In contrast, MDX makes it easy to write very short succinct queries that are more direct to read and understand, which in turn reduces errors. And you can write more general expressions like “prior period” comparisons, that at every level of drill-down in time hierarchy (e.g. day, month, quarter, year) show or compare to the prior unit of time at that level.

I liken it to writing a program in assembler code versus in a high-level language programming language. You could do it, but why would you? This is not a perfect example, since assembler code is often faster than the high level language. However, in the SQL vs. MDX debate, it would be the MDX that is both easier to read and often faster since it expresses more clearly the goal of the query!

In recent years, many BI client vendors (SAP Business Objects, Microsoft, Cognos, and others) allow connections using MDX. This is a great move as it allows more powerful and general queries, an opportunity for better performance (especially against a cube), and greater interoperability. The latter is particularly valuable in the modern enterprise composed of acquisitions and mergers, thus leading to multi-vendor BI challenges that need to be bridged.

One good reason for introducing a new standard is efficiency. Most modern relational database management systems are transactional databases. SQL is inherently transactional, making it ideal for OLTP databases. So, does that necessarily mean that if you have a database, you should access the data using SQL? Since you have a hammer, does everything look like a nail?

OLTP databases often need to be able to write lots of data, quickly. But businesses later need to deeply analyze all that transactional data. For that, a copy is extracted from the OLTP database, transformed into a more optimal format, and loaded into a data warehouse or an OLAP database. Yes, you can still use SQL to look inside a data warehouse, but it's harder to use SQL to find patterns, quickly aggregate across the whole dataset, or do comparisons such as a time series. You can grab a coffee while you wait for it to finish.

In the mid-1990s, Microsoft introduced a new language for Multi-Dimensional eXpressions against an OLAP database, called MDX. It is a powerful but compact query language that knows and understands multi-dimensional databases/cubes. For example, if you want to know the sales of souvenir mittens at your retail outlet in Vancouver, BC during the month of February when the 2010 Olympics were held, you would write an MDX query that looks like this:

SELECT
[Store].[Store Country].[Canada].[Vancouver] ON COLUMNS,
[Product].[All Products].[Clothing].[Mittens] ON ROWS
FROM [Sales]
WHERE ([Measures].[Unit Sales], [Date].[2010].[February])

However, if you wanted to write this in SQL, you’d need a more cumbersome SQL query, something like this:

SELECT SUM(Sales.[Unit Sales])
FROM (Sales INNER JOIN Stores
ON Sales.StoreID = Stores.StoreID)
INNER JOIN Products
ON Sales.ProductID = Products.ProductID
WHERE Stores.StoreCity = 'Vancouver'
AND Products.ProductName = 'Mittens'
AND Sales.SaleDate BETWEEN '01-02-2010' AND '28-02-2010'

It becomes even more verbose to write an SQL expression that compares Feb 2010 sales to that of the previous month, or the previous year. In contrast, MDX makes it easy to write very short succinct queries that are more direct to read and understand, which in turn reduces errors. And you can write more general expressions like "prior period" comparisons, that at every level of drill-down in time hierarchy (e.g. day, month, quarter, year) show or compare to the prior unit of time at that level.

I liken it to writing a program in assembler code versus in a high-level programming language. You could do it, but why would you? This is not a perfect example, since assembler code is often faster than the high level language. However, in the SQL vs. MDX debate, it would be the MDX that is both easier to read and often faster since it expresses more clearly the goal of the query!

In recent years, many BI client vendors (SAP BusinessObjects, Microsoft, Cognos, and others) allow connections using MDX. This is a great move as it allows more powerful and general queries, an opportunity for better performance (especially against a cube), and greater interoperability. The latter is particularly valuable in the modern enterprise composed of acquisitions and mergers, thus leading to multi-vendor BI challenges that need to be bridged.