I was reading an article by Dan Woods at Forbes.com titled “Can Hortonworks Dominate The Hadoop Market?” Dan analyzes where the Hadoop market is and where Hortonworks fits into the scheme of things. It is a good analysis of where the Hadoop market is today and where it could be by 2020. Dan also explains some of the differences among the Hadoop distros and the various cloud offerings.
One point Dan makes that caught my attention is: “While it is true that Hadoop can be the container for many types of data, in my view, the biggest victory comes from creating a common language for how to understand your business. While it would be really great for this common language to be developed and created in one place, we have too many attractive forms of repositories and analytics for that to be true. We will have excellent integrated models in the data warehouse, that are heavily used, other models in graphs, and lots of nouns and verbs and events discovered inside of Hadoop. The strongest approach will be to accept this heterogeneity and build a data supply chain that moves data around as needed to support various types of workloads and interactions between systems. In other words, accept you have a data supply chain with multiple repositories and forget about the idea of the center of the universe.”
I totally agree with Dan that the world of data is and will continue to be heterogeneous. A few years ago, many were saying that Hadoop and NoSQL were going to displace the traditional relational database vendors. I think most people now realize that these technologies will augment relational databases rather than replace them. And that is the crux of what Dan is saying: we need to build a “data supply chain” that allows us to integrate and analyze data from heterogeneous data sources. SQL was a powerful tool in the world of the RDBMS and the data warehouse. Today, we are seeing the Hadoop and NoSQL worlds rapidly adopt SQL. What Cloudera has done with Impala, what Databricks has done with Spark SQL, and what Couchbase has done with N1QL are all prime examples of this. My guess is that this innovation will continue and we will ultimately have an extended SQL syntax that supports many new types of data and queries.