There has been a lot of talk about Hive/Hadoop replacing the relational database. I personally do not believe this will happen; rather, I see the two co-existing. Simba has bet heavily on this scenario by building ODBC and JDBC drivers that allow Hive and other Big Data sources to work well alongside the SQL databases in the current enterprise landscape.
Last week at Strata, Ken Rudin of Facebook was one of the keynote presenters. This is worth noting because Hive originated at Facebook. Enterprise Tech has a good interview with Ken on Facebook’s use of Hadoop alongside a relational database. Some key points from the article are:
1. Facebook has adopted “relational database technology for some of its analytics work.”
2. “The source of truth is our Hadoop system, and it runs HDFS, and a MapReduce layer on top of that, there is Hive relational layer on top of that, which was created at Facebook”
3. “It is not that we are better engineers and that we can build things that scale further than they can. A lot of the ways we make things scale is by cutting out the things we don’t need. The other relational products include things that their entire customer base needs, so they have to have a broad range of capabilities. Each time you add more capability, you are adding a little bit more overhead. We can get our systems to scale much higher because we just cut all of that out”
Basically, SQL and relational databases are still important and definitely part of the fabric. Industrial-strength databases like Teradata and SQL Server have a lot of capabilities and add value. Systems like Hive and Presto make the opposite trade-off: they give up functionality in exchange for speed.