On May 24 and 25, I had the opportunity to join the community in conversation at HBaseCon 2016 and the first annual PhoenixCon.
Over the last couple of years, the SQL-on-Hadoop space has been a large growth area in the market. As a result, many are wondering what HBase has been up to.
One thing the HBase team has been focusing on is helping users operate their databases at large scale. During the conference, a handful of HBase users shared their experiences running or adopting HBase. With substantial use cases from companies like Facebook, VISA and Airbnb, the presentations showed that HBase is still very capable of meeting the needs of large enterprises.
So why is there a concern for HBase’s popularity?
In an article from InfoWorld, the rise of newer databases was brought to attention:
It’s not that HBase, the ten-year-old technology, shrank recently — it’s more so that, viewed in conjunction with all the activity in the SQL-on-Hadoop market, HBase looks relatively quiet and small in comparison.
At HBaseCon, we heard and saw many announcements from the large customers who use HBase in production — this lets me know that things are still alive and well with HBase. There is a lot of noise out there, and it appears to be drowning out some substantial projects in the HBase ecosystem.
One key project rising above the noise is, of course, Apache Phoenix. Established four years ago, Phoenix began operating as an individual project apart from HBase, allowing it to evolve and move quickly in an innovative space.
Phoenix is architected as an optional layer on top of HBase, adding a second, parallel SQL interface alongside the native HBase API. This design has the benefit of not disrupting an existing HBase installation or its applications. The flip side is that accessing your data through the new SQL interface is not automatic. Depending on how an HBase installation has been set up and optimized, more or less work will be needed to make its data usable from Phoenix.
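To make that mapping work a little more concrete, here is a minimal sketch in Python that builds the kind of CREATE VIEW statement Phoenix uses to expose an existing HBase table. The helper names (quote, build_view_ddl) and the table, column family and qualifier names (t1, f1, val) are hypothetical and chosen for illustration only. The sketch only produces the statement text; it assumes you would run it through whichever Phoenix client you already use, and that the stored bytes were written in a form Phoenix can decode for the declared types.

```python
from typing import Mapping


def quote(identifier: str) -> str:
    """Double-quote an identifier so Phoenix preserves its exact case."""
    if '"' in identifier:
        # Keep the sketch simple: refuse names that would need escaping.
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return '"' + identifier + '"'


def build_view_ddl(table: str, family: str, qualifiers: Mapping[str, str]) -> str:
    """Build a CREATE VIEW statement over an existing HBase table.

    table      -- name of the existing HBase table (hypothetical here)
    family     -- HBase column family holding the mapped columns
    qualifiers -- column qualifier -> SQL type to declare for it
    """
    # The primary key column stands for the HBase row key.
    columns = ["pk VARCHAR PRIMARY KEY"]
    for qualifier, sql_type in qualifiers.items():
        columns.append(f"{quote(family)}.{quote(qualifier)} {sql_type}")
    return f"CREATE VIEW {quote(table)} ({', '.join(columns)})"


if __name__ == "__main__":
    print(build_view_ddl("t1", "f1", {"val": "VARCHAR"}))
```

The double quotes matter because HBase names are case sensitive, while unquoted SQL identifiers in Phoenix are normalized to upper case. A view is used here rather than a table so that the mapping stays read-only, which is in keeping with the goal of not disrupting the existing installation.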
Phoenix on HBase Compared to SQL-on-Hadoop
Apache Phoenix brings HBase into the SQL-on-Hadoop party with some notable nuance. For one, Phoenix and HBase need to be considered and operated as one system even though they are two different projects. However, whereas HBase is a core project packaged into every Hadoop distribution, Apache Phoenix is not. Notably, neither Cloudera nor MapR includes Phoenix within its distribution. The net result is that extra effort is typically required to install and integrate Apache Phoenix.
None of the other SQL-on-Hadoop systems primarily targets OLTP workloads; rather, they are optimized for aggregations and analytical workloads, with a SQL interface added for convenience. They don’t support data ingestion either, so you’ll have to use something else to load your data.
Apache Phoenix is similarly focused on aggregations and analytical workloads. However, the underlying HBase system is designed for an OLTP workload.
Don’t Worry — HBase is Not Being Left Behind
If you look at the database-engine ranking, you can see that all the other databases have a single company promoting their technology. MongoDB, Cassandra, and Redis have companies behind them, such as MongoDB Inc. and Redis Labs. This means there is a more direct way for those databases to get measurements. You could even say that each of those companies has a vested interest in pushing the numbers.
By contrast, HBase has no single company behind it. HBase is part of practically every distribution you can get, including Cloudera, Hortonworks, MapR and Amazon EMR, and whoever uses HBase is likely running it because it is a part of a lot of other projects. So, if you want to count the number of users, it is actually much more difficult to get an accurate number.
Nevertheless, with global companies like Facebook, Twitter, and Yahoo! using HBase, it is fair to say that this technology is not fringe. It is actually fairly mainstream and certainly very capable.
As for the point of HBase being hard to use, well, the same can be said of any large-scale Hadoop or other NoSQL system. MongoDB is known for being easy to start developing with, but tough to get into production. This is the state of all NoSQL systems right now.
Embracing the Needs of the Strong HBase Community
With all the noise and activity, how is HBase going to rise above and avoid being classified as a complex fringe system?
The way I see it, Phoenix is exactly the response needed.
For many years, HBase resisted adding a SQL layer, and so the Phoenix project emerged to fill that gap. Together, Phoenix and HBase have comparable capability and can meet the needs of their strong, vocal community.
We at Simba have been working within the SQL-on-Hadoop marketplace for five years now. Our proposition is that, if we are going to work with data, we are going to have to embrace the constituency of users, as well as the market of tools available.
There are human analysts out there who want to write SQL, and there are many SQL tools: Microsoft Excel, Tableau, Microsoft Power BI, MicroStrategy, Oracle, etc. If you want to expose your data to a larger user base, you have to make it accessible to SQL-based BI tools. So whatever it is that you are doing, whatever database you are on, you have to get to a point where data is addressable with a schema that you can query with SQL.
The emergence of Phoenix is a perfect example of this trend. HBase is a great scalable OLTP system, but in order to bring that data to analysts who want to work with it, you have to include the amenities that let them query it in familiar SQL. This allows the analyst to write a query directly or to use tools that generate SQL queries through UI interactions.
Many people try to make it a competition and say, “This is the best of whatever.” However, at this point, there is still a lot more to play out in the SQL-on-Hadoop marketplace before it is clear who or what technology will ultimately win.
The community around the HBase ecosystem continues to be its biggest strength. There is a rich diversity of participants: there are many companies involved and there are many ideas. And, if these participants stay open and receptive, as they have been in hosting the HBase and Phoenix events close together, they will continue to be in sync with each other and responsive to the overall needs of users.
The community of Phoenix on HBase is here to stay, and will be evolving and pushing this technology forward, through better joint strategy and better optimization.
Thanks to Phoenix, HBase is now attending the SQL-on-Hadoop party, noisy as that party is. While some pundits want to pit HBase against other databases, the focus should remain on the needs of users and what appeals to them most. Sometimes it is good to make noise; other times, it is good to listen. HBase has been quite attentive.