It’s been a great day 1 of the pre-Hadoop Summit workshop. I attended Rich Raposa’s “Developing Custom YARN Applications” along with 26 others. Despite this being the inaugural running of the course, Rich did a great job explaining the crux of YARN. As he explains it, YARN takes away the JobTracker and TaskTracker and replaces them with the ResourceManager, the NodeManager and an assortment of ApplicationMasters and Containers. The critical eureka moment for me was that a YARN application is more accurately an application instance. In the end, YARN is a framework for harnessing the compute resources of a cluster. That means that you, as a YARN application developer, have all the freedom to do whatever your application needs. The corollary is that it can be used to house any number of tools including Tez (for Hive), Slider (formerly HOYA, for HBase), Spark, Solr and many others.
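To make the "application instance" idea concrete for myself, here is a toy model in Python. It is only a sketch and not the real YARN API: the class names are borrowed from the YARN components above, but the methods (`submit`, `allocate`, `request_containers`), the first-fit placement, and the memory numbers are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a worker node that hosts containers."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id, mb):
        # Reserve memory on this node and record the container.
        self.free_mb -= mb
        container = (app_id, self.name, mb)
        self.containers.append(container)
        return container


class ResourceManager:
    """Toy stand-in for the cluster-wide arbiter of resources."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_id = 1

    def allocate(self, app_id, mb):
        # Hypothetical first-fit placement; real schedulers are far richer.
        for node in self.nodes:
            if node.free_mb >= mb:
                return node.launch(app_id, mb)
        return None

    def submit(self, am_mb=256):
        # Every submission is a new application instance with its own
        # ApplicationMaster, which itself runs inside a container.
        app_id = f"application_{self.next_id:04d}"
        self.next_id += 1
        am_container = self.allocate(app_id, am_mb)
        if am_container is None:
            raise RuntimeError("no capacity for the ApplicationMaster")
        return ApplicationMaster(app_id, self, am_container)


class ApplicationMaster:
    """Toy per-instance master that asks the ResourceManager for containers."""

    def __init__(self, app_id, rm, am_container):
        self.app_id = app_id
        self.rm = rm
        self.am_container = am_container

    def request_containers(self, count, mb):
        granted = []
        for _ in range(count):
            container = self.rm.allocate(self.app_id, mb)
            if container is None:
                break  # cluster is full; return what was granted so far
            granted.append(container)
        return granted


nodes = [NodeManager("node1", 1024), NodeManager("node2", 1024)]
rm = ResourceManager(nodes)
first = rm.submit()   # one application instance
second = rm.submit()  # a second, independent instance
workers = first.request_containers(3, 512)
```

Submitting twice gives two separate instances, each with its own ApplicationMaster, and what runs inside the granted containers is entirely up to the application.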
But beyond that, the discussion over lunch at my table about the many SQL-on-Hadoop options was thought-provoking. I apologize that I didn’t catch everyone’s name but I’ll offer the following link as my PSA in return:
It’s a comparison between the biggest SQL-in/on-Hadoop options including Hive/Stinger, Impala, Presto, and Shark.
Now… onto some sleep before day 2.