The big news of the day is likely Oracle's announcement of the specifics of their Big Data Appliance. As usual, TheReg has done a great job running down the specs. What I want to focus on are the connectors that were announced today. The details are still somewhat thin, but here's what I understand their offerings to be:
1. “Oracle Loader for Hadoop, which moved data from Oracle 11g R2 databases to Hadoop data stores.”
Sounds like Sqoop to me.
2. “Oracle Data Integrator for Hadoop, a twist on the existing Data Integrator tool that can automatically generate MapReduce code to chew on data and bring data sets into view of Oracle databases.”
This must be the ODBC driver that Cloudera announced late last year together with Tableau. I wonder how they are addressing the HiveQL/SQL gap.
3. “Direct Connection for HDFS, and this essentially makes a section of the HDFS file system holding mapped and reduced data to be viewed as an Oracle database table.”
This sounds like a great idea. If this connector bypasses Thrift, it offers the potential for the highest performance.
4. “R Connector for Hadoop. With this, Oracle has indeed taken the open source R statistical analysis package and added optimized math libraries to link it to the various data stores in the Big Data Appliance stack. This connector is not based on the Hadoop-friendly R tools from Revolution Analytics.”
A specialized driver to address R.
Specifics aside, what looks clear is that integration/connectivity with "the establishment" is important. So what do you think and/or need? If you're trying to fit Hadoop into your landscape, drop me a line and let me know.