Blog

OLAP for Big Data…

July 18, 2014 by in Business Intelligence   Data Analytics   Multi-Dimensional Data Connectivity   OLAP   OLTP   Relational Database Connectivity   Shark   Spark   Web/Tech

Is it me, or has OLAP finally (re)emerged from the shadow of the transactional systems? I was pleasantly surprised by AtScale’s subdued emergence at June’s Hadoop Summit. (Refer to my previous summary for more.) This week, Socrata’s Evan Chan presented “Interactive OLAP Queries using Cassandra and Spark” at the Seattle Spark Meetup. Evan began work using Spark with … Read More

Google Chucks MapReduce, Databricks is in the Cloud, and Shark Meets its Maker: A Spark Summit 2014 Synopsis

July 2, 2014 by in Big Data   Cloudera   Data Access   Databricks   Google   Hadoop   Hive   Hortonworks   MapR   ODBC   SAP   Shark   Spark   Web/Tech

It is a post-MapReduce world. Last week’s announcement out of Google I/O was that Google has retired MapReduce and replaced it with its home-grown cloud analytics system, Cloud Dataflow. Rhetoric aside, it is good to see this from the originator of the concept. (I recall a similar declaration from Apache … Read More

Databricks – an interesting plan for Spark, Shark, and Spark SQL

July 2, 2014 by in Databricks   Hadoop   Shark   Spark   SQL

This week was the Spark Summit in San Francisco. It was a great event with a lot of interesting announcements, including several from Databricks, the company promoting Spark and Shark. One piece of news is that they are ending development of Shark and instead focusing their efforts … Read More

All About Apache Drill Data Sources and File Types

June 26, 2014 by in Big Data   Data Access   Data Terms / Applications   Hive   tagged as CSV   DFS   Drill   File Types   HBase   Hive   JSON

Storage plugin extensibility is a key feature of Apache Drill. Drill supports Hive, HBase, and its DFS file system, which encompasses the CSV, TSV, JSON, and Parquet file types. You can configure Drill sources and data types via its web interface. (More details on that in my earlier blog here.) … Read More

Setting up Apache Drill to Query Hive and HBase: a How-to

June 19, 2014 by in Big Data   Data Technologies   Hadoop   Hive   Relational Database Connectivity   tagged as Drill   HBase   Hive

The Apache Drill beta release is near! This guide will show you how to set up and use Drill in embedded mode. Apache Drill is a low-latency distributed query engine capable of querying large datasets. It stands out from other execution engines in that it allows for nested schemas, multiple … Read More

Teradata, Alteryx, and Big Data

June 18, 2014 by in Alteryx   Big Data   Data Access   Microsoft   Teradata

I was reading a very good overview of Teradata written by Mark Smith entitled “Teradata Takes Bigger Approach to Big Data”. It covers what Teradata is doing in the data warehouse and analytics space. Mark does a good job reporting, and this is worth reading. One … Read More

Hadoop Summit 2014 Summary

June 12, 2014 by in Big Data   Data Warehouses   Hadoop   Hive   NoSQL   OLAP   SQL   Web/Tech

Here’s a recap of last week’s Hadoop Summit 2014. It’s difficult for me to convey my excitement through the medium of words. As I explained in my previous post, YARN is a giant step forward from MapReduce and makes it possible for Hadoop to become a general compute fabric. Together with Docker … Read More

Six Things I Learned at Hadoop Summit

June 12, 2014 by in Big Data   Data Analytics   Data Technologies   Hadoop   Hive   Hortonworks   JDBC   Microsoft Excel   ODBC   Relational Database Connectivity   Simba Technologies   SQL   Unstructured   tagged as Bacon   Cloudera   Hive   Hortonworks   JDBC   ODBC   SQL

Several colleagues and I attended Hadoop Summit in San Jose last week. Here are six things I learned: 1. Simba gets around. No, that’s not our new marketing slogan. It’s just a fact: Simba is now the de facto standard for Hadoop connectivity, whatever the protocol (ODBC, JDBC), format (structured, unstructured), … Read More