Enabling organizations and their teams to easily build data-driven apps that meet the dynamic requirements of customers: that is the common goal of DataStax and Simba, a Magnitude company. For DataStax, that means helping customers solve today’s most challenging big data problems by providing products and services around the popular open-source database, Apache Cassandra. For Simba, it means helping customers get to market faster and more cost-effectively with off-the-shelf and customized data connectivity solutions.

DataStax is the leading commercial entity behind the Apache Cassandra project. Apache Cassandra is a NoSQL, highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

DataStax offers an enterprise distribution of the Apache Cassandra database called DataStax Enterprise (DSE), which integrates Apache Solr and Apache Spark for additional and advanced workloads. It includes all the enterprise functionality needed for serious production systems and full support from some of the best distributed database experts in the world. DataStax Enterprise is recognized by Gartner across multiple categories as a best-in-class operational database.

The Challenge

DataStax Enterprise users query data not only with Cassandra, but also with Apache Spark, an analytics engine for large-scale data processing. A coherent analytics ecosystem must also include BI and ETL tools. To support this ecosystem, DataStax provides a set of ODBC and JDBC drivers, including a Simba SQL/CQL ODBC driver for Apache Cassandra and a Simba SQL ODBC driver for Apache Spark. Both drivers can be used independently with DataStax Enterprise.

The reality is that Cassandra and Spark have different strengths, and one often has significant advantages over the other for certain types of workloads.

Cassandra:

  • Cassandra is purpose-built for use cases with scale and high-availability requirements. It has great write performance and fast point look-ups based on a key. To maximize these core characteristics, it limits the ways in which data can be queried.
  • The Cassandra Query Language (CQL) does not have syntax for joining tables. The ODBC driver, which runs on the client side and handles all operations not supported by CQL, has to pull back a large volume of data from Cassandra and perform the join operation in memory. This type of query often results in out-of-memory errors with BI tools such as Tableau, Alteryx, and Power BI, especially on large datasets.

Spark:

  • Spark, on the other hand, provides very comprehensive SQL for querying large datasets.
  • The DataStax Enterprise distribution of Spark provides users a way to execute SQL queries on the data in Cassandra. However, Spark is not optimized for writes, and its write operations are much slower than Cassandra’s.

It is not always easy to determine which ODBC driver to use for a given BI or ETL task. DataStax recommends playing to the strengths of each technology and making the best of what’s currently available. In a nutshell, that means:

  1. Using the Cassandra ODBC driver for the BI use case of importing and exporting data.
  2. Using Spark and the Spark ODBC driver for more complicated reads, aggregations, joins, and ETL.
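
As a rough illustration of these two guidelines, the sketch below routes a SQL statement to one backend or the other based on simple keyword matching. It is a hypothetical heuristic written for this article, not the logic embedded in any DataStax or Simba driver; the function name choose_backend and the keyword list are assumptions.

```python
import re

# Hypothetical keyword heuristic, for illustration only. It is NOT the
# logic used by any DataStax or Simba driver.
# Joins, grouping, and aggregate functions suggest a Spark workload.
_SPARK_HINTS = re.compile(
    r"\b(JOIN|GROUP\s+BY|HAVING|UNION)\b|\b(COUNT|SUM|AVG|MIN|MAX)\s*\(",
    re.IGNORECASE,
)


def choose_backend(sql: str) -> str:
    """Return 'spark' for joins and aggregations, otherwise 'cassandra'."""
    if _SPARK_HINTS.search(sql):
        return "spark"
    # Plain writes and key-based look-ups play to Cassandra's strengths.
    return "cassandra"


if __name__ == "__main__":
    for statement in (
        "INSERT INTO ks.users (id, name) VALUES (1, 'a')",
        "SELECT * FROM ks.users WHERE id = 1",
        "SELECT u.id FROM ks.users u JOIN ks.orders o ON u.id = o.user_id",
        "SELECT COUNT(*) FROM ks.orders",
    ):
        print(choose_backend(statement), "<-", statement)
```

A keyword match like this is deliberately naive: it would misroute a statement whose string literal happens to contain the word "join", for example. It is meant only to show the kind of decision users otherwise have to make by hand.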

The guidelines, while helpful, still involve some complexity for the user. The DataStax product management team, working with our engineers, concluded there was a better way; that better way was a unified Cassandra and Spark driver with embedded logic.

The Solution

DataStax engaged Simba’s managed services team to combine the Simba Cassandra and Spark drivers into a single driver. Users can configure this driver to connect to Cassandra, Spark, or both, and a single driver also simplifies the installation experience.

Explore more products from DataStax: Luna, the support product for Apache Cassandra; Astra, the Cassandra-as-a-Service offering; and, of course, DSE, the enterprise offering.

Why Simba

Simba has a history of providing best-in-class, off-the-shelf and custom ODBC/JDBC drivers for all kinds of databases. According to Nick Panahi, Product Manager for Cloud solutions at DataStax, “This new, unified driver is the latest innovation resulting from our ongoing partnership with Simba. We’ve been using Simba’s drivers for the past several years so that our customers can easily connect enterprise analytics applications and business intelligence platforms to DataStax Enterprise.”

Find Out How Simba’s Managed Services Can Help You

The DataStax partnership is an example of how Simba collaborates with customers to support their data connectivity needs. This partnership approach and the most extensive portfolio of standards-based, out-of-the-box connectors to major enterprise data sources are the reasons software vendors and enterprises of all sizes select Simba.

For more information, please visit Simba’s Services Page or contact our Sales Team.