When building an ODBC driver, you need to map the data from the data source into a relational form, because ODBC and SQL expect data with a fixed schema and return results in tabular format. Using the SimbaEngine SDK, if you can map your data into a relational format, you can easily build an ODBC driver that works with applications such as Excel and Tableau. This post examines a few different cases and shows how each can be mapped into a relational format.

A simple example is data stored in CSV files. For the purposes of this example, let's assume that each row in the CSV has the same number of columns. A sample of the data might be:
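The original sample is not reproduced here; a hypothetical CSV with a consistent number of columns (column names and values invented for illustration) might look like:

```
id,name,amount
1,widget,5.55
2,gadget,3.45
```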

In this example, the mapping to a relational model is easy because the data is already represented relationally. The only difficulty is determining the metadata for the columns in the CSV. There are a few options, to name a few: map every column to string; create another CSV file specifically for holding metadata about the other CSV files; associate a separate file (such as an XML file) with each CSV file that contains its metadata; or add a header line containing the metadata. The only important thing is that you can write code which exposes this data and metadata.
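As a minimal sketch of the header-line option, suppose the first line of each CSV carries "name:type" pairs. Both this convention and the function below are inventions for illustration, not part of the SDK:

```python
import csv
import io

def read_csv_with_metadata(text):
    """Parse a CSV whose first line holds hypothetical "name:type" pairs.

    Returns (columns, rows), where columns is a list of (name, type) tuples
    and each row's values are cast to the declared type.
    """
    reader = csv.reader(io.StringIO(text))
    # The header line doubles as the column metadata.
    columns = [tuple(field.split(":", 1)) for field in next(reader)]
    casts = {"int": int, "double": float, "string": str}
    rows = [
        [casts[ctype](value) for (_, ctype), value in zip(columns, record)]
        for record in reader
    ]
    return columns, rows

sample = "id:int,name:string,amount:double\n1,widget,5.55\n2,gadget,3.45\n"
columns, rows = read_csv_with_metadata(sample)
# columns -> [('id', 'int'), ('name', 'string'), ('amount', 'double')]
# rows    -> [[1, 'widget', 5.55], [2, 'gadget', 3.45]]
```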

A more complicated example is a JSON data source. There are a few different ways to map this data, so let's consider the following (contrived) data sample:
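The sample documents are not reproduced here; based on the flattened column names shown later in the post, they would look roughly like this:

```json
{"id": 1, "value": "customer1",
 "receipt": {"items": [{"value": "item1", "price": 5.55},
                       {"value": "item2", "price": 4.22}]}}

{"id": 2, "value": "customer2",
 "receipt": {"items": [{"value": "item3", "price": 3.45}]}}
```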

Notice that in the second JSON document there is one element in the "items" array, while in the first there are two. To map this data to a relational model, I'm going to walk through one approach in detail. Whatever approach you take, you will need to know the complete set of fields present across all the JSON documents. You can have the user enter them manually, have the driver sample the available documents (perhaps scanning the first N documents), or use some combination of the two.
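The sampling option can be sketched as follows: scan the first N documents and take the union of the field paths seen. The function names are my own, and a real driver would also need to infer column types along the way:

```python
def field_paths(doc, prefix=""):
    """Yield a dotted path (e.g. "receipt.items[0].price") for every leaf."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from field_paths(value, prefix + key + ".")
    elif isinstance(doc, list):
        base = prefix[:-1] if prefix.endswith(".") else prefix
        for index, value in enumerate(doc):
            yield from field_paths(value, base + "[" + str(index) + "].")
    else:
        # Leaf value: emit the accumulated path without the trailing dot.
        yield prefix[:-1] if prefix.endswith(".") else prefix

def discover_columns(documents, sample_size=100):
    """Union of field paths over the first sample_size documents."""
    columns = set()
    for doc in documents[:sample_size]:
        columns.update(field_paths(doc))
    return sorted(columns)
```

Running this over the two sample documents yields the six flattened columns used in the table that follows.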

Once you have the complete set, you can then start mapping this to the relational model. A simple approach would be to flatten the data into columns, so each element in the array becomes a separate column. In that case, the data would be represented as:

id  value      receipt.items[0].value  receipt.items[0].price  receipt.items[1].value  receipt.items[1].price
1   customer1  item1                   5.55                    item2                   4.22
2   customer2  item3                   3.45                    <NULL>                  <NULL>

In the second row there is no second element in the array, so NULL values are returned for those columns in the relational data. Similarly, if a field appears in one document but not in another, NULL is substituted wherever it is absent. There are other approaches for mapping this data, but I'll leave those as an exercise for the reader.
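The flattening itself can be sketched as below, using Python's None to stand in for SQL NULL (the helper names are my own):

```python
import re

def get_path(doc, path):
    """Fetch a dotted path such as "receipt.items[1].price"; None if absent."""
    for token in re.findall(r"[^.\[\]]+|\[\d+\]", path):
        if token.startswith("["):
            index = int(token[1:-1])
            if not isinstance(doc, list) or index >= len(doc):
                return None
            doc = doc[index]
        else:
            if not isinstance(doc, dict) or token not in doc:
                return None
            doc = doc[token]
    return doc

def flatten(doc, columns):
    """Produce one relational row: one value (or None) per column path."""
    return [get_path(doc, column) for column in columns]

columns = ["id", "value",
           "receipt.items[0].value", "receipt.items[0].price",
           "receipt.items[1].value", "receipt.items[1].price"]
doc = {"id": 2, "value": "customer2",
       "receipt": {"items": [{"value": "item3", "price": 3.45}]}}
# flatten(doc, columns) -> [2, 'customer2', 'item3', 3.45, None, None]
```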

A third and final example is taking a key-value store and mapping it to a relational model. There are many different types of key-value stores, each with its own subtleties, and it would make for a very long post to consider all of them. Instead, let's imagine a hypothetical key-value store that follows the example below:
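The example is not reproduced here; the structure described in the next paragraph can be sketched as follows, with indentation denoting parent/child:

```
key1
    value1
    value2
        child2
key2
    value3
```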

In this case, we have two "master" keys (key1 and key2): key1 has two values (value1 and value2), and value2 also has a child (child2); key2 has only one value (value3) with no children. A simple mapping would be to treat the keys as one column, their immediate values as a second column, and any children as a third. Applying this to the example above results in the following mapping:

key   firstLevel  secondLevel
key1  value1      <NULL>
key1  value2      child2
key2  value3      <NULL>

Note that the column names have been synthesized and the types are not specified for this example. It would be easy to adapt the strategies employed in the previous example, which is also left as an exercise for the reader.

There are many other ways to represent this data relationally, and many other models of data that can be mapped. Over the years, we've seen many different kinds of data and successfully mapped all of them into sensible relational representations. Once the mapping is done, all it takes is 5 days to have your read-only ODBC driver running and connecting to Tableau!