Optimization of ODBC Data Retrieval with the SimbaEngine SDK

ODBC driver performance is an important concern, particularly when you’re using ODBC connectivity with popular BI tools such as Tableau. The SimbaEngine SDK has many ways to optimize data retrieval performance, and with version 9.5 of the SDK we added another internal API to further optimize cell retrieval from your data-store.

To see how the new API works, you must first understand how data is retrieved by default from the data store interface (DSI) which contains the small amount of custom code necessary to talk to the data-store. When an application requests a row of data from the ODBC driver, the SDK will convert this into a series of calls to move, row by row, through the result set in the data-store. For each row, data is requested from each relevant cell one by one as well. If an application is only requesting a single row at a time, this works perfectly. However, if an application is requesting many rows at a time then there is overhead associated with this way of retrieving data. For example, if an application requests 1000 rows with five columns, this would result in one function call to move to each row, plus five function calls to retrieve each cell in the row for a total of 6000 function calls.

This overhead is what our new BulkFetch internal API aims to remove. Instead of retrieving data row by row, cell by cell, data is retrieved from multiple columns and multiple rows in one call. Note that the SDK will only attempt to use this new API if your DSI says that it supports it, and the application attempts to fetch more than one row at a time.

When using the BulkFetch API, virtual function calls are minimized, and the DSI has the opportunity to parallelize the retrieval and processing of multiple columns at a time. For instance, if the SDK is requesting 1000 rows from three columns at a time, the DSI can use three threads to process each column independently from the others, and reap the rewards of modern multi-core architecture performance. In this case, if an application requests 1000 rows with five columns, this would result in one function call to retrieve all 1000 columns with five columns.

To demonstrate the advantage of using this new API, I put together a small test to illustrate the performance improvement. I used the UltraLight sample that comes with the SDK as the base driver for my test. UltraLight is a simple ODBC driver which hard-codes into memory seven rows with three columns each and returns this data-set if the submitted query contains “SELECT” within it. I modified this driver to return ten million records by repeatedly iterating over these seven rows. I then implemented the BulkFetch() and IsBulkFetchSupported() functions so that the driver could make use of the new BulkFetch optimization. Note that I did not use multiple threads to process each column in parallel to keep the test simple, but this would be an additional optimization to make.

Using a simple application which submits a query and uses SQLBindCol to bind result columns, and will then retrieve a configurable number of rows (the fetch size) at a time, I measured the average of five runs with a number of different fetch sizes. I’ve also included, for reference, the time taken when not binding columns and instead retrieving each cell using SQLGetData. All times are measured in seconds.

Fetch Size Bulk Fetch Enabled Bulk Fetch Disabled
SQLGetData 16.4866 16.5696
1 5.9888 5.9338
25 2.3462 3.1548
50 2.1870 3.1138
100 2.0744 3.0546
250 2.0284 3.0390
350 2.0218 3.0534

As expected, the time to retrieve data using SQLGetData and with a fetch size of 1 is essentially the same between the two configurations, however when the fetch size is increased to 25 the Bulk Fetch optimization shows a 25.6% improvement, with the gap widening as the fetch size is increased as shown in the below graph.

log chart from KyleP blog post

As the number of columns involved and the number of rows requested increases, the performance gain you’ll see using the BulkFetch API will also continue to increase.

I encourage you to try this out yourself by downloading the SimbaEngine SDK here, or if you’re already a customer by looking at the reference documentation to see how you can use these functions in your own implementation: http://www.simba.com/docs/SDK/SimbaEngine_C++_API_Reference/a00334.html