Couchbase Server is a fast and scalable NoSQL solution. It can reduce the cost and complexity of building technology stack that scales from a mobile device to a data center supporting millions of users. But unless you are a startup building a technology stack from scratch, implementing Couchbase will require integration with existing infrastructure.
It is common for a company to implement Couchbase Server to address a particular need or pain point and gradually gain required expertise and confidence before transitioning other parts of the stack to the Couchbase platform. One key success factor? The ease with which Couchbase can “play nice” with other data sources and data consumers.
In this blog post, I will walk you through the steps of setting up Couchbase as a linked server in Microsoft SQL Server. This may be used if you need to move data between Couchbase and SQL server, or run distributed queries across heterogeneous data sources which include Couchbase Server.
Linked server is just one example of using the Simba Technologies ODBC driver with SQL connector as a bridge between non-relational data source like Couchbase and an ODBC compatible application. The same driver can be used to access data in Couchbase from other ODBC/JDBC applications like Tableau, Excel, SAP Lumira, Power BI, and many others directly, without setting it up as a linked server.
Simba ODBC drivers allow you to use standard ANSI SQL to query your data in Couchbase as if it were a relational data source by mapping JSON-based documents to relational schema. The driver can map individual document types in a Couchbase bucket to virtual tables and further customize how nested documents are re-normalized so that they look like traditional SQL tables.
For power users, the Simba ODBC driver allows you to mix ANSI SQL with Couchbase’s own N1QL, a query language which includes additional features for working with document-oriented datasets.
Install the Simba Couchbase ODBC Driver
If you haven’t done so yet, download an evaluation version of the Couchbase ODBC driver for Windows.
The download package will contain two installers. You will need to install a 32-bit or a 64-bit version of the driver depending on the architecture of MS SQL Server on which the linked Couchbase Server will be setup.
You can check the CPU architecture of your SQL Server by running the following command in SQL Server Management Studio (SSMS):
select @@version ------------------------------------------------ Microsoft SQL Server 2012 - 11.0.2100.60 (X64) Feb 10 2012 19:39:15 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1) (Hypervisor)
Here I am running a 64-bit Developer Edition of the SQL Server and therefore will need to install a 64-bit version of the driver.
Next, run the installer, accept the licensing agreement, and complete the driver installation. After you download the driver, you will receive an email from Simba with the license file needed to activate evaluation. Copy this license file into the lib/ subfolder of the folder where you installed the driver. By default, it goes into the “Simba Technologies Inc” folder under “Program Files” or “Program Files (x86)”.
Now the driver is installed and you can proceed to configuring connection to our data source.
Configure the ODBC Data Source
You’ll need to know the CPU architecture of SQL Server once again to setup the data source connection. Data source connections are created using ODBC Data Source Administrator which comes in two version, 32 and 64 bit, and needs to match the version of the driver that you are using.
This is where things can get a bit confusing: Both versions of the Data Source Administrators are named
odbcad32.exe. Only one lives in
%windir/system32% (it is a 64-bit version, despite the 32 in its name), while the other (32-bit) in
%windir%/SysWOW64. (Just disregard that 64 in its name, it is a 32-bit administrator!)
If you installed the driver but don’t see it in the Data Source Administrator’s Drivers tab, then most likely you are using an incompatible version of Data Source Administrator. You can use Task Manager to check if you are running a 64-bit version (displayed as
odbcad32.exe in the process list) or a 32-bit one (
Now that you have a proper version of ODBC Data Source Administrator running, you can configure your Data Source Name (DSN).
For this demo it doesn’t matter if you create a User or System DSN, but if you need to share your DSN with other users on the same machine, you need to create a System DSN.
Click the “Add…” button on the User or System DSN tab, select Simba Couchbase ODBC Driver from the list of drivers and click “Finish”.
You will be presented with the DSN Setup dialog. At the very least you need to provide the Data Source Name, and Server. You can either leave the rest of the fields blank or use default values. Click the “Test…” button to verify that you can establish connection with your Couchbase server.
- Enter Data Source Name. This is the name that you will use to setup linked server or to connect to it from other ODBC compliant applications. You can call it whatever you like, but do yourself a favour and avoid using spaces in the name. Also add a “32” or “64” suffix to distinguish which driver architecture was used to create this connection.
- Enter Server name where your Couchbase Server is hosted. In this demo I am using a local Couchbase Server running on my laptop. You can download it from the Couchbase website.
- Select Authentication Mechanism supported by your Couchbase installation. Because I am accessing the local system, I am using the “No Authentication” mechanism. Depending on your Couchbase configuration you may need to use a credential file. Please refer to the driver User Manual for details.
- Click the Test button to verify that you can connect to your Couchbase instance.
- If the test has passed, click the OK button to complete creation of the data source. You are ready to use it from ODBC applications or continue to setting up linked server.
Troubleshooting – What to do if your connection test failed:
- Check the error message in the Test Results dialog. It may provide hints of what is going on.
- Verify that you entered all connection information correctly. Did you make a typo or leave out a field that needs to be configured or changed from default?
- Is your Couchbase Server running and listening for incoming connections on the port you are trying to connect to?
- Is a firewall blocking network connections?
- Can you connect to your Couchbase server from a web browser (e.g. using on port 8091) or via the ‘cbq’ command line utility?
- Check Couchbase logs for the hints on why your connections may be failing.
Re-normalize the Schema
Now we have the working ODBC Data Source which could be used to setup a linked server or to query Couchbase Server from an ODBC compliant application such as Excel. But depending on the complexity of your Couchbase document structure, it may not be good enough. It’s common to have documents of different types with different attributes in the same bucket. The driver will sample a number of document from the bucket and determine the column names in the virtual table representing this bucket to the ODBC compliant application. If the number of documents of different types is large, there’s a good chance that some document types will not be included in the sample and the attributes from those documents will not be mapped to the columns of the virtual table. To solve this problem, the Simba ODBC driver allows you to configure the name of the attribute that determines the document type—a type differentiator. If type differentiator is provided in the advanced options of the DSN, then the driver will find the list of distinct document types and will represent each document type in it’s own virtual table with document attributes mapped to the columns of the virtual table.
Let’s have a look at the `travel-sample` dataset which comes with the Couchbase Server. Without the type differentiator attribute configured, the dataset will be presented as a single table containing all columns that the driver found during sampling. Meanwhile, the dataset contains distinct document types, such as airport, airline, landmark and route, each with its own set of attributes.
In order to map each type of the document to an individual table, we need to let the driver know that the document type is differentiated by the attribute named “type”. To do this, open the DSN you’ve just created in ODBC Data Source Administrator, and click the “Advanced…” button in DSN configuration and enter
in the “Type Name List” field.
The Simba Couchbase ODBC Driver comes with the Installation and Configuration Guide and Quick Start Guide which are available in the same folder where you installed the driver. The Guides has detailed information on driver installation and documents all available configuration options including Advanced Options and Schema Editor.
Set up a Linked Server
Now start SQL Server Management Studio and connect to the SQL Server on which you want to create the Couchbase linked server.
Expand “Server Objects” folder and right click on “Linked Server” subfolder. Select “New Linked Server…” option.
On the General tab, enter the linked server name (it could be any name that will be used to refer to this server in your queries e.g. “CB64”), select “Microsoft OLE DB Provider for ODBC Drivers” from the “Provider” dropdown list and in the Data Source field enter the name of the DSN you created in the previous step (e.g. “CB64”). All other fields can be left blank.
Click Security tab and make sure that the “Be made without using security context” option is selected. If your Couchbase Server requires authentication, this will need to be changed to either using login’s current security context or using fixed credentials for any user (the last option). Click OK to create the linked server.
You can also create linked server and associated login by running the following two queries:
EXEC master.dbo.sp_addlinkedserver @server = N'CB64',@srvproduct=N'Couchbase',@provider=N'MSDASQL',@datasrc=N'CB64' EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'CB64',@useself=N'False',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL
Refresh the list of Linked Servers in the Server Object Explorer and verify that you can read the catalog of the linked server:
Now you can issue queries against the linked Couchbase Server as if it was a relational data source, e.g.:
select * from openquery(cb64, 'select top 10 al.name, round(sum(distance),0) as route_distance from cbdefault.[travel-sample].route_schedule rs join cbdefault.[travel-sample].route r on r.pk = rs.pk join cbdefault.[travel-sample].airline al on al.pk = r.airlineid group by al.name order by 2 desc ')
Don’t Drink Too Much Kool-Aid
The Couchbase ODBC Driver works hard to map the schema-less document-oriented data storage of Couchbase Server to SQL objects like databases, tables, and columns, but it’s not a substitute for understanding your data and how distributed queries processed by SQL Server.
Auto-generated schema should work most of the time, or at least provide a sensible starting point. But depending on the complexity and volume of your data, auto sampling may miss or incorrectly determine the type of a document attribute. You can always adjust auto-generated schema using the bundled Schema Editor.
SQL Server imposes a limit on the column width for linked servers (8000 bytes max). Couchbase doesn’t have this restriction and it’s quite possible that certain columns will not be readable through the linked server. In fact, this occurs with the `beer-sample` dataset which comes with the server. Working with Couchbase directly via the ODBC driver without setting up the linked server doesn’t impose this limitation.
You should also keep in mind that in order to be visible to the driver buckets, you need to have a primary index defined on them e.g. via cbq command line tool:
CREATE PRIMARY INDEX ON `travel-sample`;
Linked server splits query-processing between the server on which the linked server is defined (and on which you issue your queries), the driver’s SQL engine and the linked Couchbase server. Both SQL Server Query Optimizer and the ODBC driver’s SQL Engine make decisions about where particular parts of your query will be processed (Couchbase Server, the driver or the SQL Server). For the best performance, the query needs to be processed closer to the data source, but that does not always happen by default. To force SQL Server process the query on the linked server, use SELECT from OPENQUERY syntax as opposed to querying linked tables directly. See References and Further Reading for an article discussing common pitfalls with linked server performance.
In this blog post I demonstrated how to setup Couchbase as a linked server using the Simba ODBC driver. Though I focused on a linked server scenario, you can use similar steps to access Couchbase Server via Simba ODBC driver from other ODBC-compliant applications like Tableau, Excel, SAP Lumira or Microsoft Power BI.
This post covers basic configuration of the Couchbase ODBC driver. I didn’t cover advanced configuration options, type differentiators and the Schema Editor application, which allow better mapping of Couchbase document structure to renormalized tables and simplify queries. For more information on advanced configuration, please refer to the driver and Schema Editor documentation included in the download package or from Simba’s website.
So if you need to integrate Couchbase Server with the rest of your infrastructure consider using Simba ODBC and JDBC drivers to simplify your integration.