For more than three decades, Microsoft SQL Server has helped countless organizations store and manage their enterprise data, and it’s still one of the most widely used software applications on the planet. According to the DB-Engines database ranking, SQL Server remains the third most popular database management system, just behind Oracle and MySQL.

Change data capture (CDC) is essential functionality for many businesses, especially those with real-time ETL use cases. But what is change data capture in SQL Server, and how does it work? Keep reading for the answers and more.

undefined

Enjoying This Article?

Receive great content weekly with the Xplenty Newsletter!

Octopus

Table of Contents

What is Change Data Capture in SQL Server?

These days, one of the best practices for data integration is to move data into a centralized repository, such as a data warehouse or data lake. Maintaining a single unified repository makes it easier to keep a “single version of the truth” and perform business intelligence (BI) and analytics.

Change data capture (CDC) refers to a number of techniques for detecting when information from a particular data source has been added or changed. Thanks to CDC, users only need to migrate the data that actually requires an update, rather than the entire database. This makes ETL and data integration processes dramatically faster and more efficient, especially during the resource-intensive transformation stage of ETL. Change data capture also has several other use cases (e.g., as a paper trail for database audits).

In Microsoft SQL Server, change data capture is log-based. That is, CDC works by examining a separate file known as the transaction log, which records every transaction made in the database (e.g. INSERT, UPDATE, and DELETE operations). Although the transaction log primarily aims to help the database recover after a crash, it also serves the purpose of enabling CDC.

Starting with the original database, the CDC tool reads the transaction log and then applies each operation to the change table, a separate database table that mirrors the original for capturing CDC information. Then, the CDC tool can filter for changes made over a specified time period, using this information to update the master copy in the centralized data repository.

Integrate Your Data Today!

Try Xplenty free for 14 days. No credit card required.

Octopus

How Change Data Capture Works in SQL Server

We’ve gone over the general idea of change data capture in SQL Server — now let’s get down to the technical nitty-gritty.

Before you can track any changes to a SQL Server database, a sysadmin must enable CDC with the procedure sys.sp_cdc_enable_db, which enables change data capture for the current database. The procedure sys.sp_cdc_enable_table, meanwhile, enables CDC for a specific table in the database. You can also check to see if CDC has already been enabled by querying the column is_cdc_enabled in the sys.databases catalog view.

Conversely, the procedures sys.sp_cdc_disable_db and sys.sp_cdc_disable_table disable CDC for the given database or table, respectively. This will auto-delete all associated CDC metadata for that database or table.

In a SQL Server change table, the first five columns are metadata that contain additional information about each change. These columns are as follows:

  • __$start_lsn: the start of the commit log sequence number (LSN) for a particular change.
  • __$end_lsn: the end of the commit LSN for a particular change.
  • __$seqval: the sequence order of a particular change within a multi-change transaction.
  • __$operation: the type of change performed. 1 represents a delete operation, 2 represents an insert operation, 3 represents an update operation before the change, and 4 represents an update operation after the change.
  • __$update_mask: a bitmask wherein each bit designates a possible change to a given column. When the change is a delete or insert operation, all columns have their bits set to 1. When the change is an update operation, only those columns updated have their bits set to 1.

SQL Server change tables need to be cleaned up periodically in order to prevent them from growing out of control. If you have already executed an ETL process on the changed data, for example, you may wish to flush the contents of the change table so that you do not congest it with out-of-date information. The default period for retaining SQL Server change table information is three days. 

We’ll conclude with a few miscellaneous notes about the availability of change data capture in SQL Server:

  • CDC is available for all Linux users starting with SQL Server 2019, or with SQL Server 2017 if Cumulative Update 18 (CU18) is installed. (CDC has been available for Windows users since SQL Server 2008.)
  • CDC is not available for database tables that have a clustered columnstore index, although users have been able to perform CDC on tables with a non-clustered columnstore index since the release of SQL Server 2016.
  • CDC is not available for the Basic, S0, S1, or S2 service tiers in Azure SQL Database, which is Microsoft’s cloud version of SQL Server. You can only use it with the Standard 3 (S3) tier or above.

How Xplenty Can Help with Change Data Capture

Enjoying This Article?

Receive great content weekly with the Xplenty Newsletter!

Octopus

As we’ve shown in this article, change data capture in SQL Server is a complex, technically involved endeavor. Working with the SQL Server change data capture paradigm will require you to be intimately familiar and comfortable with the SQL query language, as well as the ins and outs of the Microsoft SQL Server platform.

However, CDC remains essential for many use cases that depend on efficient ETL and data integration — so how can you reconcile these two issues? The good news is that there’s an easier approach to SQL Server change data capture, and it's right under your nose.

For many businesses, the answer is to leverage a powerful automated ETL tool like Xplenty. The Xplenty platform facilitates rapid, efficient data integration from your data sources to your data warehouse or data lake in the Cloud. Xplenty includes features such as log-based CDC functionality; a no-code, drag-and-drop user interface; and over 100 pre-built connectors — including SQL Server.

Creating a connection in Xplenty to your SQL Server database is simple. Once you’ve allowed Xplenty access to SQL Server and the connection is complete, you can both read from and write to SQL Server database tables, as well as perform change data capture to streamline your ETL workflow.

Related Reading: Allowing Xplenty Access to Microsoft SQL Server Databases

Thanks to Xplenty, it’s never been easier to perform change data capture and build robust ETL data pipelines. Want to learn how Xplenty can help with your SQL Server change data capture needs? Schedule a call today with our team of data experts to discuss your situation, or to start your 14-day demo of the Xplenty platform.