Friday, May 15, 2015

Change Data Capture (CDC) and Simple Recovery Mode

One of the misconceptions that some people have about change data capture (CDC) in SQL Server is that it needs the database to be set up with full recovery mode. If we look at the following TechNet article, it mentioned:

It should be noted that change data capture works with all recovery models. But when change data capture is enabled, operations that would normally be minimally logged in simple or bulk-logged recovery models are fully logged to enable change data capture to capture all changes.

So Change Data Capture can be enabled on database with simple recovery mode. This can be helpful as typically in some development environment, the databases are being set with simple recovery mode. One thing to be careful is that if we have a database with simple recovery mode and have CDC enabled, but it doesn’t have the CDC capture job running, as the database process transactions, the database’s transaction log would use more space. This is because those transactions will not get release until the CDC capture process those transactions. Let me try to demonstrate this.

Database in Simple Recovery Mode without CDC

As a base case, let us set up a database with simple recovery mode.

USE [master];
GO

IF EXISTS(SELECT 1 FROM sys.databases WHERE [name] = N'CdcTest')
BEGIN
    DROP DATABASE CdcTest;
END
GO

CREATE DATABASE CdcTest
ON (
    Name = 'CdcTest',
    FileName = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\CdcTest.mdf',
    Size = 10 MB,
    FileGrowth = 5 MB)
LOG ON (
    Name = 'CdcTest_Log',
    FileName = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\CdcTest_Log.ldf',
    Size = 50 MB,
    FileGrowth = 5 MB);

ALTER DATABASE CdcTest
SET RECOVERY SIMPLE;
GO

Now, let us create a table called “TableWithoutCDC”.

USE [CdcTest];
GO

CREATE TABLE dbo.TableWithoutCDC (
    Id INT PRIMARY KEY IDENTITY(1,1),
    Field1 VARCHAR(4000),
    Field2 VARCHAR(4000),
    Field3 VARCHAR(4000),
    Field4 VARCHAR(4000),
    Field5 VARCHAR(4000));

Now, try to insert some data into the table. Basically, the script below would do a one row insert in a transaction. It would do it 5,000 times.

USE [CdcTest];
GO

DECLARE @z AS INT;
SET @z = 1;

WHILE (@z <= 5)
BEGIN
    DECLARE @a AS INT;
    SET @a = 1

    WHILE (@a <= 1000)
    BEGIN
        DECLARE @Filler AS VARCHAR(4000);

        SET @Filler = REPLICATE('a', 4000);
        BEGIN TRANSACTION
        INSERT INTO dbo.TableWithoutCDC
        VALUES (@Filler, @Filler, @Filler, @Filler, @Filler)
        COMMIT TRANSACTION
        SET @a = @a + 1
    END

    CHECKPOINT;

    SET @z = @z + 1
END

Now, try to get the database’s transaction log space usage:

USE [CdcTest];
GO

SELECT    Name,
        [FileName],
        file_in_MB = (CAST(size AS FLOAT) * 8)/ 1024,
        file_used_in_MB = (CAST(FILEPROPERTY(name, 'SpaceUsed') AS FLOAT) * 8)/ 1024
FROM    sys.sysfiles

We got the following result:

image

The database’s transaction log file is using 0.8 MB of disk space.

Database in Simple Recovery Mode with CDC

Now, let see the impact of CDC, with the CDC capture job not running, on database's transaction log. First, let us reset the CdcTest database.

USE [master];
GO

IF EXISTS(SELECT 1 FROM sys.databases WHERE [name] = N'CdcTest')
BEGIN
    DROP DATABASE CdcTest;
END
GO

CREATE DATABASE CdcTest
ON (
    Name = 'CdcTest',
    FileName = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\CdcTest.mdf',
    Size = 10 MB,
    FileGrowth = 5 MB)
LOG ON (
    Name = 'CdcTest_Log',
    FileName = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\CdcTest_Log.ldf',
    Size = 50 MB,
    FileGrowth = 5 MB);

ALTER DATABASE CdcTest
SET RECOVERY SIMPLE;
GO

Then, create a table called "TableWithCDC".

USE [CdcTest];
GO

CREATE TABLE dbo.TableWithCDC (
    Id INT PRIMARY KEY IDENTITY(1,1),
    Field1 VARCHAR(4000),
    Field2 VARCHAR(4000),
    Field3 VARCHAR(4000),
    Field4 VARCHAR(4000),
Field5 VARCHAR(4000));

Now, we can try to enable CDC on CdcTest database and enable it to capture changes on the TableWithCDC.

USE [CdcTest];
GO

EXEC sp_cdc_enable_db;
EXEC sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'TableWithCDC', @capture_instance = 'dbo_TableWithCDC', @supports_net_changes = 1, @role_name = NULL;

By default, when CDC is enabled, it would create 2 SQL Agent jobs, one is to capture the changes and another one is to clean up the CDC data older than 3 days (default, but it can be modify). For this testing, if you have SQL Agent running, you might want to stop and disabled the CDC capture job. You can do that using the following script:

USE [msdb];
GO

EXEC dbo.sp_stop_job @job_name = 'cdc.CdcTest_capture';
EXEC dbo.sp_update_job @job_name = 'cdc.CdcTest_capture', @enabled = 0;

Now, similar to the first test, try to have some transactions in the database.

USE [CdcTest];
GO

DECLARE @z AS INT;
SET @z = 1;

WHILE (@z <= 5)
BEGIN
    DECLARE @a AS INT;
    SET @a = 1

    WHILE (@a <= 1000)
    BEGIN
        DECLARE @Filler AS VARCHAR(4000);

        SET @Filler = REPLICATE('a', 4000);
        BEGIN TRANSACTION
        INSERT INTO dbo.TableWithCDC
        VALUES (@Filler, @Filler, @Filler, @Filler, @Filler)
        COMMIT TRANSACTION
        SET @a = @a + 1
    END

    CHECKPOINT;

    SET @z = @z + 1
END

Now, let we see how it impacts the database's transaction log usage.

USE [CdcTest];
GO

SELECT    Name,
        [FileName],
        file_in_MB = (CAST(size AS FLOAT) * 8)/ 1024,
        file_used_in_MB = (CAST(FILEPROPERTY(name, 'SpaceUsed') AS FLOAT) * 8)/ 1024
FROM    sys.sysfiles

The database's transaction log is now using about 116 MB of disk space, much higher than the 0.8 MB of disk space in the first test. If we look at the log_reuse_wait_desc column on sys.databases table for CdcTest database:

SELECT  [name], log_reuse_wait_desc
FROM sys.databases
WHERE [name] = 'CdcTest'

The reason that the database could not reuse the log space is because "replication", which actually make sense since under the hood CDC is using the same technology as replication. Basically, the database will not release the transactions in the transaction log until they are processed by the CDC capture job/process.

Now, let's try to process the CDC changes, do a checkpoint and check the database's transaction log space usage:

USE [CdcTest];
GO

EXEC sp_cdc_scan;
GO

CHECKPOINT;
GO

SELECT    Name,
        [FileName],
        file_in_MB = (CAST(size AS FLOAT) * 8)/ 1024,
        file_used_in_MB = (CAST(FILEPROPERTY(name, 'SpaceUsed') AS FLOAT) * 8)/ 1024
FROM    sys.sysfiles

The database's transaction log disk usage immediate drops to about 4 MB.

Additional thing to note is that, on the above example, we did those transactions on a table that is being tracked with CDC. This would not happen if the transactions are run against any table that is not being tracked by CDC on CDC enabled database. The database transaction log will reuse the log space after a checkpoint. The following script can help to illustrate this:

USE [CdcTest];
GO

--- Create a table that is not being tracked by CDC within CDC enabled database.
CREATE TABLE dbo.TableWithoutCDC (
    Id INT PRIMARY KEY IDENTITY(1,1),
    Field1 VARCHAR(4000),
    Field2 VARCHAR(4000),
    Field3 VARCHAR(4000),
    Field4 VARCHAR(4000),
    Field5 VARCHAR(4000));
GO

--- Run transactions against the TableWithoutCDC
DECLARE @z AS INT;
SET @z = 1;

WHILE (@z <= 5)
BEGIN
    DECLARE @a AS INT;
    SET @a = 1

    WHILE (@a <= 1000)
    BEGIN
        DECLARE @Filler AS VARCHAR(4000);

        SET @Filler = REPLICATE('a', 4000);
        BEGIN TRANSACTION
        INSERT INTO dbo.TableWithoutCDC
        VALUES (@Filler, @Filler, @Filler, @Filler, @Filler)
        COMMIT TRANSACTION
        SET @a = @a + 1
    END

    CHECKPOINT;

    SET @z = @z + 1
END
GO

--- Now get the database transaction log disk usage
SELECT    Name,
        [FileName],
        file_in_MB = (CAST(size AS FLOAT) * 8)/ 1024,
        file_used_in_MB = (CAST(FILEPROPERTY(name, 'SpaceUsed') AS FLOAT) * 8)/ 1024
FROM    sys.sysfiles

The database will report that the transaction log is using around 2 MB of space (the number might vary but it should be a small number).

Clean Up

You can use the following script to clean:

USE [master];
GO

IF EXISTS(SELECT 1 FROM sys.databases WHERE [name] = N'CdcTest')
BEGIN
    DROP DATABASE CdcTest;
END
GO

Take Away

Some take away points:

  • CDC can be enabled on database with any recovery models, including database with simple recovery mode.
  • Make sure that the CDC capture job is running, as transactions against CDC tracked table(s) will be preserved in the transaction log until they are processed by CDC capture job/process. This is especially true if you restore CDC enabled database with KEEP_CDC option, as the CDC capture and clean up jobs would not get created by default
  • If you have CDC tracked tables and you don't have the CDC capture job running, you can potentially see your database's transaction log grow even if it is in a simple recovery mode.

2 comments:

  1. is it possible to implement log shipping on a database with cdc enabled to a standard edition secondary server? thanks!

    ReplyDelete
    Replies
    1. It is an interesting question. Obviously, CDC is available in Enterprise version of SQL Server (it is also available in Evaluation and Developer version of SQL Server). If we are looking to do log shipping and maintain CDC data to Standard Edition of secondary server, that would not work/ supported.

      But how about if we are looking to just do the log shipping without the CDC data? Supposedly according to Microsoft documentation (https://msdn.microsoft.com/en-us/library/cc645938.aspx#RestoreOrAttach), if we are to restore the database without the keep_cdc option, the CDC would be disabled. Technically, we could restore database without CDC to Standard Edition of SQL Server.

      But when I tested this out, unfortunately, it does not seems to work. I was able to restore the database with norecovery option just fine (I used Express Edition for testing). But whenever I tried to bring the database online using recovery option or standby option (to make it readable), I would get error message "SQL Server Cannot load database "xxx" because Change Data Capture is enabled. The currently installed edition of SQL Server does not support Change Data Capture. Either disable Change Data Capture in the database by using a supported edition of SQL Server, or upgrade the instance to one that supports Change Data Capture."

      The only way that I can bring it online is if I disable the CDC on the primary server using sys.sp_cdc_disable_db, do another transaction log on the primary server and then apply that one to the secondary instance and bring the database online on the secondary server. But that basically you eliminate CDC on the primary server as well.

      Delete