Saturday, August 17, 2019

Tracking Data Changes - Change Tracking, Part 2

In my last post, I showed you how to enable, configure, and disable change tracking at the database level. Very good, as far as that goes, but you're not going to get very far without tracking some actual data changes!

To recap, enabling change tracking at the database level is very simple:
ALTER DATABASE <database name here> 
SET CHANGE_TRACKING = <ON | OFF>
(CHANGE_RETENTION = <positive integer> <DAYS | HOURS | MINUTES>, AUTO_CLEANUP = <ON | OFF>);

(Note on the code snippets: I'm using Markdown for this post to simplify my blogging workflow; please let me know if you'd rather I keep using Gists for code samples)

If you don't provide a retention period, SQL Server's default is 2 days. Auto-cleanup defaults to ON unless you tell it otherwise.

Easy!

The table level commands aren't any more complicated. Before we get started, please note that change tracking requires a primary key on the table you want to track. This is reasonable - you need some kind of unique identifier to tell you which row has changed.

With the PK requirement in mind, on we go!

Much as enabling change tracking at the database level uses a simple ALTER DATABASE command, enabling change tracking at the table level uses a simple ALTER TABLE command:
ALTER TABLE Demo_CT
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);

There's not a lot going on here either. You have three things to specify:
  1. The table you want to track changes on
  2. Enabling or (spoiler alert!) disabling change tracking
  3. Whether you want to track which column(s) were updated or not
As you may have guessed, if you want to disable change tracking, just change ENABLE to DISABLE and omit the WITH clause.

Speaking of that WITH, if you don't specify a setting for TRACK_COLUMNS_UPDATED, SQL Server will default it to OFF in order to reduce space requirements for change tracking.

While we're talking about TRACK_COLUMNS_UPDATED, it does just what it says on the tin. This is a good time to remind you that unlike other methods of tracking changes, all this does is tell you that a given record was updated and which columns were updated - it does not give you any state data. We'll cover this in more detail in a future post.

A word of warning: you can't modify the TRACK_COLUMNS_UPDATED setting without disabling and re-enabling change tracking on the table.

To see which tables have change tracking enabled, query the catalog view sys.change_tracking_tables:

As with sys.change_tracking_databases, the information is fairly basic. Kendra Little's fantastic article I referenced in my last post has a great script for this, too:
-- Kendra Little made this too; it's also posted at https://www.brentozar.com/archive/2014/06/performance-tuning-sql-server-change-tracking/
SELECT sc.name AS tracked_schema_name,
    so.name AS tracked_table_name,
    ctt.is_track_columns_updated_on,
    ctt.begin_version /*when CT was enabled, or table was truncated */,
    ctt.min_valid_version /*syncing applications should only expect data on or after this version */ ,
    ctt.cleanup_version /*cleanup may have removed data up to this version */
FROM sys.change_tracking_tables AS ctt
JOIN sys.objects AS so 
    ON ctt.[object_id]=so.[object_id]
JOIN sys.schemas AS sc 
    ON so.schema_id=sc.schema_id;

Of course, feel free to write your own if you need different information!

For our simple example, it gives us:

Joining on sys.schemas and sys.objects, Kendra's script yet again gives us human-readable results. I can't emphasize enough how important it is to script for readability.

As I mentioned before, disabling change tracking at the table level is equally simple:
ALTER TABLE Demo_CT
DISABLE CHANGE_TRACKING;

Don't forget - if you wish to disable change tracking at the database level, you'll need to disable change tracking for all tables first.

Thank you for continuing to read this series! Please join me next time when I show you around working with change tracking!

Sunday, August 11, 2019

Tracking Data Changes - Change Tracking, Part 1

Let's begin our adventure into tracking data changes in SQL Server with change tracking. Change tracking is a simple tool in SQL Server that does exactly what it says on the tin: tracks changes.

Microsoft provides excellent documentation on change tracking here. Microsoft has done a good job of retaining a lot of documentation for versions of SQL Server prior to 2014, but there are some gaps. Change tracking happily hasn't changed much since it was introduced in SQL Server 2008, so most of this information is valid for SQL Server 2012, 2008R2, and 2008.

SQL Server 2014 SP2 does introduce a documented stored procedure that you can invoke to clean up the internal change table (and don't worry, we'll cover this in due course!), but that's about it as far as documented functional changes go (note: if any of you hardcore internals folks know of under-the-hood changes between versions, even if they work the same on the surface, please let me know! I love that kind of detail!). That said, there's always the chance that future versions will introduce differences, so keep that in mind.

Anyway, moving forward! Here is a quick overview of change tracking:

  • The change source is an in-memory rowstore, flushed to disk with each checkpoint
  • It answers two questions:
    • Has a row changed?
    • What rows have changed?
  • The only values from the source table stored in the change table are the primary key values for the changed rows
  • You have to use the built-in functions to get at the change data
  • The database you want to track changes in must have a compatibility level of at least 90
    • Filed under 'ways the database engine will let you score an own goal', SQL Server will still let you enable change tracking on a database if the compatibility level is lower than 90, but the functions used to retrieve change data will give you an error

Change tracking is a fairly simple tool to set up and use. In order to enable change tracking, you must first enable it at the database level via an ALTER DATABASE statement, and then at the table level via an ALTER TABLE statement.

Let's begin at the database level. The ALTER DATABASE statement looks something like...

There aren't a lot of parameters here. You can set change tracking on or off, you can specify your retention period, and you can specify whether to enable auto-cleanup or not.

For the retention period, you have the choice of DAYS, HOURS, or MINUTES. If you don't specify retention, SQL Server will configure change tracking with a retention period of 2 days. If you provide a number but don't specify the interval, SQL Server will default to days. The minimum retention period is 1 minute.

For auto-cleanup, if you specify OFF SQL Server will not automatically clean up change tracking data. If you don't specify auto-cleanup, SQL Server will default to ON. Unless you want to be responsible for cleaning up after change tracking, ON is your best bet.

To see which databases have change tracking enabled, and to look at their configuration, query the sys.change_tracking_databases catalog view:



It covers the basics, but it could be more informative. Kendra Little has a nicer script:

For our simple example, it gives us:

By joining on sys.databases to provide the database name and omitting the retention_period_units in favor of the description, Kendra's script gives us much more readable results. You'll notice that this script omits the max_cleanup_version column. We'll come back to that, and when it's useful, in a future post.

As an aside, while it can sometimes be a pain to script for readability, it almost always pays off in the long run.

Disabling change tracking is similarly simple. As with enabling change tracking, you disable it with a very straightforward ALTER DATABASE command:

You will need to disable change tracking on all tables before disabling it at the database level, but as we haven't looked at the table level commands yet we'll just pretend we've already done that.

You can change the database-level settings for change tracking with a similar ALTER DATABASE script. Just omit the = ON or OFF:

You can modify either or both the retention period and the auto-cleanup setting.

That wraps it up for this post! Join me next time when we look at enabling change tracking at the table level!

Sunday, July 14, 2019

The best laid plans etc.

Hi there! I apologize for this blog going quiet again. Major project at work, all that, and this blog slipped down lower on my list of priorities outside of work, past spending time with my kids and my partner, exercise, rest, and all that.

Things are wrapping up, so look for a new post about tracking data changes soon!

Sunday, April 21, 2019

A brief note on version...

With Microsoft's move away from the Service Pack model, SQL Server has seen more frequent updates. This is generally a good thing, but it does pose a problem for those of us who write about it. Information gets out of date. New functionality comes in, old functionality is deprecated, documentation for older versions disappears, and the great cycle of technology continues. Many of us (like me!) write these blogs in our free time, to educate the community and ourselves, and it can be difficult to go back and maintain older posts to ensure they're up-to-date.

I have both Windows and Linux versions of SQL Server (at the time of this writing, my lab VMs are SQL Server 2017 CU14 on Windows 10 and Ubuntu Server 18.04 LTS; I have spun up a container or twelve in my time as well, and of course there's Azure SQL Database, and...), and I keep them up-to-date as new cumulative updates are released. I'll do my best to post version information as I write, but I probably won't go back and update old posts unless there's a really big change. Even then, I'll probably write a new post instead.

If you need historical version information, go directly to Microsoft SQL Server Versions List, do not pass GO, do not collect $200. Maybe toss 'em a couple bucks. You can also now find documentation for previous versions of SQL Server on the SQL Server previous versions documentation page.

Tracking Data Changes - Introduction

Unless you're managing a static archive database, the data in your databases will change. Odds are, at some point you're going to need to know about these changes for one reason or another. Perhaps you're using a data warehouse and need to retain history, or you need to keep track of slowly-changing dimensions. Perhaps you have an application that needs the latest data, and you want an easy way to feed it without querying the entire table for timestamps or update flags. Perhaps you have audit requirements and need to keep detailed change history.

Whatever the reasons, SQL Server has a few options to satisfy various use cases. Your options range from the 'roll up your sleeves and do it yourself' triggers, to Change Tracking, Change Data Capture, and Temporal Tables.  In this series, I'll show you these options and how to use them. We'll begin with Change Tracking, then move on to Change Data Capture, Temporal Tables, and conclude with triggers.

If you're using a platform other than SQL Server, please refer back to your RDBMS's documentation (or - let's be honest here - StackOverflow followed by your search engine of choice) for information about tracking data changes. Many platforms have similar functionality to handle this need, but if nothing else triggers are almost always a viable option, if a bit more manual work than we might like.

In SQL Server, some of the functionality depends on your version and edition. The following table provides a quick summary:

Tool Debut Edition
Change Tracking SQL Server 2008 All
Change Data Capture SQL Server 2008 Standard as of SQL Server 2016 SP1, Enterprise prior
Temporal Tables SQL Server 2016 All
Triggers Back in the dim mists All

Here is a (very) quick overview of the various options:
Change Tracking:
  • Uses an internal versioning mechanism to track changes
  • Just tells you that a change happened (it can tell you WHICH column was changed) - no history
  • Snapshot isolation, so be mindful of TempDB
  • Reasonably tolerant of schema changes
  • Requires a primary key
  • Automatically cleans up after itself
Change Data Capture:
  • Uses a log reader agent (hello sp_replcmds!) to capture change data
  • Can give you full change history
  • Does not play nicely with schema changes 
  • Allows two capture instances per table
  • Highly configurable - capture only columns you want
  • Automatically cleans up after itself
Temporal Tables:
  • Also requires a primary key
  • Also gives you the full change history
  • Uses period columns in the source table to feed the history table
  • Easy to query
  • In SQL Server 2017, introduced the capability to clean up after itself
Triggers:
  • Offer you amazing flexibility in designing your tracking
  • Let you capture metadata and stuff the other solutions may not allow
  • Require you to manage everything from soup to nuts
There is a lot more about each of these methods, of course, and in the course of this series I'll walk you through each in some detail. Please check back soon for an introduction to Change Tracking!

Monday, July 16, 2018

A quick 'save me from myself' thing in SSMS

Have you ever felt that 'down-elevator' feeling in the pit of your stomach when you realize you just executed a script against the wrong instance? I have.

It's of course possible to add name checks and all to only allow execution on a given instance, I'm going to go out on a limb here and guess that most people don't do that in most cases.

Happily, there's a quick and easy setting you can take advantage of to give yourself a visual indication that hey, maybe you're not where you thought you were: the 'Use custom color:' setting in the Connection Properties of the Connect to Server dialog box.

To get there, first click Options:



Check the box for 'Use custom color:', and hit 'Select...':




Pick your color, click 'OK' (I tend to use red for production, orange for QA/test, green for development, but go with whatever works best for you), and then click 'Connect':




From here on out, the status bar down at the bottom of the query window will be whatever color you've picked for that instance:




While various third-party tools offer similar functionality (often with a lot more flexibility and configurability), this is built-in to SSMS and has been around for quite a while. It's a great quick gut-check!

Tuesday, July 10, 2018

Updates and housekeeping


After... well, forever... on an old theme, I've updated. Some older posts may look weird, but most should be good. Please let me know if you run into any problems.

The code I can't live without



Today's T-SQL Tuesday topic is thought-provoking: What code couldn't you live without? (OK, OK, technically it's just titled 'Code You Would Hate to Live Without', but work with me here.)

One of the hallmarks of the dedicated DBA is that we write our own tools. I've encouraged juniors to write their own code, even if they can find things online that solve their problems (inexperienced folks blindly using code from online is a whole can of worms that I may open later). We gain so much by doing the work to learn about the systems we're interacting with, and the harder work of figuring out how to write the code, be it T-SQL, Powershell, or what have you, to really dig in there.

As a brief aside, in my experience if you don't have the curiosity and drive to begin writing your own tools, you're not going to get very far as a DBA. I'm not sure I'd WANT you to be in charge of my servers. Now, not for a second would I expect highly-polished code from juniors right out of the gate. What matters more is evidence of that curiosity and drive. What matters is that we keep learning, keep practicing, and keep improving.

On the flip side, there are times we find something out there that Just Works. Part of the wisdom we develop as we progress is knowing when to press on with writing our own, and when someone else's code does the job perfectly.

I'm finished digressing - on to the code I'd hate to live without!

Under the 'code from others' umbrella, code I'd hate to live without includes:

Of the things I've written myself, I would hate to live without these things (please note, these are all hosted in Dropbox; if you want copies and can't get to Dropbox please let me know!):
  • Replication scripts!
  • A script to get stored procedure execution information, and the query plan if you want.
    • This query gives a lot of info - when the plan was cached, how many times the plan's been looked up, how many things reference the plan, when the proc was last executed, how many times it's been executed, all sorts of worker, reads, and writes info including averages.
    • The usual caveats on the cache info apply - restarts, cache flushes, index rebuilds, plan recreations, etc. are all things to keep in mind
  • Some templates for file manipulation
I've got a lot more, but these are the most important to me.

Thursday, February 08, 2018

On the importance of testing your backups

You're backing up your databases. The jobs run when they're supposed to, you've chosen the right types of backups, got all your settings dialed in, they take the expected amount of time, the files are there on disk. Everything's good!

Well, maybe.

Have you restored your backups? You should. I'd even go so far as to say that without test restores, your careful backup strategy was a waste of time and energy.

"But the backups complete without errors," you say. And that's certainly good, but it's only part of it. Things can go wrong in backups. Disk hiccups after the backup is generated can cause problems. Bugs in the database engine can cause unexpected problems as well. Different database products will have different issues and quirks, but the key thing to remember is that they ALL have them. Your preferred RDBMS is no different. It's up to you as the DBA to learn what they are, and to learn how to work with and through them.

This is one that I've personally run into: "RESTORE DATABASE is terminating abnormally" error message when you perform a full backup of a database in SQL Server 2008 R2

We added a new data file to a database, and due to a quirk in our naming convention the truncation caused a problem: we wound up with two identical logical filenames in the backup file list, and we couldn't restore the backup. We were fortunate that we didn't need to do any restores for that database before we found the problem, and Microsoft did release the fix in the linked article, but we would've been in serious trouble if we hadn't caught the issue and suffered data loss.

You also need to test your restores in a realistic fashion. How much data can you recover? How long will it take? This is when you will need to consider the business's needs.

There are two objectives around recovery that you need to consider: the recovery point objective (RPO) and recovery time objective (RTO). In brief, they respectively address how much data the business is comfortable with losing, and how long the business is comfortable with a database being offline in case of failures. People too often assume that both RPO and RTO will be close to zero. It's certainly possible to get very close to instantaneous recovery with no data loss, but it can get very expensive very quickly as you're then getting into high availability (HA) territory. The old saw, 'fast, cheap, and good: pick two,' definitely holds true here!

If you don't know how long your restore will take, you won't know if you can meet the RTO. If you don't know how much data is available in your backups, you won't know if you can meet RPO. If your backups are bad, you're guaranteed not going to meet either the RPO or RTO unless neither matters for the instance (but how many instances do you manage that fall under that 'it would be nice to recover it, but we can live without it' heading? I know I don't have many in production). If you haven't talked with your management about their expectations around RPO and RTO, please do so.

While RPO and RTO are certainly important considerations, they can be ongoing topics of conversation - your first order of business is to ensure that your backups first CAN be restored, and second that they give you the results you expect. As with anything, practice makes perfect. Openness and honesty will go a long way if you identify gaps. These can be hard things to bring up, I know - I've had to have a few unpleasant conversations on this topic over the course of my career. But how much worse will the conversations be if something fails and you can't recover?

Test your backups, folks! You'll be glad you did!