露の世は露の世ながらさりながら。。。

Saturday, March 16, 2024

It's been a while, hasn't it?

Hello friends! It's been quite a while! I've been out of the loop for some time, and so much has changed since I last posted. I'm going to be posting again, I just haven't decided if I want to pick up my change tracking series, or post on more dev-related stuff (I've been doing dev for the past 3 years), or what.

Regardless, I felt it was time to revive this blog.

Sunday, May 10, 2020

Tracking Data Changes - Change Tracking, Part 3

In my last post, I showed you how to configure change tracking at the table level and how to get configuration information about change tracking from the database engine. We looked at sys.change_tracking_databases and sys.change_tracking_tables, and looked at some sample scripts that present the information in a more readable format.

Before moving on to working with change tracking, I'd like to show you a little bit about how SQL Server handles change tracking data under the hood. Let's take a few minutes to talk about sys.internal_tables, sys.dm_tran_commit_table, and sys.syscommittab. These aren't objects that most DBAs interact with on a routine basis, but they're useful for understanding how change tracking does what it does.

sys.internal_tables is a catalog view that returns one row for each internal table. All objects listed in sys.internal_tables reside in the sys schema of their parent database, and all sport system-generated names that indicate their function. Records here inherit all the usual stuff from sys.objects, plus some useful additional fields for the database engine.

To identify which tables are related with other objects, check the parent_object_id column:

For our purposes, the internal tables we care about are the ones named change_tracking_<parent_object_id>. As you may have surmised, this is the naming convention for the internal change tables.
With some thought you can build out quite a bit of information about the internal tables:

While you can get information about these internal out of sys.internal_tables by joining on some other catalog views, you typically won't be able to get at the contents of the tables themselves:

if you absolutely, positively, cross-your-heart-and-kiss-your-elbow MUST see what's in them, you'll need to use the DAC.

A discussion of the finer points of using the DAC (and yep, you can get at it via Azure Data Studio, and yep, in SQL Server on Linux!) is well outside the scope of this article, so while it's an invaluable tool for emergency use I'm not going to do more than mention it here. Also, the raw contents of the internal change table isn't as useful as you might hope. All that said, I know at least a few of you are itching to see what it looks like, so here you go!

Don't worry! We'll cover the appropriate ways to get at your change data soon.

sys.dm_tran_commit_table is a system dynamic management view that returns transaction information for tables tracked by change tracking. It exposes the information stored in sys.syscommittab, which maps transaction IDs to log sequence numbers. Like the internal change table, sys.syscommittab is cleaned up as part of the change tracking cleanup process, and like the internal change table it can grow quite large if your cleanup interval is too long or if there is a problem with the cleanup process. And, again like the internal change table, you can view its contents using the DAC, and yet again - like the internal change table, the contents aren't terribly useful on their own.

"So you can't easily get the data out of the tables listed in sys.internal_tables and even when you do it's not really helpful; how is this useful, Tim," you ask? From the previous paragraphs, you may have gathered that one important use case for change tracking is finding which internal table stores change data. Once you have the table name you can determine how much space a given table's change tracking information is occupying. Even though change tracking stores minimal information about changes, it can add up. Even if the tracked table doesn't see a lot of INSERT, UPDATE, and/or DELETE operations, if your cleanup interval is too long (or the cleanup task has problems for a long period of time) the table could possibly grow quite large.

I've mentioned cleanup issues a few times. Here too, don't worry! We'll talk about the cleanup soon too.

Thank you for reading this far! In my next post, I'll show you more about working with change tracking. Please join me!

Friday, May 08, 2020

A few updates to the blog

Hello friends! I've made a few small updates to the blog today. If you've been here before the most obvious change is to the theme. I've added a few links on the right (when viewed in a computer's browser; I'm still trying to sort out how to get them to show on mobile) to my:

I'm still putting together some other things that'll go up in GitHub, and they'll get links in due course.

I'm working on a new post in the change tracking series, so look for that by the end of the weekend (that's by 10 May 2020, if you're looking at this some time in the future).

Monday, February 17, 2020

Another note on versions used in this blog

Hello readers! Last year I made a post about the SQL Server versions I use for this blog, and it's time to post an update.

I'm a Mac user, so I've historically had to run SQL Server in a Windows virtual machine. Thanks to Microsoft's expansion into the Linux world, this has opened up my options quite a bit! As time has passed, I've largely stopped using SQL Server on Windows for my lab environment, focusing instead on the Linux version. I'm currently using SQL Server 2019 CU2 on Ubuntu 18.04.4 Server LTS running in a Parallels Desktop hypervisor (VirtualBox is a great free alternative, and runs on a wide variety of platforms). I also use Docker containers for ephemeral instances. I use Azure Data Studio and Visual Studio Code to do my management and dev stuff.

As such, please assume unless stated otherwise that everything in this blog is based on SQL Server 2019 for Linux, on whatever CU is current at the time of publication (though it might be the previous CU if a new one has been released just before I publish). Screenshots are going to generally be from Azure Data Studio. Once in a while I'll do something in SSMS or on a Windows-based instance, but I to call those out specifically.

Please note that SQL Server is officially supported on Ubuntu 16.04, though. I tried to get it running on 18 as a lark (it started life as a SQL Server 2017 instance, upgraded in place to 2019, and while it's doable, there are some hoops to jump through), but I don't do any production work here.

You've got lots of options these days for running SQL Server on non-Windows platforms! Soon I'll be learning Azure SQL Database, so that'll be another new frontier!

Sunday, February 16, 2020

Scaling is hard

Hello readers! I apologize for the lack of posts for the past... (checks calendar) six months... *cough* I will resume my series of posts on change tracking in the near future! In the meantime...

Enterprise scaling is hard. Anyone who says differently is selling something. Due to the incredible public visibility (to say nothing of the... investment... some of the fans bring), scaling in video games adds a miles-thick layer of difficulty. Recently, Bungie has had a few significant bugs crop up in their flagship title, Destiny 2. These bugs affected player inventories, and led to unplanned downtime while fixes were implemented. Twice. (Insert your choice of 'oof' meme here.) Bungie has published a good post-mortem that, while understandably light on the technical details, gives some insight into what they were doing and how things broke.

This incident called to mind something I worked through. It's absolutely not the same level of issue, but it's illustrative of the interesting things you see at scale. A previous employer's business requirements meant we had servers close to just about every major city on the planet, and they were kept fairly busy. Due to the nature of the business even a server that wasn't busy had to be very responsive, and a busy server that got slow was a nightmare for customers.

One server began periodically getting slow. REALLY slow. A proc that averaged a worker time of under a millisecond (props to the developers there: the tables were well-normalized, the queries well-written, and they understood indexing so the indexes were sensible as well) started taking over a second to respond. Our initial thought was that it had generated a bad plan by dumb luck (due to the nature of the data and how it changed over time, plans got regenerated a few times a week following stats maintenance, so far so normal), so we started by comparing the plan on this server to the plans from a sampling of its compatriots. The plan was indeed different (and bad!) on this one, so we gave the business a heads-up that we were going recompile that proc, we'd probably see a long execution while the new plan was generated, and so on. Recompiling worked a treat, the new plan was good, and matched the good plans on the other servers, so we were happy, the business was happy, everybody was happy.

A few weeks later, this one server started acting up again. Meet the new bad plan, same as the old bad plan. But only on this one server. We looked at a different sample of other servers of the same type - all consistent among themselves, and they all had generated the same good plan. So, again with the recompile, which made everything happy again. Once again the new plan was good, matching the good plan everywhere else.

We didn't wait for the third time to assume that it wasn't just dumb luck causing the bad plans. Parameter sniffing is fun, after all. One of the fun parts here was that even though the plan WAS using the desired index, it was just... wrong. It's worth noting that all the servers of this type had the same source tables replicated down to them. No local writes, just reads, and the data sets were identical.

So why was this lone box coming up with this suboptimal plan even using the correct index? Well, one common thread was that the bad plans were generated from one parameter value. The application these servers supported did a nightly test, and each server had a short list of test values that they'd cycle through. The tests were server-specific so no two servers would use the same test records. The server tests were done during slow hours for that region, and just so happened to happen after index maintenance was done. Every so often, the old plan would be cleared after maintenance and we'd get the bad plan when the right test parameter was passed in.

I started digging into the index. DBCC SHOW_STATISTICS is tremendously useful for understanding what a given index looks like under the hood. The histogram in particular was helpful, as it allowed me to tell where in the index the bad parameter was found. I pulled copies of the various objects related to the issue down into my sandbox and set to work on it.

After ensuring I could reproduce the problem at will, I set to checking the parameters used to generate the good plans. Some of the values corresponded to test records, some corresponded to customer data, but interestingly they all were found in different steps in the index. I noticed that the steps weren't all of identical size - the step with the bad parameter had a larger range than the other steps. Out of curiosity, I tried passing some other parameters from the same step as the parameter that produced the bad plan, and those parameters also caused the same bad plan to be generated. Similarly, other values from the steps that contained test values that generated good plans would generate good plans.

So far, so good. I did some inquiring and found that the bad parameter value was the only test value in that index step, but there was always the chance that a plan could be generated using some other parameter from that step if the plan fell out of the cache for whatever reason.

My next step was to see if changing our indexing strategy would help. The table had grown a lot over time since the application was rolled out, and maybe we'd hit the point when rethinking things was the right thing to do. A discussion of the nuances of index design is far outside the scope of this post, but as with most things the online documentation covers it pretty well, and of course many smart people have spilled a lot of virtual ink on the topic; your search engine of choice is your friend here.
I tried a number of things both on my own and in consultation with folks who know a lot more about indexes than I do, but nothing helped. While I was able to get the optimizer to consistently generate a plan that was slightly better than the bad plan regardless of the parameter passed, it was still substantially worse than the good plan. The solution wasn't to be found in the indexes, to be sure.

In the end, we wound up creating a job that checked the stored procedure's performance and would force a recompile if the maximum worker time was over a certain duration. We scheduled the job to run every few minutes, and this took care of the problem sufficiently well. We shortly deployed this to all the relevant SQL instances, and it turns out that it periodically helped prevent the problem on other servers as well over time. I had an opportunity to talk with some folks from Microsoft later, and they agreed that under the circumstances the recompile job was the best solution.

Scale gets strange at times. Things that you never encountered before start happening more and more frequently, and sometimes you have to approach the problem differently than you think.

Saturday, August 17, 2019

Tracking Data Changes - Change Tracking, Part 2

In my last post, I showed you how to enable, configure, and disable change tracking at the database level. Very good, as far as that goes, but you're not going to get very far without tracking some actual data changes!

To recap, enabling change tracking at the database level is very simple:

ALTER DATABASE <database name here> 
SET CHANGE_TRACKING = <ON | OFF>
(CHANGE_RETENTION = <positive integer> <DAYS | HOURS | MINUTES>, AUTO_CLEANUP = <ON | OFF>);

(Note on the code snippets: I'm using Markdown for this post to simplify my blogging workflow; please let me know if you'd rather I keep using Gists for code samples)

If you don't provide a retention period, SQL Server's default is 2 days. Auto-cleanup defaults to ON unless you tell it otherwise.

Easy!

The table level commands aren't any more complicated. Before we get started, please note that change tracking requires a primary key on the table you want to track. This is reasonable - you need some kind of unique identifier to tell you which row has changed.

With the PK requirement in mind, on we go!

Much as enabling change tracking at the database level uses a simple ALTER DATABASE command, enabling change tracking at the table level uses a simple ALTER TABLE command:

ALTER TABLE Demo_CT
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);

There's not a lot going on here either. You have three things to specify:

The table you want to track changes on
Enabling or (spoiler alert!) disabling change tracking
Whether you want to track which column(s) were updated or not

As you may have guessed, if you want to disable change tracking, just change ENABLE to DISABLE and omit the WITH clause.

Speaking of that WITH, if you don't specify a setting for TRACK_COLUMNS_UPDATED, SQL Server will default it to OFF in order to reduce space requirements for change tracking.

While we're talking about TRACK_COLUMNS_UPDATED, it does just what it says on the tin. This is a good time to remind you that unlike other methods of tracking changes, all this does is tell you that a given record was updated and which columns were updated - it does not give you any state data. We'll cover this in more detail in a future post.

A word of warning: you can't modify the TRACK_COLUMNS_UPDATED setting without disabling and re-enabling change tracking on the table.

To see which tables have change tracking enabled, query the catalog view sys.change_tracking_tables:

As with sys.change_tracking_databases, the information is fairly basic. Kendra Little's fantastic article I referenced in my last post has a great script for this, too:

-- Kendra Little made this too; it's also posted at https://www.brentozar.com/archive/2014/06/performance-tuning-sql-server-change-tracking/
SELECT sc.name AS tracked_schema_name,
    so.name AS tracked_table_name,
    ctt.is_track_columns_updated_on,
    ctt.begin_version /*when CT was enabled, or table was truncated */,
    ctt.min_valid_version /*syncing applications should only expect data on or after this version */ ,
    ctt.cleanup_version /*cleanup may have removed data up to this version */
FROM sys.change_tracking_tables AS ctt
JOIN sys.objects AS so 
    ON ctt.[object_id]=so.[object_id]
JOIN sys.schemas AS sc 
    ON so.schema_id=sc.schema_id;

Of course, feel free to write your own if you need different information!

For our simple example, it gives us:

Joining on sys.schemas and sys.objects, Kendra's script yet again gives us human-readable results. I can't emphasize enough how important it is to script for readability.

As I mentioned before, disabling change tracking at the table level is equally simple:

ALTER TABLE Demo_CT
DISABLE CHANGE_TRACKING;

Don't forget - if you wish to disable change tracking at the database level, you'll need to disable change tracking for all tables first.

Thank you for continuing to read this series! Please join me next time when I show you around working with change tracking!

Sunday, August 11, 2019

Tracking Data Changes - Change Tracking, Part 1

Let's begin our adventure into tracking data changes in SQL Server with change tracking. Change tracking is a simple tool in SQL Server that does exactly what it says on the tin: tracks changes.

Microsoft provides excellent documentation on change tracking here. Microsoft has done a good job of retaining a lot of documentation for versions of SQL Server prior to 2014, but there are some gaps. Change tracking happily hasn't changed much since it was introduced in SQL Server 2008, so most of this information is valid for SQL Server 2012, 2008R2, and 2008.

SQL Server 2014 SP2 does introduce a documented stored procedure that you can invoke to clean up the internal change table (and don't worry, we'll cover this in due course!), but that's about it as far as documented functional changes go (note: if any of you hardcore internals folks know of under-the-hood changes between versions, even if they work the same on the surface, please let me know! I love that kind of detail!). That said, there's always the chance that future versions will introduce differences, so keep that in mind.

Anyway, moving forward! Here is a quick overview of change tracking:

The change source is an in-memory rowstore, flushed to disk with each checkpoint
It answers two questions:

Has a row changed?
What rows have changed?

The only values from the source table stored in the change table are the primary key values for the changed rows
You have to use the built-in functions to get at the change data
The database you want to track changes in must have a compatibility level of at least 90

Filed under 'ways the database engine will let you score an own goal', SQL Server will still let you enable change tracking on a database if the compatibility level is lower than 90, but the functions used to retrieve change data will give you an error

Change tracking is a fairly simple tool to set up and use. In order to enable change tracking, you must first enable it at the database level via an ALTER DATABASE statement, and then at the table level via an ALTER TABLE statement.

Let's begin at the database level. The ALTER DATABASE statement looks something like...

There aren't a lot of parameters here. You can set change tracking on or off, you can specify your retention period, and you can specify whether to enable auto-cleanup or not.

For the retention period, you have the choice of DAYS, HOURS, or MINUTES. If you don't specify retention, SQL Server will configure change tracking with a retention period of 2 days. If you provide a number but don't specify the interval, SQL Server will default to days. The minimum retention period is 1 minute.

For auto-cleanup, if you specify OFF SQL Server will not automatically clean up change tracking data. If you don't specify auto-cleanup, SQL Server will default to ON. Unless you want to be responsible for cleaning up after change tracking, ON is your best bet.

To see which databases have change tracking enabled, and to look at their configuration, query the sys.change_tracking_databases catalog view:

It covers the basics, but it could be more informative. Kendra Little has a nicer script:

For our simple example, it gives us:

By joining on sys.databases to provide the database name and omitting the retention_period_units in favor of the description, Kendra's script gives us much more readable results. You'll notice that this script omits the max_cleanup_version column. We'll come back to that, and when it's useful, in a future post.

As an aside, while it can sometimes be a pain to script for readability, it almost always pays off in the long run.

Disabling change tracking is similarly simple. As with enabling change tracking, you disable it with a very straightforward ALTER DATABASE command:

You will need to disable change tracking on all tables before disabling it at the database level, but as we haven't looked at the table level commands yet we'll just pretend we've already done that.

You can change the database-level settings for change tracking with a similar ALTER DATABASE script. Just omit the = ON or OFF:

You can modify either or both the retention period and the auto-cleanup setting.

That wraps it up for this post! Join me next time when we look at enabling change tracking at the table level!