Category Archives: Uncategorized

Index Fragementation Still Matters

Executive Summary

Until recently a huge driver for index maintenance in SQL Server was that reading data from contiguous blocks on disk was more efficient than randomly scattered reads on magnetic media. Now that solid state storage is taking over, some see index defragmentation as wasted effort. While fragmentation may be less of a concern than it was in past times, regular index maintenance does still boost disk utilization as well as memory efficiency by reclaiming unused space inside pages.

The Short Version

Yes it is true that solid state storage is less sensitive than magnetic media to whether data is accessed sequentially versus accessed in a random order. Note that it is not actually true that this doesn’t matter at all, it does matter when we get close to the throughput of our storage (see below). But that is not the point of this post.

Remember that there are actually two different kinds of fragmentation that we discuss. The one we typically think of, where database pages are stored out of order on disk, is external fragmentation. As we may guess from the name, there is also another phenomenon frequently referred to as internal fragmentation. Internal fragmentation refers to inefficiency with how data is organized inside pages – specifically issues that cause us to have chunks of unused space inside pages. Obviously this has an impact on how much unused space is available inside our data files, but as pages are read into the buffer pool high levels of internal fragementation can actually impact how much usable data will fit in the buffer pool and eventually will drag on page lifetime.

How do we control internal fragmentation? The same way that we control external fragmentation – by rebuilding or reorganizing indexes. For some workloads it can be important to check for internal fragmentation (by way of avg_page_space_used_in_percent in dm_db_index_physical_stats) directly, but for most situations the main lesson is to continue defragmenting as we have for years.

The Long Version

Old School View of Index Maintenance

Think back to … say … 2010. Many of us were terribly excited about seeing “The Last Airbender” or “Hot Tub Time Machine”. Ugg boots were at the peak of their inexplicable popularity. Cee Lo was singing “F*ck You”. And many database folks in smaller shops were running SQL Server 2008 with something like 8GB of RAM on physical hardware backed up by spinning rust. The word “spinning” is what matters here. Our data was stored on physical platters. Before a page could be read into memory, we had to wait for two things to happen – the heads inside our hard drive had to be moved to the correct track on the disk, and then we had to wait for the correct portion of the disk to physically spin around underneath the head. Throughput is maximized if we read as much data as we possibly can while neighboring blocks are spinning past. In other words, doing a few large sequential reads is fundamentally more efficient than large numbers of small, random access reads.

Why does this matter? Mostly because we have historically expected the bottleneck in relational database servers to be I/O. If we know that sequential I/O is fundamentally more efficient than random access I/O, then the easiest way to get more bang for our storage buck is to arrange for more of our reads to be sequential. How do we do this? That’s the easy part. We identify those indexes that are not stored in contiguous blocks on the disk, and then we either rebuild them or reorganize them.

Do we rebuild or reorganize? That’s not something I’ll dig into in this post. In a nutshell, reorganizing an index simply redistributes data among the pages already allocated to the index while rebulding is a more drastic operation that actually rebuilds the index. Which makes more sense depends on many factors including SQL Server edition, whether or not index can be offline, whether or not we need to be able to interrupt the command, etc etc. A great post by Kendra Little on some of these issues can be found here.

Why Index Maitenance Isn’t Getting As Much Love Now

Technology moves quickly, and even since 2010 quite a bit has changed.

  • SQL Server is a data hoarder, and its house has gotten bigger. Think of SQL Server as an eccentric person whose house is filled with … say … stacks and stacks of old newspapers. The “house” in this example is memory on the server that SQL Server occupies, and the pages of newspapers are data pages from our database. If we ever ask ask for a piece of data from disk, it will get hoarded (cached) in memory for absolutely as long as the space is not needed for something more important. That’s actually a really good thing – if we give our database server large amounts of memory, there’s a pretty good chance that once the cache is warmed up any data that we’re actively using will be in memory. If most of the data we need is in memory already, that means we are not working the I/O system as hard which makes us a bit less sensitive to disk performance. If we do only have room to allocate 4GB to the buffer pool that’s not a lot of room for data. But as more folks are able to afford hundreds of GB of RAM it’s reasonable to assume there will be a higher cache hit ratio for a given amount of data.
  • More folks are aware of the importance of query tuning. Monitoring tools have gotten better. Thanks to events like SQL Saturday, education amongst the SQL Server community has improved. Odds are better today than they used to be that if a query is doing a huge index scan then somebody will notice and do some tuning. Scans really do make sense for some workloads, but if  we do have a workload where some large scans can be eliminated by tuning efforts then doing so actually accomplishes two things. First, the query tends to run faster because it tends to do less I/O. The second impact, however, can be less obvious. As large scans are replaced with smaller seek operations (or even smaller scans), that tends to make disk access more random. More random access means we notice the impact of external fragmentation less.
  • Finally the elephant in the room. More and more of us have transitioned from spinning magnetic media to solid state storage. Because retrieval is no longer dependent on moving parts, solid state devices tend to perform very well for random access workloads. Since the argument that was used on most of us to get us serious about index maintenance was based on the way spinning discs operate, the rise of SSDs has led to a widespread belief that index maintenance doesn’t matter anymore. It’s worth noting that even if sequential reads are no faster than random reads it is still possible that more IOPs are required in the presence of fragmentation. That means external fragmentation can still matter in high performance workloads – I will not discuss further here but I would refer the interested reader to Jonathan Kehayias’ post here.

These are all excellent points, and I tend to agree that external fragmentation (data out of order on disk) is not as much of an issue as it has historically been. That does not mean that index maintenance does not matter at all because reduction of external fragmentation was not the only reason that we rebuilt indexes. The other type of fragmentation that we worry about, and which is also reduced by index maintenance, is internal fragmentation. Internal fragmentation refers to unused space inside pages on data files. In other words, in addition to recording data physically on disk, index maintenance is also the mechanism that we use to reclaim space from deleted or updated records in our databases.

An Example

If anybody wishes to follow along with this example, here is the script that I used to create my sample database (click to expand). In a nutshell, the DB contains two tables. Into one table we insert a large amount of data in reverse order (so it will be perfectly fragmented, in terms of external fragmentation). Into the other table, we insert a large amount of data in order, and them delete the majority of it so that we have large holes of unused space on every page.

USE [master]

PRIMARY ( NAME = N'Frag_P', FILENAME = N'[some_filepath_here]\Frag_P.mdf' , SIZE = 1GB , MAXSIZE = UNLIMITED, FILEGROWTH = 1GB ),
FILEGROUP main DEFAULT( NAME='Frag_D', FILENAME = N'[some_filepath_here]\Frag_D.ndf', SIZE=21GB, MAXSIZE = UNLIMITED, FILEGROWTH = 1GB )
LOG ON( NAME = N'Frag_L', FILENAME = N'[some_filepath_here]<span id="mce_SELREST_start" style="overflow: hidden; line-height: 0;"></span>\FRAG_L.ldf' , SIZE = 8000MB , FILEGROWTH = 8000MB )


USE FragmentTest


CREATE TABLE frag.InternalFrag(
Payload CHAR(996) NOT NULL CONSTRAINT DF_InternalFrag_Payload DEFAULT 'Time to look big',

CREATE TABLE frag.ExternalFrag(
Payload CHAR(996) NOT NULL CONSTRAINT DF_ExternalFrag_Payload DEFAULT 'Time to look big',

-- Create internal framentation by populating a table with 10GB of data then deleting 7/8 of the data.
INSERT INTO frag.InternalFrag(Payload) VALUES('Time to look big');
GO 10485760

DECLARE @first_match INT;
DECLARE @last_match INT;

WITH x AS(SELECT TOP 100000 ID FROM frag.InternalFrag WHERE ID % 8 > 0)
SELECT @first_match = MIN(ID), @last_match = MAX(ID) FROM x;

WHILE @first_match IS NOT NULL AND @last_match IS NOT NULL
DELETE FROM frag.InternalFrag WHERE ID >= @first_match AND ID  0;

WITH x AS(SELECT TOP 100000 ID FROM frag.InternalFrag WHERE ID % 8 > 0)
SELECT @first_match = MIN(ID), @last_match = MAX(ID) FROM x;

-- Insert about 10GB into the external frag table, in reverse order, to generate external fragmentation
DECLARE @count INT = 10485760;
WHILE @count > 0
INSERT INTO frag.ExternalFrag(ID) VALUES(@count);
SET @count -= 1;

Lets go ahead and see what fragmentation looks like against these two tables like so

SELECT OBJECT_NAME( ps.object_id ) AS Tab, AS Ind,
FROM sys.dm_db_index_physical_stats( DB_ID(), DEFAULT,
	INNER JOIN sys.indexes i ON ps.object_id = i.object_id
		AND ps.index_id = i.index_id;

Image 1: This is not the fragmentation you are looking for

As expected, the ExternalFrag table where we intentionally inserted data out of order is extraordinarily fragmented. But what about the InternalFrag table? With fragmentation of less than 1% does that mean we do not need to worry about it? Actually what this really means is that the default “LIMITED” mode of the dm_db_index_physical_stats function doesn’t look at leaf-level pages so we are not getting information on what things look like at the leaf level. The fix is straightforward, we can instead run in the “SAMPLED” mode (or even “DETAILED” if we want to scan the entire leaf level … and have a ton of time to kill).

SELECT OBJECT_NAME( ps.object_id ) AS Tab, AS Ind,
FROM sys.dm_db_index_physical_stats( DB_ID(), DEFAULT,
	INNER JOIN sys.indexes i ON ps.object_id = i.object_id
		AND ps.index_id = i.index_id;

Image 2: There’s the wasted space

This is more like it. We have to wait longer for results, but the result set now tells us how much unused space is sitting in data pages waiting to be reclaimed. Those who are accustomed to looking at avg_fragmentation_in_percent are used to thinking of low numbers as good and high as bad. This is reversed for avg_page_space_used_in_percent – a low percentage of used space could indicate we need to rebuild (unless we intentionally specified a large fill factor). So in this case, almost 88% of the space in the InternalFrag table is not actually getting used. Why does this matter? The obvious reason is that if we aren’t intentionally padding the InternalFrag table then it is taking up about 8 times as much space as it needs to on disk. But also consider what happens when we run the following queries


SELECT COUNT(*) FROM frag.InternalFrag;
SELECT COUNT(*) FROM frag.ExternalFrag;

On my workstation here are the results

(1 row affected)

Table 'InternalFrag'. Scan count 5, logical reads 1321485, physical reads 0, read-ahead reads 1309772, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

Table 'ExternalFrag'. Scan count 5, logical reads 1319382, physical reads 2, read-ahead reads 1314985, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now what happens if we defragment the indexes like so?

	ON frag.InternalFrag REORGANIZE;
	ON frag.ExternalFrag REORGANIZE;

After this, when we re-run the COUNT(*) queries above we get the following message

(1 row affected)

Table 'InternalFrag'. Scan count 5, logical reads 174607, physical reads 2, read-ahead reads 170547, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

Table 'ExternalFrag'. Scan count 5, logical reads 1319552, physical reads 9, read-ahead reads 1325530, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

That is an 87% reduction in logical reads for the query against InternalFrag. If this were a real wold scenario, that could mean at the end of the day 87% fewer pages that could help satisfy future queries getting unceremoniously dumped from memory if we ever scan this table.

For the sake of completeness, before we leave lets take another look at the results of the dm_db_index_physical_stats query to be absolutely certain that both types of fragmentation have been cleared up.

Internal Fragmentation Is Important

One of my pet peeves on most projects I’ve joined is that the index maintenance jobs are typically checking for high levels of avg_fragmentation_in_percent and ignoring avg_page_space_used_in_percent. Am I saying that we’ve been doing things wrong? It depends. But for smaller DBs … yeah kind of. Checking for excessive amounts of free space inside data pages directly is probably better … because unused space is more of an issue than pages that are merely out of order on many systems nowadays. I’m not saying we should not check for external fragmentation at all though, I actually like to keep an eye on both. Remember that even on solid state storage, external fragmentation still matters a little as we get close to the performance limits on our hardware … and if we say that our servers are I/O bound then that means we probably do push this limit at times (by definition). Assuming that there is enough slow time on the system for a call to dm_db_index_physical_stats in either sampled or detailed mode as well as any desired index rebuild/reorganize operations I like to keep an eye on both.

What If There Isn’t Time?

The abover advice should be fine for the vast majority of folks reading this. But there is an issue with even the sampled mode of dm_db_index_physical_stats – it takes longer. We may not care how long this takes if we are looking at something like AdventureWorks where the difference between limited and sampled modes is measured in seconds or minutes. But for larger datasets, it’s possible that limited mode will take an hour or two (or even longer) while sampled could take several hours to complete. On such systems, there is a chance that there isn’t time to both check for unused space and also do any useful amount of defrag work. One possible fix is simply to just check fragmentation levels less frequently. For example, rather than checking fragmentation every day, maybe it is possible to check fragmentation levels once per week and scatter rebuild operations over slow periods on their other days. Another possibility could be to not spend time on dm_db_index_physical_stats on any recurring basis and to instead develop a fixed schedule for rebuilding indexes that are known to be problematic.

There is another possibility though. My examples that show external fragmentation without internal, and more importantly internal fragmentation without external are a bit contrived. In real world situations it is normal for the two kinds of fragmentation to be seen together. Workloads vary so do your own testing … but for most real-world situations external and internal fragmentation tend to be closely correlated. Tables that have high levels of internal fragmentation are going to tend to also have significant external fragmentation. I would like to reiterate – whenever practicable I am a huge advocate for tracking internal fragmentation directly. That said, if the amount of time needed for sampled mode is an issue, odds are that rebuilding externally fragmented tables will keep internal fragmentation in check. Just be sure to consider running an extra rebuild or reorg after doing massive deletes or updates.


Why Relying on SSMS for Execution Plans Isn’t Enough

One thing I’ve noticed about myself – I don’t tend to want to blog about the same subjects that I want to talk about at SQL Saturday. Different media, different audience, so different topics probably aren’t surprising. But in an effort to be more consistent … here is a taste of the kind of material that I cover in my ‘Why Should I Care about .. The Plan Cache” presentation.

If you’re still reading this, there’s a great chance than you’ve looked at an execution plan by putting a query into SSMS, hitting the button to ask for the actual plan, and then executing the query. Pretty easy, pretty awesome, that’s how performance tuning is done, right? Well, even if we ignore the possibility of different session settings (won’t get into that here), there are a few issues.

Why It’s Not That Easy

  • If the end user doesn’t report an issue right away, they might not remember exactly what they were doing at the time they noticed a performance issue.
  • Even if the user does remember, they usually can’t tell you what query was running (they were probably using some kind of software).
  • Documentation for that software likely won’t drill down to the level of what queries are run against the database.
  • If software documentation does exist and actually does include queries, that’s a very detailed software document. Has it been kept up to date?
  • If the software was developed in house, why not ask the developer? If the developer that wrote it is still around, they probably won’t know off the top of their head exactly what the query looked like. Odds are they would have to investigate.
  • Even if it’s possible to quickly find exactly where in the code the problematic query is executed, that might not give us the query. It’s becoming more common for some kind of dynamic SQL to be involved, which means the actual query could depends on parameter values.

It may sound like I’m trying to say it’s not reasonable to ask the development team what TSQL is actually getting run. Of course that’s a reasonable thing to ask. The point I’m trying to make is just that it likely that you will not get an answer immediately. When troubleshooting performance issues time usually matters, so there is usually better if you can quickly find out for yourself what is causing the problem.

So … It Appears that September Will Be all About the MCM

News about the death of the MCM / MCSM broke right around the time I was starting to write up an interesting case study involving the ascending key problem (for those unfamiliar, see Gail Shaw’s excellent writeup for example), so that post will probably need to wait for a while. It’s not my intent to rehash the #SQLMCM issue here, if anyone who cares about the MCM (Microsoft Certified Master) program isn’t already up to speed on the basic issues a good starting point can be found from Jason Brimhall and at the #SQLMCM hash tag on twitter. Part of the reason that I am not really ranting is that I am actually relatively fortunate. I had been planning to take the lab exam (final hurdle before becoming a MCM) at the end of September anyway so the only real impacts on me are

  • I need to decide whether or not it’s worth following through with the exam when the certification is dying (the answer is probably yes).
  • I am sure I will need to cover this out of my pocket now. I’ve been fortunate enough to have a part-time W-2 gig (on top of my consulting load) that has actually been quite supportive of my MCM quest up to this point. Now that “the email” has gone out I am frankly too embarrassed to even ask if they would care to pony up a couple of grand more for the final test when the plug is getting pulled the day after (maybe literally) I take the exam. Maybe the only reason to consider doing so is that I actually wrote the MCM attempt into my performance plan for the year. This is actually not as bad as it sounds, I am dual employed so it really is only fair that Rick-the-consultant pays part of the certification expense which will benefit me equally in both of my current roles.
  • Some lost prep time. When I said I planned to take the exam at the end of September that was my optimistic estimate. In the back of my head I had actually started telling myself I may wait until the end of October. That exam is no joke, an extra month of prep time would have been handy.
  • It’s all riding on this attempt. Before the announcement I figured that if I did not pass this attempt I could consider regrouping for another try before the MCM exams were retired in favor of the new MCSM exams.

First : MCM vs MCSM – A Long Tangent

But I’m not interested in dwelling because I really am one of the lucky ones. I’m more interested in offering my perspective as one who is in the pipeline at this point in history. First and foremost I was always a lot more excited about the legacy MCM certification than I was about the MCSM. I just felt “Microsoft Certified Master” was an awesome description. It is intuitive. It rolls off the tongue. It is clear. I imagine my future self shaking hands slightly more firmly and standing slightly straighter as “Rick Lowe, Microsoft Certified Master”. It is immediately clear to any person who hears this exactly what I am claiming (that I am awesome), why I feel justified in claiming it (my awesomeness has been certified), and what evidence is available (this Rick guy should be able to cough up a transcript sharing code). This is actually why I am ready to take the exam, if I had not been concerned with becoming a MCM I probably would have been hanging back, updating my credentials from MCITP to MCSE and waiting from the MCSM exams to be published.

On the other hand, when I imagine a possible future self introducing himself as “Rick Lowe, MCSM”, or worse “Rick Lowe, Microsoft Certified Solution Master on the Data Platform” I am pretty sure I will be slouching as the eyes of whoever I’m speaking to glaze over slightly. “That is not just more alphabet soup”, my future self says, “that is a pinnacle certification. Here, let me draw you a diagram of all the new SQL Server certifications so you can see this blue pyramid thingie. Look, there’s no gray at all in this pyramid which means this is the good one. You should really be rather impressed with me about now”. I suppose the alphabet soup issue is what really bothered my about the change from MCM to MCSM. Aside from just being a cool title MCM, both when spoken and written, looks quite different from MCITP which could be important when speaking with somebody outside of the SQL Server community. Even if they may have seen hundreds of resumes for “paper” MCITPs it is possible that the fact MCM sounds different may be enough to get their attention. MCSM, on the other hand, visually and verbally kind of blends in with MCSA and MCSE.

If it seems like I’m taking cheap shots at a particular set of visual aids that is not my intent. I love visual aids and have nothing against blue and grey pyramids. If I haven’t been clear enough, the point I am really trying to make is that the visual aids are not just helpful for understanding the certification roadmap, they may actually be kind of necessary for understanding the current roadmap unless one has a very good head for acronyms. And this does not help when it comes to acceptance of the current generation of credentials by business leaders.

But more importantly, Can we chart our own destiny?

Plenty has been written on the tricky revenue problem posed by the MCM exams (rather steep fixed costs would need to be offset by a relative handful of test takers before this test is profitable). One commonly expressed concern is that it may just be too difficult for Microsoft to make a profit from the MCM/MCSM program and may simply never reintroduce the concept of a pinnacle certification. Another is that they may fix the revenue issue by dumbing down the tests enough to increase the percentage of SQL Server MCPs who would be able to pass. More potential conversions would probably mean more test takers which would definitely mean more revenue. Both of these cases are troublesome for many in the community who value the rigor and the “unfakeability” of the MCM/MCSM exams.

My question at this point is this : if we as a community do not have a lot of faith that Microsoft will bring back the MCM in a satisfactory form, or if we are concerned that we just care more about this particular certification than they do, why are we waiting for them to do so?

It’s probably not realistic for anyone to expect the creation of a “Community Certified Master of SQL Server” program. Don’t get me wrong, it would be fantastic if somebody could come to a conference, sign up for a “test your mettle” hands-on precon or postcon, and potentially walk out as a “CCM”. But getting the community involved in the master testing process does not change the underlying economic issues. Developing the test would take a tremendous amount of effort. Administering the test would be a nightmare. Do we offer the test online? Probably not because it would be too easy for somebody to either cheat or capture the questions to look up later. Establish a dedicated testing center? Probably not realistic for this volume of test takers. Remote proctor? Maybe, but that could be very expensive because it probably requires something close to a 1:1 ratio of proctors to test takers. Co-locate with a conference? Might be the most workable but is it a problem if the exam can only be attempted once or twice a year?

And of course, even if a delivery method is found this does not change the underlying issues. Developing a MCM-type test would be very expensive (it bears repeating). Convincing industry that it should care about the CCM would be even more difficult than it was to try to convince them to care about the MCM. I am sure there are many many more.

Unfortunately I actually do not have any productive suggestions here, all I can really do at this point is suggest that the death of the MCM could be an opportunity for us to do something even better. And it may be foolish to believe this means testing – as Brent Ozar points out there are a lot of cool experiments we could conduct that have nothing to do with a traditional certification program. But if anyone does have an idea I may be interested in pitching in. After October 1, of course. The rest of September is cut out for me.

Coming soon

Hi all,

I have a little extra time on my hands for the next few months, which means this wasn’t only the ideal time to start blogging but also that I should be able to post relatively frequently for a while. That said, I’m getting ready to disappear on a rafting trip for a week. If you discover this space while I’m gone and are wondering whether or not it’s worth coming back here are some topics I’m planning to write about in the next few months.

  • The correct way of getting Oracle Instant Client for work with SSRS / SSIS. Google search may be leading you astray.
  • For the DBAs : The potential of Entity Framework. Why I really wish I could love EF.
  • For the developers : The failure of Entity Framework to live up to its potential. Why EF may be causing your DBA to drink in the morning.
  • The ascending key problem. Why did performance suddenly get inconsistent shortly after we deployed?
  • CRUD squared. When stored procedures go awry (AKA Rick loses some friends part 1).
  • That time I turned on RCSI for the sole purpose of getting the developers to stop using nolock. Wasn’t that awesome? Or was it more of an evil waste of resources? (AKA Rick loses some friends part 2)
  • Social capital at the office. How to get the mean kids to realize how brilliant you are and start listening to you.
  • The limitations of self learning from the internet. Why I frequently pay out of my own pocket to go to conferences.

But more importantly, feel free to contact me to ask questions or even just suggest that I cover a particular topic. This request may be more relevant in the future because you probably can’t tell from a single blog post how valuable my opinion is, but ultimately I do this because I love talking about SQL Server. If nobody is reading this then I’m just talking to myself which I have been known to do that on occasion, but I would much rather talk to somebody else. The more I know about what issues you would most like my warped perspective on, the more productive that conversation can be.