Until recently a huge driver for index maintenance in SQL Server was that reading data from contiguous blocks on disk was more efficient than randomly scattered reads on magnetic media. Now that solid state storage is taking over, some see index defragmentation as wasted effort. While fragmentation may be less of a concern than it was in past times, regular index maintenance does still boost disk utilization as well as memory efficiency by reclaiming unused space inside pages.
The Short Version
Yes it is true that solid state storage is less sensitive than magnetic media to whether data is accessed sequentially versus accessed in a random order. Note that it is not actually true that this doesn’t matter at all, it does matter when we get close to the throughput of our storage (see below). But that is not the point of this post.
Remember that there are actually two different kinds of fragmentation that we discuss. The one we typically think of, where database pages are stored out of order on disk, is external fragmentation. As we may guess from the name, there is also another phenomenon frequently referred to as internal fragmentation. Internal fragmentation refers to inefficiency with how data is organized inside pages – specifically issues that cause us to have chunks of unused space inside pages. Obviously this has an impact on how much unused space is available inside our data files, but as pages are read into the buffer pool high levels of internal fragementation can actually impact how much usable data will fit in the buffer pool and eventually will drag on page lifetime.
How do we control internal fragmentation? The same way that we control external fragmentation – by rebuilding or reorganizing indexes. For some workloads it can be important to check for internal fragmentation (by way of avg_page_space_used_in_percent in dm_db_index_physical_stats) directly, but for most situations the main lesson is to continue defragmenting as we have for years.
The Long Version
Old School View of Index Maintenance
Think back to … say … 2010. Many of us were terribly excited about seeing “The Last Airbender” or “Hot Tub Time Machine”. Ugg boots were at the peak of their inexplicable popularity. Cee Lo was singing “F*ck You”. And many database folks in smaller shops were running SQL Server 2008 with something like 8GB of RAM on physical hardware backed up by spinning rust. The word “spinning” is what matters here. Our data was stored on physical platters. Before a page could be read into memory, we had to wait for two things to happen – the heads inside our hard drive had to be moved to the correct track on the disk, and then we had to wait for the correct portion of the disk to physically spin around underneath the head. Throughput is maximized if we read as much data as we possibly can while neighboring blocks are spinning past. In other words, doing a few large sequential reads is fundamentally more efficient than large numbers of small, random access reads.
Why does this matter? Mostly because we have historically expected the bottleneck in relational database servers to be I/O. If we know that sequential I/O is fundamentally more efficient than random access I/O, then the easiest way to get more bang for our storage buck is to arrange for more of our reads to be sequential. How do we do this? That’s the easy part. We identify those indexes that are not stored in contiguous blocks on the disk, and then we either rebuild them or reorganize them.
Do we rebuild or reorganize? That’s not something I’ll dig into in this post. In a nutshell, reorganizing an index simply redistributes data among the pages already allocated to the index while rebulding is a more drastic operation that actually rebuilds the index. Which makes more sense depends on many factors including SQL Server edition, whether or not index can be offline, whether or not we need to be able to interrupt the command, etc etc. A great post by Kendra Little on some of these issues can be found here.
Why Index Maitenance Isn’t Getting As Much Love Now
Technology moves quickly, and even since 2010 quite a bit has changed.
- SQL Server is a data hoarder, and its house has gotten bigger. Think of SQL Server as an eccentric person whose house is filled with … say … stacks and stacks of old newspapers. The “house” in this example is memory on the server that SQL Server occupies, and the pages of newspapers are data pages from our database. If we ever ask ask for a piece of data from disk, it will get hoarded (cached) in memory for absolutely as long as the space is not needed for something more important. That’s actually a really good thing – if we give our database server large amounts of memory, there’s a pretty good chance that once the cache is warmed up any data that we’re actively using will be in memory. If most of the data we need is in memory already, that means we are not working the I/O system as hard which makes us a bit less sensitive to disk performance. If we do only have room to allocate 4GB to the buffer pool that’s not a lot of room for data. But as more folks are able to afford hundreds of GB of RAM it’s reasonable to assume there will be a higher cache hit ratio for a given amount of data.
- More folks are aware of the importance of query tuning. Monitoring tools have gotten better. Thanks to events like SQL Saturday, education amongst the SQL Server community has improved. Odds are better today than they used to be that if a query is doing a huge index scan then somebody will notice and do some tuning. Scans really do make sense for some workloads, but if we do have a workload where some large scans can be eliminated by tuning efforts then doing so actually accomplishes two things. First, the query tends to run faster because it tends to do less I/O. The second impact, however, can be less obvious. As large scans are replaced with smaller seek operations (or even smaller scans), that tends to make disk access more random. More random access means we notice the impact of external fragmentation less.
- Finally the elephant in the room. More and more of us have transitioned from spinning magnetic media to solid state storage. Because retrieval is no longer dependent on moving parts, solid state devices tend to perform very well for random access workloads. Since the argument that was used on most of us to get us serious about index maintenance was based on the way spinning discs operate, the rise of SSDs has led to a widespread belief that index maintenance doesn’t matter anymore. It’s worth noting that even if sequential reads are no faster than random reads it is still possible that more IOPs are required in the presence of fragmentation. That means external fragmentation can still matter in high performance workloads – I will not discuss further here but I would refer the interested reader to Jonathan Kehayias’ post here.
These are all excellent points, and I tend to agree that external fragmentation (data out of order on disk) is not as much of an issue as it has historically been. That does not mean that index maintenance does not matter at all because reduction of external fragmentation was not the only reason that we rebuilt indexes. The other type of fragmentation that we worry about, and which is also reduced by index maintenance, is internal fragmentation. Internal fragmentation refers to unused space inside pages on data files. In other words, in addition to recording data physically on disk, index maintenance is also the mechanism that we use to reclaim space from deleted or updated records in our databases.
If anybody wishes to follow along with this example, here is the script that I used to create my sample database (click to expand). In a nutshell, the DB contains two tables. Into one table we insert a large amount of data in reverse order (so it will be perfectly fragmented, in terms of external fragmentation). Into the other table, we insert a large amount of data in order, and them delete the majority of it so that we have large holes of unused space on every page.
USE [master] GO CREATE DATABASE FragmentTest ON PRIMARY ( NAME = N'Frag_P', FILENAME = N'[some_filepath_here]\Frag_P.mdf' , SIZE = 1GB , MAXSIZE = UNLIMITED, FILEGROWTH = 1GB ), FILEGROUP main DEFAULT( NAME='Frag_D', FILENAME = N'[some_filepath_here]\Frag_D.ndf', SIZE=21GB, MAXSIZE = UNLIMITED, FILEGROWTH = 1GB ) LOG ON( NAME = N'Frag_L', FILENAME = N'[some_filepath_here]<span id="mce_SELREST_start" style="overflow: hidden; line-height: 0;"></span>\FRAG_L.ldf' , SIZE = 8000MB , FILEGROWTH = 8000MB ) GO ALTER DATABASE FragmentTest SET RECOVERY SIMPLE; GO USE FragmentTest GO CREATE SCHEMA frag; GO CREATE TABLE frag.InternalFrag( ID INT IDENTITY NOT NULL, Payload CHAR(996) NOT NULL CONSTRAINT DF_InternalFrag_Payload DEFAULT 'Time to look big', CONSTRAINT PK_InternalFrag PRIMARY KEY CLUSTERED( ID ) ); CREATE TABLE frag.ExternalFrag( ID INT NOT NULL, Payload CHAR(996) NOT NULL CONSTRAINT DF_ExternalFrag_Payload DEFAULT 'Time to look big', CONSTRAINT PK_ExternalFrag PRIMARY KEY CLUSTERED( ID ) ); GO -- Create internal framentation by populating a table with 10GB of data then deleting 7/8 of the data. INSERT INTO frag.InternalFrag(Payload) VALUES('Time to look big'); GO 10485760 DECLARE @first_match INT; DECLARE @last_match INT; WITH x AS(SELECT TOP 100000 ID FROM frag.InternalFrag WHERE ID % 8 > 0) SELECT @first_match = MIN(ID), @last_match = MAX(ID) FROM x; WHILE @first_match IS NOT NULL AND @last_match IS NOT NULL BEGIN DELETE FROM frag.InternalFrag WHERE ID >= @first_match AND ID 0; WITH x AS(SELECT TOP 100000 ID FROM frag.InternalFrag WHERE ID % 8 > 0) SELECT @first_match = MIN(ID), @last_match = MAX(ID) FROM x; END GO -- Insert about 10GB into the external frag table, in reverse order, to generate external fragmentation DECLARE @count INT = 10485760; WHILE @count > 0 BEGIN INSERT INTO frag.ExternalFrag(ID) VALUES(@count); SET @count -= 1; END GO
Lets go ahead and see what fragmentation looks like against these two tables like so
SELECT OBJECT_NAME( ps.object_id ) AS Tab, i.name AS Ind, ps.avg_fragmentation_in_percent, ps.avg_page_space_used_in_percent FROM sys.dm_db_index_physical_stats( DB_ID(), DEFAULT, DEFAULT, DEFAULT, DEFAULT) ps INNER JOIN sys.indexes i ON ps.object_id = i.object_id AND ps.index_id = i.index_id;
As expected, the ExternalFrag table where we intentionally inserted data out of order is extraordinarily fragmented. But what about the InternalFrag table? With fragmentation of less than 1% does that mean we do not need to worry about it? Actually what this really means is that the default “LIMITED” mode of the dm_db_index_physical_stats function doesn’t look at leaf-level pages so we are not getting information on what things look like at the leaf level. The fix is straightforward, we can instead run in the “SAMPLED” mode (or even “DETAILED” if we want to scan the entire leaf level … and have a ton of time to kill).
SELECT OBJECT_NAME( ps.object_id ) AS Tab, i.name AS Ind, ps.avg_fragmentation_in_percent, ps.avg_page_space_used_in_percent FROM sys.dm_db_index_physical_stats( DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'SAMPLED') ps INNER JOIN sys.indexes i ON ps.object_id = i.object_id AND ps.index_id = i.index_id;
This is more like it. We have to wait longer for results, but the result set now tells us how much unused space is sitting in data pages waiting to be reclaimed. Those who are accustomed to looking at avg_fragmentation_in_percent are used to thinking of low numbers as good and high as bad. This is reversed for avg_page_space_used_in_percent – a low percentage of used space could indicate we need to rebuild (unless we intentionally specified a large fill factor). So in this case, almost 88% of the space in the InternalFrag table is not actually getting used. Why does this matter? The obvious reason is that if we aren’t intentionally padding the InternalFrag table then it is taking up about 8 times as much space as it needs to on disk. But also consider what happens when we run the following queries
SET STATISTICS IO ON GO SELECT COUNT(*) FROM frag.InternalFrag; SELECT COUNT(*) FROM frag.ExternalFrag;
On my workstation here are the results
(1 row affected) Table 'InternalFrag'. Scan count 5, logical reads 1321485, physical reads 0, read-ahead reads 1309772, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (1 row affected) Table 'ExternalFrag'. Scan count 5, logical reads 1319382, physical reads 2, read-ahead reads 1314985, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Now what happens if we defragment the indexes like so?
ALTER INDEX PK_InternalFrag ON frag.InternalFrag REORGANIZE; ALTER INDEX PK_ExternalFrag ON frag.ExternalFrag REORGANIZE;
After this, when we re-run the COUNT(*) queries above we get the following message
(1 row affected) Table 'InternalFrag'. Scan count 5, logical reads 174607, physical reads 2, read-ahead reads 170547, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (1 row affected) Table 'ExternalFrag'. Scan count 5, logical reads 1319552, physical reads 9, read-ahead reads 1325530, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
That is an 87% reduction in logical reads for the query against InternalFrag. If this were a real wold scenario, that could mean at the end of the day 87% fewer pages that could help satisfy future queries getting unceremoniously dumped from memory if we ever scan this table.
For the sake of completeness, before we leave lets take another look at the results of the dm_db_index_physical_stats query to be absolutely certain that both types of fragmentation have been cleared up.
Internal Fragmentation Is Important
One of my pet peeves on most projects I’ve joined is that the index maintenance jobs are typically checking for high levels of avg_fragmentation_in_percent and ignoring avg_page_space_used_in_percent. Am I saying that we’ve been doing things wrong? It depends. But for smaller DBs … yeah kind of. Checking for excessive amounts of free space inside data pages directly is probably better … because unused space is more of an issue than pages that are merely out of order on many systems nowadays. I’m not saying we should not check for external fragmentation at all though, I actually like to keep an eye on both. Remember that even on solid state storage, external fragmentation still matters a little as we get close to the performance limits on our hardware … and if we say that our servers are I/O bound then that means we probably do push this limit at times (by definition). Assuming that there is enough slow time on the system for a call to dm_db_index_physical_stats in either sampled or detailed mode as well as any desired index rebuild/reorganize operations I like to keep an eye on both.
What If There Isn’t Time?
The abover advice should be fine for the vast majority of folks reading this. But there is an issue with even the sampled mode of dm_db_index_physical_stats – it takes longer. We may not care how long this takes if we are looking at something like AdventureWorks where the difference between limited and sampled modes is measured in seconds or minutes. But for larger datasets, it’s possible that limited mode will take an hour or two (or even longer) while sampled could take several hours to complete. On such systems, there is a chance that there isn’t time to both check for unused space and also do any useful amount of defrag work. One possible fix is simply to just check fragmentation levels less frequently. For example, rather than checking fragmentation every day, maybe it is possible to check fragmentation levels once per week and scatter rebuild operations over slow periods on their other days. Another possibility could be to not spend time on dm_db_index_physical_stats on any recurring basis and to instead develop a fixed schedule for rebuilding indexes that are known to be problematic.
There is another possibility though. My examples that show external fragmentation without internal, and more importantly internal fragmentation without external are a bit contrived. In real world situations it is normal for the two kinds of fragmentation to be seen together. Workloads vary so do your own testing … but for most real-world situations external and internal fragmentation tend to be closely correlated. Tables that have high levels of internal fragmentation are going to tend to also have significant external fragmentation. I would like to reiterate – whenever practicable I am a huge advocate for tracking internal fragmentation directly. That said, if the amount of time needed for sampled mode is an issue, odds are that rebuilding externally fragmented tables will keep internal fragmentation in check. Just be sure to consider running an extra rebuild or reorg after doing massive deletes or updates.