Those who are new to SQL Server tuning are often puzzled why the query optimizer in SQL Server would prefer to scan an entire table instead of using a non clustered index that seems to fit the query perfectly. For example, let’s take this query on a table in the AdventureWorks database:
Select LineTotal
from sales.SalesOrderDetail
where ProductID = 870
Following the Example
If you have any version of the Adventureworks sample database you can follow along. Click here to download the sample databases. Be sure to enable ‘Include Actual Execution Plan’ from the toolbar in Management Studio before you execute the query. That will allow you to see the execution plan chosen by the optimizer.
Also, type ‘SET STATISTICS IO ON’ in the query window. Execute that command in its own batch. Your screen should look like this:
USE AdventureWorks2008R2
SET STATISTICS IO ON
GO
select LineTotal
from sales.SalesOrderDetail
where ProductID = 870
(You may need to change the database name to the version of the database you are using.)
This statement will cause Management studio to display the query cost (in logical page reads) in the Messages tab of the results pane.
If you are unfamiliar with setting either of these options, you might want to read my previous article A Practical Methodology for Query Tuning. It will also tell you why I consider logical page reads to be the best metric to use to measure query cost.
Statistics
Almost everything associated with SQL Server tuning eventually comes down to statistics. We will discuss these stats later in this article so keep them in mind.
The table in our example query contains 121,317 rows of data contained in 1,240 8 kb data pages. It has a nonclustered index on Productid and a clustered index on SalesOrderID, SalesOrderDetailID. With the value chosen for the ProductID filter it returns 4,688 rows or about 4% of the total number of rows in the table.
At first glance, the execution plan looks obvious. The query engine should execute a seek on the ProductID index.
However, when we excecute the query and examine the execution plan tab in the results pane, we find that instead of doing an index seek on the ProductID index as we expected, the optimizer prefers a full scan of the clustered index. It prefers to read all the data contained in the table rather than just the 4% of the rows where the productid equals 870 .
When we look at the Messages tab we can confirm that the query reads all 1,240 data pages in the table.
Why would the optimizer make a choice like that? Is it a mistake?
It is not a mistake. That is the optimum query plan. If you don’t believe that, change the query slightly to force the use of the ProductID index.
select LineTotal
from sales.SalesOrderDetail with
(index=IX_SalesOrderDetail_ProductID)
where ProductID = 870
After you execute this query, look again at the Messages tab. You will see that SQL Server had to read 14,377 data pages to return the same result. Forcing the “right” index was roughly 10 times more expensive than the clustered index scan chosen by the optimizer.
Understanding the Tipping Point
The key to this mystery is to understand a crucially important difference between clustered and non clustered indexes.
Most DBA’s and SQL developers know that the clustered index contains the data itself in the leaf level, while non clustered indexes contain pointers to the pages containing the data. However, based on my experiences in the field it seems to me that important implications of that difference are not widely understood.
At the end of a seek or scan on a clustered index, the data has been retrieved and results can be sent back to the application. At the end of a seek or scan on a non clustered index, the query engine has only pointers to the actual data. The data itself has yet to be returned.
With a non clustered index the data is retrieved one row at a time with a series of single-page io’s called row id lookups or RID lookups. When I say ‘one row at a time’ above. I mean it. A page returned by RID lookup might contain several qualifying rows. But if so, it will be read several times, once for each qualifying row.
The bottom line is that a non clustered index seek will normally use at least as many page reads as the number of rows returned and,depending on various other conditions, perhaps many more.
The Tipping Point
the point at which the number of page reads required by the RID lookups exceeds the total number of data pages in the table, a clustered index scan becomes less expensive than the non clustered index seek. This point is called the Tipping Point. When the tipping point is exceeded, the optimizer will usually prefer a scan of the clustered index instead of a nonclustered index seek.
Covering Indexes
It needs to be mentioned that covering indexes do not use RID lookups so they do not have this problem. This is the reason that they can be lifesavers in some situations. Covering indexes contain all the data required by a specific query or set of queries. Data does not need to be retrieved from the base table because all of the necessary data is contained in the index itself.
If you are not familiar with covering indexes you might want to look at another article in our collection, The Covering Index.
Where is the Tipping Point Exactly?
The tipping point is different for every table. It is generally expressed as a percentage of the total number of rows in the table. The key elements are the number of table rows that will fit on an 8kb data page and the number of rows returned by the query. The more rows that will fit on a page and\or the larger the number of rows returned, the sooner the tipping point is reached.
In our example the rows are very small and almost a hundred rows fit on a data page (total rows/total pages). That means that a clustered index scan on SalesOrderDetail can retrieve 100 rows with each page read while a RID lookup retrieves only one row per page read no matter what the size of the row.
However, if the rows in the example table were wider, fewer of them would fit on a page. That would change the ratio in favor of the non clustered index seek. The number of rows returned by the query also plays into the tipping point. Fewer rows returned by the query reduces the number of RID lookups required and makes it more likely that the non clustered index seek will be used.
Imagine that the rows of this table were so large that only two of them fit on a page (this is not too unusual), that would mean it would take over 60,000 data pages to hold this table. In such a case, the tipping point might be up around 50%. That is, your query might be able to return 50% of the rows in the table without triggering a clustered index scan.
Other Considerations Affecting the Tipping Point
There are a number of other things that affect the tipping point besides the cost of the RID lookups, although that is certainly the most important factor. For one thing, physical IO can be much more efficient when scanning a clustered index. Clustered index data is placed sequentially on the disk in index order, there is very little lateral head travel on the disk, which can improve io performance.
Also, the database engine knows that when it is scanning a clustered index, it is very likely that the next few pages will still contain data it needs. So it starts reading ahead in 64kb chunks instead of the normal 8kb pages. This also results in faster io.
For these reasons the optimizer may prefer the clustered index scan even when the page counts are similar between the clustered and non clustered indexes.
Other issues like index fragmentation, stale statistics, etc. can play into the optimizer’s decision about the tipping point. Since we do not have the same information that the optimizer does, it is difficult to predict exactly where the tipping point will turn out to be for a given table at a given time.
Conclusion
When you work with SQL Server, very often you will see decisions made by the optimizer that make no sense to you. After nearly 20 years at it, that still happens to me. It is a temptation to think of these decisions as bugs or mistakes by the database engine.
Sometimes that is true. The optimizer is not perfect, but in most cases I have found that the problem was my ignorance of the details of an extremely complex process. Mistake by mistake I have learned enough to give the query optimizer the benefit of the doubt when I don’t understand why it is working the way it is.
The tipping point is a good example of this. Taking a quick look at our sample query, it seemed obvious that using the non clustered index was the right way to execute the query. However, with a little more knowledge we now have a better understanding of the complexity of the process behind the optimizer’s decision.
If there is one thing I have learned from many years of working with SQL Server, it is this:
Correct logic applied to incomplete knowledge often leads to incorrect conclusions. Beware of applying a simplistic mental model to a complex phenomenon that you don’t completely understand.