This is the fourth article in a series entitled “A Practical Methodology for Query Optimization”. Each article builds on what we learned in earlier articles. If you have not read the previous articles, I highly recommend starting with the first article.
Last month I cheated a bit and pointed you at Grant Fritchey’s ebook on graphic execution plans instead of writing a lengthy article myself. I don’t expect that a lot of you have finished the book, but hopefully you have learned the basics about interpreting graphical query plans and can use that knowledge examine the query plans for our examples this month.
This month we discuss search arguments and what you need to know about them to write efficient TSQL code. We will again use the AdventureWorks sample database for our examples. (If you don’t have the Adventureworks database, get it here.)
If you plan to follow along, you should prepare your query window. Execute “SET STATISTICS IO ON” in the query window and turn on “Show Actual Query Plan”. If you have not done so already, read the first article in this series for a fuller explanation.
About the Examples
I spend quite a bit of time trying to create example queries that illustrate a point but also have some common business purpose. However, this month I seem to have failed on the second point. The examples are a bit contrived and might leave you asking “why would anyone want to do that anyway?”
Try to ignore that and just concentrate on the concept we are trying to illustrate.
What is a Search Argument?
Search arguments (often referred to as SARGs) are comparisons usually placed in the WHERE and JOIN clauses of a query to limit the rows returned by the query. Multiple arguments separated by the “AND” keyword can also be considered a search argument, but our examples will all be single expressions for simplicity.
A Searchable Argument
For example in the query below, ‘CarrierTrackingNumber = ‘F467-41BF-8B‘ is a search argument. The join conditions can also be considered search arguments but in this article we will concentrate on search arguments in the WHERE clause.
- USE AdventureWorks;
- SELECT h.SalesOrderID, p.Name, d.LineTotal
- FROM Sales.SalesOrderHeader h
- JOIN Sales.SalesOrderDetail d on d.SalesOrderID =h.SalesOrderID
- JOIN Production.Product p on d.ProductID = p.ProductID
- WHERE d.CarrierTrackingNumber = ‘F467-41BF-8B’
This argument is searchable because it is an equality comparison of a column to a literal value. An index can be used to filter out unwanted data from this query and thereby reduce the number of pages read.
However, in this case, even though the argument is searchable, if you execute it in the AdventureWorks database, the optimizer will choose a full scan of the clustered index on SalesOrderDetail at a cost of 1240 page reads.
This is because there is no appropriate index on the column CarrierTrackingNumber. It is a simple but often overlooked fact that a searchable argument will do you no good if there is not a relevant index on the column. Do not assume the index is there. Go and look.
We create a nonclustered index on the CarrierTrackingNumber column as below.
CREATE NONCLUSTERED INDEX [nc_trackingnum] ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber])
Now when we execute the query we find that the optimizer performs a seek on that index and the cost is reduced from 1240 to 93 page reads (on larger tables, the improvement might be much greater). This is what a search argument is supposed to do.
Non-searchable Arguments
For the purposes of this article, unsearchable arguments may be of more interest to us because in the case of search arguments, it is most important to learn what not to do.
To create an unsearchable argument, we modify our query to cast the nvarchar(25) CarrierTrackingNumber to a char(12) datatype. The argument becomes unsearchable and results in a full scan of SalesOrderDetail instead of the index seek.
- SELECT h.SalesOrderID, p.Name, d.LineTotal
- FROM Sales.SalesOrderHeader h
- JOIN Sales.SalesOrderDetail d on d.SalesOrderID =h.SalesOrderID
- JOIN Production.Product p on d.ProductID = p.ProductID
- WHERE CAST(carriertrackingnumber as CHAR(12)) = ‘F467-41BF-8B’
Even though the index on CarrierTrackingNumber still exists, it cannot be used because of the function call. If you look at the graphical execution plan you can confirm that a full scan of the clustered index on SalesOrderDetail is performed instead.
Why Can’t SQL Server Use the Index?
It may be obvious to the human mind that this function will return a string that is essentially the same as the original for comparison purposes. However, SQL Server has no intuition. Everything is zeros and ones to the query optimizer. The query engine must execute the function for every row before it knows which results are equal to ‘F467-41BF-8B’.
This example is generally applicable to any search argument that contains a function call on the right side of the comparison operator. isnull(), upper(), lower(), date functions,etc are common problems in query performance.
However, there are exceptions. For example if we compare the result of the getdate() function to a date column, the optimizer will probably convert the return value of the function to a constant expression and use that in the comparison rather than run the function for every row. In that case, an index on the date column might be used.
Date arithmetic functions are particularly problematic. Datediff() and Dateadd() used in a search argument can cause table scans in some circumstances. In other circumstances the optimizer can create a constant from the function call and an index on the date column might be used.
It is usually best to calculate the date arithmetic outside the query and save the result into a datetime variable before executing the query. The comparison of the variable to a column value will normally use an index, but depending on the details of the comparison, it may be impossible to remove the function from the search argument.
You should avoid functions in a search argument if possible, but if you have to use a function in your search arguments, you should get in the habit of looking at the execution plan to see how the optimizer is treating it.
Unsearchable Operators
An index can tell you what something is, but it can’t tell you what it is not. There are some operators that are intrinsically unsearchable, for example the operators <>, NOT EXISTS, NOT IN and NOT LIKE.
Let’s change the equality comparison operator (=) in our original query to the not equal operator (<>) and see what happens.
- SELECT h.SalesOrderID, p.Name, d.LineTotal
- FROM Sales.SalesOrderHeader h
- JOIN Sales.SalesOrderDetail d on d.SalesOrderID =h.SalesOrderID
- JOIN Production.Product p on d.ProductID = p.ProductID
- WHERE d.CarrierTrackingNumber <> ‘F467-41BF-8B’
If you check the io on this query, or the execution plan, you will find that the change of operator has caused a full clustered index scan on SalesOrderDetail again even though there is still an index on CarrierTrackingNumber.
Negative operators are useful for filtering out unwanted data but do not expect that they can be used to optimize a query.
Conclusion
There are other reasons why a seemingly searchable argument might not be used by the optimizer. It is not always a case of searchability or non-searchability. An argument may not use an index because the argument is not sufficiently selective. It may not be searched because there is another better search argument to drive the query or for many other reasons.