A few days ago I went to the SQL PASS group meeting to hear Vern Rabe talk about the perils of NOLOCK. Vern is a veteran SQL guru. After 20+ years he still spends a lot of his time in airplanes going around the world to speak to SQL groups on various subjects.
If you get a chance to see Vern, I would do it. (He will be presenting at the PASS Summit this year, as in most years.)
While I have always argued with my clients about their excessive use of NOLOCK and the consequent dirty reads, Vern refreshed my memory on all the other awful things that can happen when you use NOLOCK hints carelessly. A url to his presentation and example code is at the bottom of this article. (no cheating. You should read the entire article first. That’s why I put the url at the bottom).
Everybody’s Doing It
Whether they admit it or not, most SQL Server developers use NOLOCK hints at times to control blocking. Most of them understand that the use of NOLOCK can introduce inaccurate data into the result sets by allowing Dirty Reads i.e. returning data that is not yet committed and may never be committed to the database. In that case the result set would contain data that does not exist in the database.
There are situations where a dirty read is allowable. Most conscientious developers will only use NOLOCK in situations where they are certain that dirty reads do not matter. These cautious developers generally don’t like using NOLOCK hints, but they do not know of a way to combat blocking without using them. They do what they have to do to get acceptable performance from the database. (We will discuss alternatives later in this article).
…and then there are the developers who routinely put NOLOCK hints in every query they write. Don’t be one of them.
In my experience I have found relatively few developers who know of, or consider other conditions that can cause inaccurate results when NOLOCK is used.
You probably know about Dirty Reads, but do you know about these other sources of inaccurate results that need to be considered before you use NOLOCK?
Non Repeatable Reads and NOLOCK
If your procedure reads the same data multiple times and depends on getting the same result set each time, you are vulnerable to writing processes running in other transactions that are able to change that same data while your procedure is executing. This can cause your code to go south when it receives different data from the same query. You may also find that you have rows missing from the final result set.
Duplicates and NOLOCK
If a your procedure is scanning a table using NOLOCK and another process begins to insert rows into the same table, duplicate records can be produced under a number of conditions that cause data to be moved from one page to another. (page splits, for example). This can cause rows to be read more than once and to put duplicates in the result set.
If you often see 2601 errors (duplicate key violations) in your traces, this could be the cause.
Phantom Reads and NOLOCK
Phantom reads are similar to non repeatable reads. They can occur when your procedure first retrieves the data it is going to process (and, for example, sticks it in a temp table) while another transaction changes that same data in the database. Your code will attempt to process the data as it was when it was first queried and that usually leads to trouble with innacurate results.
Blob Data and NOLOCK
If your database performs operations using NOLOCK when retrieving Binary Large Object data you may encounter SQL error 7886. Unlike many SQL error messages, this one is pretty clear:
A read operation on a large object failed while sending data to the client. A common cause for this is if the application is running in READ UNCOMMITTED isolation level. This connection will be terminated.
An Alternative to NOLOCK
I have campaigned against NOLOCK with almost every client I have worked with, but I have to admit I have not been successful in changing many minds. Their argument is a good one: NOLOCK is the only way they have found to make the application perform adequately.
But there are viable alternatives to NOLOCK. My first recommendation is always to optimize the code that is causing the blocking. But often that is not practical and sometimes not possible. At any rate, they have probably been trying to do that for years.
My second and favorite alternative is to enable Read Committed Snapshot Isolation (RCSI). A full description of RCSI is not in the scope of this article but suffice it to say that RCSI changes the isolation mode of the database in a way that, like NOLOCK, eliminates shared locks without bringing with it the bad things that come with NOLOCK.
Unfortunately, not many of my clients are ready to change something as fundamental as the isolation mode of their database. It sounds like a monumental and risky task to them, but it is actually quite simple to do and undo. I think of RCSI as something like a miracle for databases with big blocking problems.
The downside to RCSI is that it will increase the activity in your tempdb. Also, when you enable RCSI you should remove the NOLOCK hints that are already in your code. They won’t throw errors but if they conflict, NOLOCK trumps RCSI so you are still vulnerable to dirty reads and all the bad things that you are trying to avoid.
Deprecation of NOLOCK
Using NOLOCK\READ_UNCOMMITED has been partially deprecated in SQL Server, meaning that it will be removed from SQL Server in some future version of SQL. By partial I mean that only certain functions are deprecated at this time. However it is a good bet that the entirety of READ_UNCOMMITED will be removed eventually. Here is the text of the current deprecation notice.
“Support for use of the READUNCOMMITTED and NOLOCK hints in the FROM clause that apply to the target table of an UPDATE or DELETE statement will be removed in a future version of SQL Server. Avoid using these hints in this context in new development work, and plan to modify applications that currently use them
This first deprecation should not hit you too hard because it is a no-op. NOLOCK does nothing when it is used in UPDATE or DELETE queries. (any time data is changed an exclusive lock has to be taken. period.)