There are DBA’s, SQL bloggers and writers who seem to know everything about almost every aspect of SQL Server. It is a mystery to me how they can keep up with all the new features Microsoft keeps throwing at us. So far as I can tell, the answer is that they work their butts off.
Slacker DBA’s like me who spend at least some part of our lives outside of geekdom can only wonder how these dedicated people find the time. It may be a question of a work ethic which I do not share. SQL server is my living, but it is not my life.
However, I have no shame about taking advantage of the fruits of their incessant labor. After all, they must be doing it for my benefit. Why else would they put it on the internet? Being a writer myself, I can tell you they don’t do it for the money. You can make more money frying burgers at McDonalds than you can writing SQL Server articles.
As a slacker DBA always looking for the easy way to do things, I am entranced by Extended Events (XE) and especially by the system_health XE session which promises better performance data and less work than legacy methods. The only problem with XE is that it is a bit difficult to get your head around it. Thankfully we have a hard working guy like Jonathan Kehayias, the undisputed king of XE, to turn to. In an earlier newsletter I put a link to his 31 blog opus on XE for the serious dba.
But afterwards I thought “Thirty-One blog entries? (and they are long entries) How many of my readers have the time?”. So this month I am just going to give you some queries that will return useful information from system_health without you doing a thing.
Since the system_health session is already running on your sql server, you don’t need to know how to create XE sessions (although you may become motivated to learn more once you see how valuable XE data can be). After reading this article you won’t know much more about XE than you know now, but you will be able to use the performance information constantly being accumulated by your system_health session.
About System_Health
What do you need to know about what’s happening on your SQL server? Deadlocks, severe errors, blocking, non-yielding processes?. In the past you might have set resource-stealing trace flags, run SQL Profiler for a few hours, or spent a couple thousand dollars for a performance monitor that is hammering your server’s resources 24/7. Now you have system_health.
The beauty of system_health is not only that is already created and running on your system, but also that it is not putting much load on your system. The engineers at Microsoft designed this session to return the most useful data that could be gathered with a minimum amount of resource usage.
Ordinarily you would have to know something about XPATH queries to get at this data, but with the queries already written, all you need to know is how to click the execute button.
Limitations
System_health data is held in memory in a ring buffer 4 megabytes in size. The buffer uses a first-in\first-out algorithm so it always contains the most recent 4 mb of performance data. Since the data is efficiently compressed, the buffer contains a surprisingly large amount of data. On a lightly used server the range of data might span months. On a very busy server it can be a much smaller range of data. It is difficult to know the actual date range of the data. However, the individual events have a timestamp that tells you when they occurred.
Using a memory buffer also means that if your system crashes, all the data in system_health that might have helped you diagnose the problem is gone. There are scripts out there that duplicate the system_health session but write the data to a file that will survive a SQL Server crash. However this is beyond the scope of this article
OK, here we go with useful XE queries
Severe Errors
System_health captures all errors of severity 20 and above, plus a half dozen or so other errors you would want to know about. Surprisingly enough, I had to do a little work myself because none of the scripts I found on the internet returned exactly what I wanted. So, standing on the shoulders of giants, I added my little bit to the closest query I could find.
The problem with the scripts I found was that they did not return the SQL code that was throwing the error. It seemed of little use to me to know that a specific error occurred 87 times without knowing what code was causing it. Here is my modification that gives you the sql code as well as the other error data:
- DECLARE @target_data XML
- SELECT @target_data = CAST(target_data AS XML)
- FROM sys.dm_xe_sessions AS s
- JOIN sys.dm_xe_session_targets AS t
- ON t.event_session_address = s.address
- WHERE s.name = ‘system_health’
- SELECT n.value(‘@name’, ‘varchar(50)’) AS EventName
- ,n.value(‘@timestamp’, ‘datetime’) AS DateAndTime
- ,n.value(‘(data[@name=”error”]/value)[1]’, ‘int’) AS ErrNo
- ,n.value(‘(data[@name=”severity”]/value)[1]’, ‘int’) AS Severity
- ,n.value(‘(data[@name=”message”]/value)[1]’, ‘varchar(max)’) AS ErrMsg
- ,n.value(‘(action[@name=”sql_text”]/value)[1]’, ‘varchar(max)’) AS sql_text
- FROM @target_data.nodes(‘//RingBufferTarget/event’) AS q(n)
- WHERE n.value(‘@name’, ‘varchar(50)’) = ‘error_reported’
Identify The Cause of Serious Wait States
It is relatively easy to identify wait states that accumulate a huge amount of wait time. I found some XE scripts that do it. However, I would like to see what SQL code is causing these waits and do not want to see a lot of waits that have null durations. Again I had to customize a script. Here is the query:
- declare @sys_health xml;
- set @sys_health = ( select top 1 cast(t.target_data as xml)
- from sys.dm_xe_session_targets t
- join sys.dm_xe_sessions s
- on t.event_session_address = s.address
- where s.name = ‘system_health’ )
- select
- n.value(‘(@name)’, ‘nvarchar(50)’) as [event name],
- n.value(‘(data[@name=”wait_type”]/text)[1]’, ‘nvarchar(50)’) as wait_type,
- n.value(‘(data[@name=”duration”]/value)[1]’, ‘bigint’) as duration,
- n.value(‘(action[@name=”sql_text”]/value)[1]’, ‘nvarchar(4000)’) as stmt_text
- from @sys_health.nodes(‘/RingBufferTarget/event’) as q(n)
- where n.value(‘(data[@name=”duration”]/value)[1]’, ‘bigint’) is not null
Deadlocks
If deadlocks are occurring on your SQL Server, you need to know about them and especially the code that is deadlocking. I chose a query that comes straight from the hands of the master, Jonathon K. I chose this one because it is the only one I found that works on all platforms. Mr. K worked around a bug in the RTM version of SQL 2008 to produce this relatively long script.
The format of this newsletter makes it difficult to select and paste a long, complex query like this so you should get the code HERE.
Summary and Count of Events
While the previous queries were fairly specific, sometimes you just want a general look at what is happening on your server. This query will show you all the different events that have occurred on the server recently with a count of how many times they occurred.
- SET NOCOUNT ON
- IF (SUBSTRING(CAST(SERVERPROPERTY (‘ProductVersion’) AS varchar(50)),1,CHARINDEX(‘.’,CAST(SERVERPROPERTY (‘ProductVersion’) AS varchar(50)))-1) = 10)
- BEGIN
- DECLARE @UTDDateDiff int
- SET @UTDDateDiff = DATEDIFF(mi,GETUTCDATE(),GETDATE())
- SELECT CAST(xet.target_data as xml) as XMLDATA
- INTO #SystemHealthSessionData
- FROM sys.dm_xe_session_targets xet
- JOIN sys.dm_xe_sessions xe
- ON (xe.address = xet.event_session_address)
- WHERE xe.name = ‘system_health’
- ;WITH CTE_HealthSession AS
- (
- SELECT C.query(‘.’).value(‘(/event/@name)[1]’, ‘varchar(255)’) as EventName,
- C.query(‘.’).value(‘(/event/@timestamp)[1]’, ‘datetime’) as EventTime
- FROM #SystemHealthSessionData a
- CROSS APPLY a.XMLDATA.nodes(‘/RingBufferTarget/event’) as T(C))
- SELECT EventName,
- COUNT(*) as Occurrences,
- DATEADD(mi,@UTDDateDiff,MAX(EventTime)) as LastReportedEventTime,
- DATEADD(mi,@UTDDateDiff,MIN(EventTime)) as OldestRecordedEventTime
- FROM CTE_HealthSession
- GROUP BY EventName
- ORDER BY 2 DESC
- DROP TABLE #SystemHealthSessionData
- END
Take the Ball and Run With It
The queries above just show you the tip of the iceberg. There is a lot more useful data in system_health that can be queried. You might first consider searching the internet for queries that will return the data you need. But sooner or later you will have to create your own XE sessions and write your own queries to retrieve the data you need. However, after using these queries and modifying them here and there, the thought of creating and querying your own XE sessions will not seem so daunting.