Dessert first a.k.a. how useful would this be?
Imagine running a query and finding out all the bad queries that have been submitted. Like this
The client restored a database but neglected to fix permissions for those first three. Look at that fourth one though! I only enabled this trace on the 3rd. Any takers on how many months that error has been swallowed by their application framework? Exactly. This is definitely something I'm adding into my utility belt. Read on for how to set up Extended Events to do just this.
We now resume your regularly scheduled blog post, already in progress.
Finally, an extended event that was useful.
I'm sure that's heresy among the pure DBAs but I'm not a DBA. I'm just a developer, among many other things. At my last client, one of the developers was hell bent on building procedures which called procedures that built code to call more procedures - dynamic code to build statements to an untenable nesting level. Which was cool in theory but unfortunately they had no way to debug it when it built bad queries. Which it did. Over and over again. Thousands per second. It could have been marketed as Errors as a Service. The real problem though was that their bad code took several hours to for them to recreate the error scenario. What we needed was a way to log the statements across all those levels of nested stored procedures and the values they sent.
Enter Kendra Little and this "magic" script. It's magic because it was the missing link for me finally "getting" XEvents. I remember asking at SQL Saturday #91 (Omaha) back in 2011 "where would I, as a developer, use extended events?" Four years later, I had a scenario where I needed it.
I had meant to blog about this in April when I first used it but then shiny squirrels and this was lost in the TODO list. Fast forward to today. At my current client, we've built a new website and the database to manage part of their business. Most of the website runs off stored procedures but part of it is SQL embedded in the web pages. As much as the developer would like to just make the account the database owner, I'd rather not give up on security. I created a custom database role and added the service account into and granted it execute permissions to the schema. All is well and good except for those pesky direct queries we couldn't turn into procedures. When those queries hit, the "database is down." At least, that's the depth of the email I get. You know what would be handy here, that same bad code tracker!
Implementation
My implementation is to create an extended event that logs to an asynchronous file (first, do no harm). Where my approach differs from Kendra's is parsing all those files is dog slow - at least for the Errors As A Service scenario. So instead, I create a heap to hold the parsed bits I care about as well as the raw XML from the extended event. That way if I determine I wanted some other attribute extracted, I'd still have the original data available. Oh, and finally, I clean up after myself which is important when you're a consultant. You might roll off an engagement and someone forgets to watch the drive and suddenly, you've filled a drive with log data.
You'll notice I create the table in tempdb. Basically, I don't care if I lose that table, it's just for my monitoring purposes and in fact, putting it in temp assures me that at some point, it'll get wiped and recreated. If I were blessed to be on a 2014+ platform, I'd look hard at whether a memory optimized table might serve the same need as long as it's declared with a durability of SCHEMA_ONLY. I don't think LOB types are supported there yet, but I suspect it's a matter of time before they are.
This idempotent script will create our table, dbo.ExtendedEventErrorCapture, if it does not exist. It will create an extended event session, what_queries_are_failing, where we capture any error above severity 10 and stream it out to the file on F:\XEventSessions. If you don't have an F drive with that folder, you need to either update the script or talk to your storage vendor as they will happily supply one. Finally, we turn on the trace/session.
USE tempdb;
--------------------------------------------------------------------------------
-- Create the table to store out the XE data
--------------------------------------------------------------------------------
IF NOT EXISTS
(
SELECT
*
FROM
sys.schemas AS S
INNER JOIN
sys.tables AS T
ON T.schema_id = S.schema_id
WHERE
T.name = 'ExtendedEventErrorCapture'
AND S.name = 'dbo'
)
BEGIN
CREATE TABLE
dbo.ExtendedEventErrorCapture
(
err_timestamp datetime2(7) NULL
, err_severity bigint NULL
, err_number bigint NULL
, err_message nvarchar(512) NULL
, sql_text nvarchar(MAX) NULL
, event_data xml NULL
, session_name sysname
, username nvarchar(512)
)
-- This is only vaild for a Develper/Enterprise edition license
WITH (DATA_COMPRESSION = PAGE);
END
--------------------------------------------------------------------------------
-- Create the extended event to keep track of bad sql queries
--------------------------------------------------------------------------------
IF NOT EXISTS
(
SELECT
*
FROM
sys.server_event_sessions AS SES
WHERE
SES.name = N'what_queries_are_failing'
)
BEGIN
--Create an extended event session
CREATE EVENT SESSION
what_queries_are_failing
ON SERVER
ADD EVENT sqlserver.error_reported
(
ACTION (sqlserver.sql_text, sqlserver.tsql_stack, sqlserver.database_id, sqlserver.username)
WHERE ([severity]> 10)
)
ADD TARGET package0.asynchronous_file_target
(set filename = 'F:\XEventSessions\what_queries_are_failing.xel' ,
metadatafile = 'F:\XEventSessions\what_queries_are_failing.xem',
max_file_size = 512,
increment = 16,
max_rollover_files = 5)
WITH (MAX_DISPATCH_LATENCY = 5SECONDS);
END
--------------------------------------------------------------------------------
-- Turn on the extended event
--------------------------------------------------------------------------------
IF NOT EXISTS
(
-- When a XE is active, then there is an entry
-- in sys.dm_xe_sessions
SELECT
*
FROM
sys.dm_xe_sessions AS DXS
INNER JOIN
sys.server_event_sessions AS SES
ON SES.name = DXS.name
WHERE
SES.name = N'what_queries_are_failing'
)
BEGIN
-- Start the session
ALTER EVENT SESSION what_queries_are_failing
ON SERVER STATE = START;
END
--------------------------------------------------------------------------------
-- Wait for errors
-- PROFIT!
--------------------------------------------------------------------------------
That's not so bad. Sure, it looks like a lot of code but you've got 1-2 lines of code to change (page compression and where to log to).
Where's the beef?
Once you think you've captured enough data, I turn off the session and parse the files into tables. As I am not a demon robot, I am grateful to Kendra for writing the XQuery for me to parse the meaningful data out of the trace files. I am a developer, but not
that kind of developer.
USE tempdb;
--------------------------------------------------------------------------------
-- Turn off our extended event
--------------------------------------------------------------------------------
IF EXISTS
(
-- When a XE is active, then there is an entry
-- in sys.dm_xe_sessions
SELECT
*
FROM
sys.dm_xe_sessions AS DXS
INNER JOIN
sys.server_event_sessions AS SES
ON SES.name = DXS.name
WHERE
SES.name = N'what_queries_are_failing'
)
BEGIN
-- Start the session
ALTER EVENT SESSION what_queries_are_failing
ON SERVER STATE = STOP;
END
--------------------------------------------------------------------------------
-- Extract data from our XE
--------------------------------------------------------------------------------
;
WITH events_cte AS
(
SELECT
DATEADD(mi,
DATEDIFF(mi, GETUTCDATE(), CURRENT_TIMESTAMP),
xevents.event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [err_timestamp],
xevents.event_data.value('(event/data[@name="severity"]/value)[1]', 'bigint') AS [err_severity],
xevents.event_data.value('(event/data[@name="error_number"]/value)[1]', 'bigint') AS [err_number],
xevents.event_data.value('(event/data[@name="message"]/value)[1]', 'nvarchar(512)') AS [err_message],
xevents.event_data.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS [sql_text],
xevents.event_data,
xevents.event_data.value('(event/action[@name="username"]/value)[1]', 'nvarchar(512)') AS [username],
'what_queries_are_failing' AS session_name
FROM sys.fn_xe_file_target_read_file
(
'F:\XEventSessions\what_queries_are_failing*.xel'
, 'F:\XEventSessions\what_queries_are_failing*.xem'
, NULL
, NULL) AS fxe
CROSS APPLY (SELECT CAST(event_data as XML) AS event_data) AS xevents
)
INSERT INTO
dbo.ExtendedEventErrorCapture
(
err_timestamp
, err_severity
, err_number
, err_message
, sql_text
, event_data
, session_name
, username
)
SELECT
E.err_timestamp
, E.err_severity
, E.err_number
, E.err_message
, E.sql_text
, E.event_data
, E.session_name
, E.username
FROM
events_cte AS E;
Mischief managed
Clean up, clean up, everybody clean up! As our final step, since we're good citizens of this database and we remove our extended event and once that's done, we'll leverage xp_cmdshell to delete the tracefiles.
USE tempdb;
--------------------------------------------------------------------------------
-- Get rid our extended event
--------------------------------------------------------------------------------
IF EXISTS
(
SELECT
*
FROM
sys.server_event_sessions AS SES
WHERE
SES.name = N'what_queries_are_failing'
)
BEGIN
-- Clean up your session from the server
DROP EVENT SESSION what_queries_are_failing ON SERVER;
END
--------------------------------------------------------------------------------
-- Get rid our extended event files only if the XE is turned off
-- or no longer exists
--------------------------------------------------------------------------------
IF NOT EXISTS
(
SELECT
1
FROM
sys.dm_xe_sessions AS DXS
INNER JOIN
sys.server_event_sessions AS SES
ON SES.name = DXS.name
WHERE
SES.name = N'what_queries_are_failing'
UNION ALL
SELECT
1
FROM
sys.server_event_sessions AS SES
WHERE
SES.name = N'what_queries_are_failing'
)
BEGIN
-- Assumes you've turned on xp_cmdshell
EXECUTE sys.xp_cmdshell'del F:\XEventSessions\what_queries_are_failing*.xe*';
END
Now what?
Query our table and find out what's going on. I use a query like the following to help me identify what errors we're getting, whether they're transient, etc.
SELECT
EEEC.err_message
, MIN(EEEC.err_timestamp) AS FirstSeen
, MAX(EEEC.err_timestamp) AS LastSeen
, COUNT_BIG(EEEC.err_timestamp) AS ErrorCount
FROM
tempdb.dbo.ExtendedEventErrorCapture AS EEEC
GROUP BY
EEEC.err_message
ORDER BY
COUNT_BIG(1) DESC;
You've already seen the results but I've very happy to have found a use case for XE in my domain. In the next post, I'll wrap this implementation up with a nice little bow. Or a SQL Agent job. Definitely one of the two.