Dessert first a.k.a. how useful would this be?
Imagine running a query and finding out all the bad queries that have been submitted. Like this
The client restored a database but neglected to fix permissions for those first three. Look at that fourth one though! I only enabled this trace on the 3rd. Any takers on how many months that error has been swallowed by their application framework? Exactly. This is definitely something I'm adding into my utility belt. Read on for how to set up Extended Events to do just this.
We now resume your regularly scheduled blog post, already in progress.
Finally, an extended event that was useful.
I'm sure that's heresy among the pure DBAs but I'm not a DBA. I'm just a developer, among many other things. At my last client, one of the developers was hell bent on building procedures which called procedures that built code to call more procedures - dynamic code to build statements to an untenable nesting level. Which was cool in theory but unfortunately they had no way to debug it when it built bad queries. Which it did. Over and over again. Thousands per second. It could have been marketed as Errors as a Service. The real problem though was that their bad code took several hours to for them to recreate the error scenario. What we needed was a way to log the statements across all those levels of nested stored procedures and the values they sent.Enter Kendra Little and this "magic" script. It's magic because it was the missing link for me finally "getting" XEvents. I remember asking at SQL Saturday #91 (Omaha) back in 2011 "where would I, as a developer, use extended events?" Four years later, I had a scenario where I needed it.
I had meant to blog about this in April when I first used it but then shiny squirrels and this was lost in the TODO list. Fast forward to today. At my current client, we've built a new website and the database to manage part of their business. Most of the website runs off stored procedures but part of it is SQL embedded in the web pages. As much as the developer would like to just make the account the database owner, I'd rather not give up on security. I created a custom database role and added the service account into and granted it execute permissions to the schema. All is well and good except for those pesky direct queries we couldn't turn into procedures. When those queries hit, the "database is down." At least, that's the depth of the email I get. You know what would be handy here, that same bad code tracker!
Implementation
My implementation is to create an extended event that logs to an asynchronous file (first, do no harm). Where my approach differs from Kendra's is parsing all those files is dog slow - at least for the Errors As A Service scenario. So instead, I create a heap to hold the parsed bits I care about as well as the raw XML from the extended event. That way if I determine I wanted some other attribute extracted, I'd still have the original data available. Oh, and finally, I clean up after myself which is important when you're a consultant. You might roll off an engagement and someone forgets to watch the drive and suddenly, you've filled a drive with log data.
You'll notice I create the table in tempdb. Basically, I don't care if I lose that table, it's just for my monitoring purposes and in fact, putting it in temp assures me that at some point, it'll get wiped and recreated. If I were blessed to be on a 2014+ platform, I'd look hard at whether a memory optimized table might serve the same need as long as it's declared with a durability of SCHEMA_ONLY. I don't think LOB types are supported there yet, but I suspect it's a matter of time before they are.
This idempotent script will create our table, dbo.ExtendedEventErrorCapture, if it does not exist. It will create an extended event session, what_queries_are_failing, where we capture any error above severity 10 and stream it out to the file on F:\XEventSessions. If you don't have an F drive with that folder, you need to either update the script or talk to your storage vendor as they will happily supply one. Finally, we turn on the trace/session.
USE tempdb; -------------------------------------------------------------------------------- -- Create the table to store out the XE data -------------------------------------------------------------------------------- IF NOT EXISTS ( SELECT * FROM sys.schemas AS S INNER JOIN sys.tables AS T ON T.schema_id = S.schema_id WHERE T.name = 'ExtendedEventErrorCapture' AND S.name = 'dbo' ) BEGIN CREATE TABLE dbo.ExtendedEventErrorCapture ( err_timestamp datetime2(7) NULL , err_severity bigint NULL , err_number bigint NULL , err_message nvarchar(512) NULL , database_id int NULL , sql_text nvarchar(MAX) NULL , event_data xml NULL , session_name sysname , username nvarchar(512) ) -- This is only vaild for a Develper/Enterprise edition license WITH (DATA_COMPRESSION = PAGE); END -------------------------------------------------------------------------------- -- Create the extended event to keep track of bad sql queries -------------------------------------------------------------------------------- IF NOT EXISTS ( SELECT * FROM sys.server_event_sessions AS SES WHERE SES.name = N'what_queries_are_failing' ) BEGIN --Create an extended event session CREATE EVENT SESSION what_queries_are_failing ON SERVER ADD EVENT sqlserver.error_reported ( ACTION (sqlserver.sql_text, sqlserver.tsql_stack, sqlserver.database_id, sqlserver.username) WHERE ([severity]> 10) ) ADD TARGET package0.asynchronous_file_target (set filename = 'F:\XEventSessions\what_queries_are_failing.xel' , metadatafile = 'F:\XEventSessions\what_queries_are_failing.xem', max_file_size = 512, increment = 16, max_rollover_files = 5) WITH (MAX_DISPATCH_LATENCY = 5SECONDS); END -------------------------------------------------------------------------------- -- Turn on the extended event -------------------------------------------------------------------------------- IF NOT EXISTS ( -- When a XE is active, then there is an entry -- in sys.dm_xe_sessions SELECT * FROM sys.dm_xe_sessions AS DXS INNER JOIN sys.server_event_sessions AS SES ON SES.name = DXS.name WHERE SES.name = N'what_queries_are_failing' ) BEGIN -- Start the session ALTER EVENT SESSION what_queries_are_failing ON SERVER STATE = START; END -------------------------------------------------------------------------------- -- Wait for errors -- PROFIT! --------------------------------------------------------------------------------
That's not so bad. Sure, it looks like a lot of code but you've got 1-2 lines of code to change (page compression and where to log to).
Where's the beef?
Once you think you've captured enough data, I turn off the session and parse the files into tables. As I am not a demon robot, I am grateful to Kendra for writing the XQuery for me to parse the meaningful data out of the trace files. I am a developer, but not that kind of developer.
USE tempdb; -------------------------------------------------------------------------------- -- Turn off our extended event -------------------------------------------------------------------------------- IF EXISTS ( -- When a XE is active, then there is an entry -- in sys.dm_xe_sessions SELECT * FROM sys.dm_xe_sessions AS DXS INNER JOIN sys.server_event_sessions AS SES ON SES.name = DXS.name WHERE SES.name = N'what_queries_are_failing' ) BEGIN -- Start the session ALTER EVENT SESSION what_queries_are_failing ON SERVER STATE = STOP; END -------------------------------------------------------------------------------- -- Extract data from our XE -------------------------------------------------------------------------------- ; WITH events_cte AS ( SELECT DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), CURRENT_TIMESTAMP), xevents.event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [err_timestamp], xevents.event_data.value('(event/data[@name="severity"]/value)[1]', 'bigint') AS [err_severity], xevents.event_data.value('(event/data[@name="error_number"]/value)[1]', 'bigint') AS [err_number], xevents.event_data.value('(event/data[@name="message"]/value)[1]', 'nvarchar(512)') AS [err_message], xevents.event_data.value('(event/action[@name="database_id"]/value)[1]', 'int') AS [database_id], xevents.event_data.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS [sql_text], xevents.event_data, xevents.event_data.value('(event/action[@name="username"]/value)[1]', 'nvarchar(512)') AS [username], 'what_queries_are_failing' AS session_name FROM sys.fn_xe_file_target_read_file ( 'F:\XEventSessions\what_queries_are_failing*.xel' , 'F:\XEventSessions\what_queries_are_failing*.xem' , NULL , NULL) AS fxe CROSS APPLY (SELECT CAST(event_data as XML) AS event_data) AS xevents ) INSERT INTO dbo.ExtendedEventErrorCapture ( err_timestamp , err_severity , err_number , err_message , database_id , sql_text , event_data , session_name , username ) SELECT E.err_timestamp , E.err_severity , E.err_number , E.err_message , E.database_id , E.sql_text , E.event_data , E.session_name , E.username FROM events_cte AS E;
Mischief managed
Clean up, clean up, everybody clean up! As our final step, since we're good citizens of this database and we remove our extended event and once that's done, we'll leverage xp_cmdshell to delete the tracefiles.
USE tempdb; -------------------------------------------------------------------------------- -- Get rid our extended event -------------------------------------------------------------------------------- IF EXISTS ( SELECT * FROM sys.server_event_sessions AS SES WHERE SES.name = N'what_queries_are_failing' ) BEGIN -- Clean up your session from the server DROP EVENT SESSION what_queries_are_failing ON SERVER; END -------------------------------------------------------------------------------- -- Get rid our extended event files only if the XE is turned off -- or no longer exists -------------------------------------------------------------------------------- IF NOT EXISTS ( SELECT 1 FROM sys.dm_xe_sessions AS DXS INNER JOIN sys.server_event_sessions AS SES ON SES.name = DXS.name WHERE SES.name = N'what_queries_are_failing' UNION ALL SELECT 1 FROM sys.server_event_sessions AS SES WHERE SES.name = N'what_queries_are_failing' ) BEGIN -- Assumes you've turned on xp_cmdshell EXECUTE sys.xp_cmdshell'del F:\XEventSessions\what_queries_are_failing*.xe*'; END
Now what?
Query our table and find out what's going on. I use a query like the following to help me identify what errors we're getting, whether they're transient, etc.SELECT EEEC.err_message , MIN(EEEC.err_timestamp) AS FirstSeen , MAX(EEEC.err_timestamp) AS LastSeen , COUNT_BIG(EEEC.err_timestamp) AS ErrorCount FROM tempdb.dbo.ExtendedEventErrorCapture AS EEEC GROUP BY EEEC.err_message ORDER BY COUNT_BIG(1) DESC;
You've already seen the results but I've very happy to have found a use case for XE in my domain. In the next post, I'll wrap this implementation up with a nice little bow. Or a SQL Agent job. Definitely one of the two.
1 comment:
Love this type of XE. We actually leave an "error trap" XE very similar to this running on all of our instances, all the time. With no noticeable overhead, it has paid huge dividends in troubleshooting current and historical issues.
Post a Comment