A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Tuesday, November 29, 2011

The worst query I ever wrote

I work in an unusual environment---not everything is documented. I know, that sounds too unlikely to be true but it is. There was data in the database that needed to be reported on but not through the supplied UI. In this case, the specifics was the data was stored in Tfs_DefaultCollection and somewhat reportable through Visual Studio Team Explorer but the users wanted queries and reports that weren't supported through the API.

Enter my big mouth saying it's a database I can find a way to make it work

After randomly picking tables and scrolling through values, I needed a smarter approach. Smartness for me meant using the system meta data to find any column that could hold my data (right data type and minimum length). I then build and execute a query looking for an example of the search key in that column. Below is an example of a search for the word change. What I like about this query, is it ought to be fairly portable/database agnostic as I'm using the INFORMATION_SCHEMA views to identify the candidate schemas, tables and columns.


SET NOCOUNT ON;

DECLARE
    -- actual query 
    @query nvarchar(max)
    -- templated query
,   @template nvarchar(max)
    -- string value we wish to find
    -- Length of 50 is arbitrary
,   @searchKey varchar(50);

SET
    @searchKey = 'change';

-- Return column & schema/table combo anywhere
-- there is data in the column that starts like the
-- search key.
SELECT
    @template = N'
    SELECT TOP 1 
        ''<COLUMN_NAME/>'' AS cname
    , ''<SCHEMA/>.<TABLE_NAME/>'' AS tname 
    FROM 
        <SCHEMA/>.<TABLE_NAME/> T 
    WHERE 
        T.<COLUMN_NAME/> LIKE ''<TARGET/>%''';

SELECT
    @template = REPLACE(@template, '<TARGET/>', @searchKey);

DECLARE 
    CSR CURSOR
FOR
-- Iterate through all the columns that are
-- character data types and are at least
-- as long as the search key
SELECT
    ISC.TABLE_SCHEMA
,   ISC.TABLE_NAME
,   ISC.COLUMN_NAME
FROM
    INFORMATION_SCHEMA.COLUMNS ISC
WHERE
    -- filter out tables/views I know I can skip
    ISC.TABLE_NAME not like 'x%'
    AND
    (
        ISC.DATA_TYPE IN ('char', 'nchar', 'varchar', 'nvarchar')
        AND ISC.CHARACTER_MAXIMUM_LENGTH >= LEN(@searchKey)
    );

-- Cursor variables for capturing candidate schemas, tables and columns
DECLARE
    @table_schema sysname
,   @table_name sysname
,   @column_name sysname;

DECLARE
    @RESULTS TABLE
(
    column_name sysname
,   table_schema nvarchar(500) NOT NULL
);

OPEN
    CSR;

FETCH NEXT
FROM
    CSR
INTO
    @table_schema
,   @table_name
,   @column_name;

WHILE (@@FETCH_STATUS = 0)
BEGIN
    -- stub in actual names, make 'em safe via quotename function
    SET @query = REPLACE(@template, '<SCHEMA/>', quotename(@table_schema));
    SET @query = REPLACE(@query, '<TABLE_NAME/>', quotename(@table_name));
    SET @query = REPLACE(@query, '<COLUMN_NAME/>', quotename(@column_name));

    BEGIN TRY
        --PRINT @query
        
        -- Dump results into a table variable
        INSERT INTO
            @RESULTS
        EXECUTE(@query);
        
    END TRY
    BEGIN CATCH
        -- print failing query
        PRINT @query;
        
    END CATCH
    FETCH NEXT
    FROM
        CSR
    INTO
        @table_schema
    ,   @table_name
    ,   @column_name;
END
CLOSE CSR;
DEALLOCATE CSR;

-- Show all the columns and fully qualified tables
-- that contained the value
SELECT 
    R.column_name
,   R.table_schema
,   'SELECT T.* FROM ' + R.table_schema + ' T WHERE T.' + R.column_name + ' LIKE  ''' + @searchKey + '%''' AS explore_query
FROM 
    @results R;

I can run the above against Tfs_DefaultCollection.

column_name table_schema explore_query
[ChildItem] [dbo].[tbl_Version] SELECT T.* FROM [dbo].[tbl_Version] T WHERE T.[ChildItem] LIKE  'change%'
[ProjectName] [dbo].[tbl_Project] SELECT T.* FROM [dbo].[tbl_Project] T WHERE T.[ProjectName] LIKE  'change%'
[ChildItem] [dbo].[tbl_VersionedItem] SELECT T.* FROM [dbo].[tbl_VersionedItem] T WHERE T.[ChildItem] LIKE  'change%'
And easily switch over to Tfs_Warehouse to try and track those values over there
column_name table_schema explore_query
[ChangesetTitle] [dbo].[DimChangeset] SELECT T.* FROM [dbo].[DimChangeset] T WHERE T.[ChangesetTitle] LIKE  'change%'
[FileName] [dbo].[DimFile] SELECT T.* FROM [dbo].[DimFile] T WHERE T.[FileName] LIKE  'change%'
[ProjectNodeName] [dbo].[DimTeamProject] SELECT T.* FROM [dbo].[DimTeamProject] T WHERE T.[ProjectNodeName] LIKE  'change%'

Warnings, caveats, etc

While I'd think it's blindingly obvious, never run code from the interblag. Beyond that, this code is not going to be fast by any stretch of the imagination. It could lock a table for the read. It might bog down your disk infrastructure, especially if you're querying against a very large table. And it's going to do it over and over again so buyer beware.

No comments: