World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.


Wednesday, February 27, 2019

Biml in Azure aka CallBimlScriptContent

CallBimlScriptContent was introduced with the migration from Mist to BimlStudio. Why is this cool? You do not have to use files sitting on your computer as the source for your Biml. As long as you can reconstitute the Biml contents into a string, you can store your scripts wherever you'd like. If you want them in a database, that's great. Store them in the cloud? Knock yourself out.
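To make the database option concrete, here's a minimal sketch of what that storage could look like. The table and column names are my own invention, not anything Biml requires - CallBimlScriptContent only cares that you can hand it the text.

-- Hypothetical storage for Biml scripts; Biml is just text,
-- so any nvarchar(max) column will do
CREATE TABLE dbo.BimlScript
(
    ScriptName sysname NOT NULL PRIMARY KEY
,   ScriptContent nvarchar(max) NOT NULL
);

-- Whatever reads this value back out supplies the string
-- you pass to CallBimlScriptContent
SELECT
    BS.ScriptContent
FROM
    dbo.BimlScript AS BS
WHERE
    BS.ScriptName = N'inc_Package_00';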

As a consultant, that last option is rather compelling. Maybe I'm only licensing my clients to use accelerators during our engagement. If I leave files on the file system after I roll off, or they image my computer and accidentally collect them, I am David fighting Goliath. CallBimlScriptContent is a means to protect myself and my IP. Let's look at a trivial example. I set a C# string with an empty Package tag (hooray for doubling up my double quotes). Within my Packages collection, I invoke CallBimlScriptContent passing in my Biml content.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
// Do something here to populate myBimlFile
string myBimlFile = @"<Package Name=""ABC"" />";
#>    
    <Packages>
        <#=CallBimlScriptContent(myBimlFile)#>
    </Packages>
</Biml>
The rendered Biml for the above would look like
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package Name="ABC" />
  </Packages>
</Biml>

It's super that it works, but that's not convenient. Like, at all! Plus, good luck trying to embed any complexity in that string.

So, let's try something a little more complex. Conceptually, imagine we have two Biml scripts we might choose to call: inc_Package_00.biml and inc_Package_10.biml.

inc_Package_00.biml

<Package Name="ABC" />

inc_Package_10.biml

<#@ property name="packageName" type="string" #>
<Package Name="<#=packageName#>" />
Our original code could then look like
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
// Do something here to populate myBimlFile
string myBimlFile = System.IO.File.ReadAllText(@"C:\tmp\inc_Package_00.biml");
#>    
    <Packages>
        <#=CallBimlScriptContent(myBimlFile)#>
    </Packages>
</Biml>
Do you need to pass parameters? It's no different than what you're used to doing
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
// Do something here to populate myBimlFile
string myBimlFile = System.IO.File.ReadAllText(@"C:\tmp\inc_Package_10.biml");
#>    
    <Packages>
        <#=CallBimlScriptContent(myBimlFile, "Package_10")#>
    </Packages>
</Biml>

In the next post, I'll show you how to use reference data stored in tables or Azure as your BimlScript Content. Stay tuned!

Thursday, December 20, 2018

My github repository

In preparation for my talk at the Kansas City SQL Server User Group this afternoon, I am putting this post here so people can get the materials easily.

Lightning Talks

Friday, October 26, 2018

SQL Server Agent Job Sort Order


Today's post could also be titled "I have no idea what is happening here." We have an agent job, "Job - Do Stuff". We then created a few hundred jobs (templates for the win) all named like "Job - Do XYZ" where XYZ is a mainframe module identifier. When I'm scrolling through the list of jobs, it takes a few passes for my eye to find Do Stuff between DASD and DURR. I didn't want to change the leading portion of my job name, but I wanted my job to sort first, so I opened an ASCII table and found a useful character that sorts before the dash. Ah, the asterisk: ASCII 42 comes before the dash, ASCII 45.

Well, that was unexpected. In my reproduction here, the job names take the form of the literal string "JOB " (trailing space there), then a single ASCII character as a separator, then the string literal "CHAR(", the ASCII ordinal value of that character and, for completeness, the closing parenthesis. Thus, JOB * CHAR(42) and JOB - CHAR(45). Sorting ascending alphabetically - which under the sheets I assumed meant converting each character to its ASCII value - should put JOB * CHAR(42) on top.
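As a quick sanity check on that premise, the ASCII function confirms the ordinal values:

-- Asterisk (42) should sort ahead of dash (45) if the
-- ordering is strictly by ASCII value
SELECT
    ASCII('*') AS Asterisk
,   ASCII('-') AS Dash;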

That ain't the way it's being sorted in SSMS though. Let's figure out whether this is an application issue or a database issue. Jobs are stored in the msdb database, in a table called sysjobs in the dbo schema. Let's start there.

SELECT
    S.name AS JobName
FROM
    msdb.dbo.sysjobs AS S
WHERE
    S.name LIKE 'JOB%'
ORDER BY
    S.name;

Huh.

OK, so what goes into sorting? Collations.


SELECT
    S.name AS SchemaName
,   T.name AS TableName
,   C.name AS ColumnName
,   T2.name AS DataTypeName
,   C.collation_name AS ColumnCollationName
,   T2.collation_name AS TypeCollationName
FROM
    msdb.sys.schemas AS S
    INNER JOIN
        msdb.sys.tables AS T
        ON T.schema_id = S.schema_id
    INNER JOIN
        msdb.sys.columns AS C
        ON C.object_id = T.object_id
    INNER JOIN
        msdb.sys.types AS T2
        ON T2.user_type_id = C.user_type_id
WHERE
    S.name = 'dbo'
    AND T.name = 'sysjobs'
    AND C.name = 'name';

The name column for dbo.sysjobs is of data type sysname, which uses the collation SQL_Latin1_General_CP1_CI_AS. If it's the collation causing the "weird" sort, then we should be able to reproduce it, right?

SELECT *
FROM
(
    VALUES
        ('JOB - CHAR(45)' COLLATE SQL_Latin1_General_CP1_CI_AS)
    ,   ('JOB * CHAR(42)' COLLATE SQL_Latin1_General_CP1_CI_AS)
) D(jobName)
ORDER BY
    D.jobName COLLATE SQL_Latin1_General_CP1_CI_AS;

Nope, not the collation since this returns in the expected sort order.

At this point, I wasted a lot of time going down rabbit holes because my reproduction was not verbatim. I neglected to preface my strings with an N, thus leaving them as ASCII (varchar) strings, not Unicode (nvarchar) strings.

SELECT *
FROM
(
    VALUES
        (N'JOB - CHAR(45)' COLLATE SQL_Latin1_General_CP1_CI_AS)
    ,   (N'JOB * CHAR(42)' COLLATE SQL_Latin1_General_CP1_CI_AS)
) D(jobName)
ORDER BY
    D.jobName COLLATE SQL_Latin1_General_CP1_CI_AS;

Running that, we get the same sort as sysjobs. At this point, I remembered something about Unicode sorting being different than the old school dictionary sort I was expecting. And after finding this answer on collations, I'm happy simply setting my quest aside and stepping away from the keyboard.

Oh, but if you want to see what the glorious sort order is for characters in the printable range (32 to 127), my script is below. Technically, 127 is a cheat since it's the DELETE character, but I include it because of where it sorts.

Make the jobs

This script has two templates in it - @MischiefManaged deletes a job and @Template creates one. I query against sys.all_columns to get a sequential set of numbers from 32 to 127. I use that number, string concatenation via CONCAT (requires 2012+) and the CHAR function to translate the number into the corresponding ASCII character. It will print out "JOB ' CHAR(39)" once complete because the embedded single quote breaks the dynamic SQL and I'm too lazy to escape it.

DECLARE
    @Template nvarchar(max) = N'
use msdb;
IF EXISTS (SELECT * FROM dbo.sysjobs AS S WHERE S.name = ''<JobName/>'')
BEGIN
    EXECUTE dbo.sp_delete_job @job_name = ''<JobName/>'';
END
EXECUTE dbo.sp_add_job
    @job_name = N''<JobName/>''
,   @enabled = 1
,   @notify_level_eventlog = 0
,   @notify_level_email = 2
,   @notify_level_page = 2
,   @delete_level = 0
,   @category_name = N''[Uncategorized (Local)]'';

EXECUTE dbo.sp_add_jobserver
    @job_name = N''<JobName/>''
,   @server_name = @@SERVERNAME;

EXEC dbo.sp_add_jobstep
    @job_name = N''<JobName/>''
,   @step_name = N''MinimumViableJob''
,   @step_id = 1
,   @cmdexec_success_code = 0
,   @on_success_action = 2
,   @on_fail_action = 2
,   @retry_attempts = 0
,   @retry_interval = 0
,   @os_run_priority = 0
,   @subsystem = N''TSQL''
,   @command = N''SELECT 1''
,   @database_name = N''msdb''
,   @flags = 0;

EXEC dbo.sp_update_job
    @job_name = N''<JobName/>''
,   @start_step_id = 1;
'
,   @MischiefManaged nvarchar(4000) = N'
use msdb;
IF EXISTS (SELECT * FROM dbo.sysjobs AS S WHERE S.name = ''<JobName/>'')
BEGIN
    EXECUTE dbo.sp_delete_job @job_name = ''<JobName/>'';
END'
,   @Token sysname = '<JobName/>'
,   @JobName sysname
,   @Query nvarchar(max);

DECLARE
    CSR CURSOR
FAST_FORWARD
FOR
SELECT
    J.jobName
FROM
(
    SELECT TOP (127-31)
        31 + (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) AS rn
    FROM sys.all_columns AS AC
) D(rn)
    CROSS APPLY
    (
        SELECT
            CONCAT('JOB ', CHAR(D.rn), ' CHAR(', D.rn, ')')
    )J(jobName)

OPEN CSR;
FETCH NEXT FROM CSR INTO @JobName;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY  
        SET @Query = REPLACE(@Template, @Token, @JobName);
        ---- Uncomment the following to clean up our jobs
        --SET @Query = REPLACE(@MischiefManaged, @Token, @JobName);
        EXECUTE sys.sp_executesql @Query, N'';
    END TRY
    BEGIN CATCH
        PRINT @JobName;
    END CATCH
    FETCH NEXT FROM CSR INTO @JobName;
END
CLOSE CSR;
DEALLOCATE CSR;

At this point, you can refresh the Jobs list in SSMS and the result is this job sort.

Once you're satisfied with how things look, uncomment this line SET @Query = REPLACE(@MischiefManaged, @Token, @JobName); and rerun the script. All will be cleaned up.

Let's just chalk sorting up there with timezones, ok? Sounds easy but isn't. If you know more than me, please explain away in the comments section and share your knowledge.

Thursday, October 18, 2018

Polling in SQL Agent


A fun question over on StackOverflow asked about using SQL Agent with SSIS to poll for a file's existence. As the comments indicate, there's a non-zero startup cost associated with SSIS (it must validate the metadata associated with the sources and destinations), but there is a faster, lighter weight alternative. Putting together a host of T-SQL ingredients, including undocumented extended stored procedures, the following recipe could be used as a SQL Agent job step.

If you copy and paste the following query into your favorite instance of SQL Server, it will poll for one minute and then complete by printing the words "Naughty, naughty".

SET NOCOUNT ON;
-- http://www.patrickkeisler.com/2012/11/how-to-use-xpdirtree-to-list-all-files.html
DECLARE
    -- Don't do stupid things like adding spaces into folder names
    @sourceFolder varchar(260) = 'C:\ssisdata\Input'
    -- Have to use SQL matching rules, not DOS/SSIS
,   @fileMask sysname = 'SourceData%.txt'
    -- how long to wait between polling
,   @SleepInSeconds int = 5
    -- Don't exceed 24 hours aka 86400 seconds
,   @MaxTimerDurationInSeconds int = (3600 * 0) + (60 * 1) + 0
    -- parameter for xp_dirtree 0 => top folder only; 1 => subfolders
,   @depth int = 1
    -- parameter for xp_dirtree 0 => directory only; 1 => directory and files
,   @collectFile int = 1
,   @RC bigint = 0;

-- Create a table variable to capture the results of our directory command
DECLARE
    @DirectoryTree table
(
    id int IDENTITY(1, 1)
,   subdirectory nvarchar(512)
,   depth int
,   isFile bit
);

-- Use our sleep in seconds time to generate a delay time string
DECLARE
    @delayTime char(10) = CONVERT(char(10), TIMEFROMPARTS(@SleepInSeconds/3600, (@SleepInSeconds/60)%60, @SleepInSeconds%60, 0, 0), 108)
,   @stopDateTime datetime2(0) = DATEADD(SECOND, @MaxTimerDurationInSeconds, CURRENT_TIMESTAMP);

-- Force creation of the folder
EXECUTE master.dbo.xp_create_subdir @sourceFolder;

-- Load the results of our directory
INSERT INTO
    @DirectoryTree
(
    subdirectory
,   depth
,   isFile
)
EXECUTE master.dbo.xp_dirtree
    @sourceFolder
,   @depth
,   @collectFile;

-- Prime the pump
SELECT
    @RC = COUNT_BIG(1)
FROM
    @DirectoryTree AS DT
WHERE
    DT.isFile = 1
    AND DT.subdirectory LIKE @fileMask;

WHILE @RC = 0 AND @stopDateTime > CURRENT_TIMESTAMP
BEGIN

    -- Load the results of our directory
    INSERT INTO
        @DirectoryTree
    (
        subdirectory
    ,   depth
    ,   isFile
    )
    EXECUTE master.dbo.xp_dirtree
        @sourceFolder
    ,   @depth
    ,   @collectFile;

    -- Test for file existence
    SELECT
        @RC = COUNT_BIG(1)
    FROM
        @DirectoryTree AS DT
    WHERE
        DT.isFile = 1
        AND DT.subdirectory LIKE @fileMask;

    IF @RC = 0
    BEGIN
        -- Put our process to sleep for a period of time
        WAITFOR DELAY @delayTime;
    END
END

-- at this point, we have either exited due to file found or time expired
IF @RC > 0
BEGIN
    -- Take action when file was found
    PRINT 'Go run SSIS or something';
END
ELSE
BEGIN
    -- Take action for file not delivered in expected timeframe
    PRINT 'Naughty, naughty';
END

With the above query running, open a separate window and (assuming you have xp_cmdshell enabled) fire the following query to create a file matching the expected pattern. This time, the polling query will print out "Go run SSIS or something".

DECLARE
    @sourceFolder varchar(260) = 'C:\ssisdata\Input'
,   @fileMask sysname = REPLACE('SourceData%.txt', '%', CONVERT(char(10), CURRENT_TIMESTAMP, 120))
DECLARE
    @command varchar(1000) = 'echo > ' + @sourceFolder + '\' + @fileMask;

-- If you get this error
--Msg 15281, Level 16, State 1, Procedure sys.xp_cmdshell, Line 1 [Batch Start Line 0]
--SQL Server blocked access to procedure 'sys.xp_cmdshell' of component 'xp_cmdshell' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'xp_cmdshell' by using sp_configure. For more information about enabling 'xp_cmdshell', search for 'xp_cmdshell' in SQL Server Books Online.
--
-- Run this (xp_cmdshell is an advanced option, so 'show advanced options'
-- may need to be enabled first)
--EXECUTE sys.sp_configure 'show advanced options', 1;
--GO
--RECONFIGURE;
--GO
--EXECUTE sys.sp_configure 'xp_cmdshell', 1;
--GO
--RECONFIGURE;
--GO
EXECUTE sys.xp_cmdshell @command;

Once you're satisfied with how that works, now what? I'd likely set up a step 2 which is the actual running of the SSIS package (instead of printing a message). What about the condition that a file wasn't found? I'd likely use THROW/RAISERROR or just old fashioned divide by zero to force the first job step to fail, and then specify a reasonable number of @retry_attempts and @retry_interval. A sketch of both pieces is below.
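Here's roughly what that could look like, assuming the polling query above is step 1 of a job named "Job - Poll For File". The job name, error number and retry values are mine for illustration, so season to taste.

-- In the polling step, replace the PRINT with a hard failure
-- when no file showed up (any error number >= 50000 works with THROW)
IF @RC = 0
BEGIN
    THROW 50000, N'No file matching the mask arrived before the timer expired.', 1;
END

-- Then let Agent retry the polling step a few times before
-- declaring the job a failure
EXECUTE msdb.dbo.sp_update_jobstep
    @job_name = N'Job - Poll For File'
,   @step_id = 1
,   @retry_attempts = 3
,   @retry_interval = 10;   -- minutes between attempts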

Tuesday, October 9, 2018

Biml Excel Data Source without Excel


In the previous post, Reading Excel files without Excel, I showed some simple code to consume Excel without having Excel installed on your machine. How/Why would I use this - well look at the sample spreadsheet. That could be used quite nicely by a business analyst to generate SSIS packages. In fact, it is being used by a very savvy business analyst at one of my clients' shadow IT groups to identify the source data they'd like brought into their data mart. They are translating their mainframe data extracts into SQL equivalents and specifying where the data should land.

This is exciting for me as this team gets their data and knows the business problems they need to solve; they just didn't have all the tools to do so. They are supplying the data domain expertise and we are generating consistent packages that adhere to corporate standards (as well as defining the scheduling, alerting, etc). It's a good match.

My resources are quite simple: Excel Spreadsheet containing meta data, a driver program and a package template.

The template is your standard truncate and reload pattern with the target table specified by a parameter. The client validates data by running processes in parallel, so the existing mainframe process delivers data to the Billing table while ours delivers to a Billing_NEW table. Once they accept the new process, the target table becomes Billing and the NEW table is dropped. I decided the most native SSIS route would be to specify the target table as a parameter. We originally had a boolean parameter indicating whether we were loading the new table or the production one, but that was more logic and overhead than just specifying which table to load. I force their queries to be dirty reads as some of these queries can be rather messy.

<#@ template designerbimlpath="/Biml/Packages" #>
<#@ property name="schemaName" type="string" #>
<#@ property name="tableName" type="string" #>
<#@ property name="parameterName" type="string" #>
<#@ property name="sourceQuery" type="string" #>
<#@ property name="sourceConnectionName" type="string" #>
<#@ property name="targetConnectionName" type="string" #>
<#@ property name="businessFriendlyName" type="string" #>
<#
string packageName = string.Format("{0}_Load_{1}{2}", targetConnectionName.ToUpper(), businessFriendlyName, "");
CustomOutput.PackageName = packageName;
#>
        <Package Name="&lt;#= packageName #>" ConstraintMode="Linear">
            <Parameters>
                <Parameter Name="TargetTableName" DataType="String"><#= tableName #></Parameter>
            </Parameters>
            <Variables>
                <Variable Name="SchemaName" DataType="String"><#= schemaName#></Variable>
                <Variable Name="TableName" DataType="String" EvaluateAsExpression="true"><#= parameterName #></Variable>
                <Variable Name="QualifiedTableName" DataType="String" EvaluateAsExpression="true">&quot;[&quot; +   @[User::SchemaName] + &quot;].[&quot; + @[User::TableName]+ &quot;]&quot;</Variable>
                <Variable Name="QueryTruncate" DataType="String" EvaluateAsExpression="true">"TRUNCATE TABLE " + @[User::QualifiedTableName] + ";"</Variable>
            </Variables>
            <Tasks>
                  <ExecuteSQL Name="SQL Truncate Target" ConnectionName="&lt;#= targetConnectionName #>">
                    <VariableInput VariableName="User.QueryTruncate" />
                </ExecuteSQL>
                <Dataflow Name="DFT Load &lt;#= businessFriendlyName #>">
                    <Transformations>
                        <OleDbSource ConnectionName="<#= sourceConnectionName #>" Name="OLESRC Query">
                            <DirectInput><![CDATA[SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
<#= sourceQuery#>
]]>
                            </DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="OLEDST &lt;#= schemaName #>_<#= tableName#>" ConnectionName="<#= targetConnectionName #>">
                            <TableFromVariableOutput VariableName="User.QualifiedTableName" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>

My ProjectDriver.biml file is fairly straight forward. In line 1, I provide a relative path to my EPPlus.dll. The ..\ indicates the assembly is found one folder up - two folders, actually, since I keep a single copy in my base Visual Studio folder. Line 2 specifies we need to bring in the OfficeOpenXml library. In line 5, I create a variable that will hold the metadata for my solution. Line 6 is kind of interesting: I let the template determine what the package name should be based on the supplied metadata, and rather than having to perform that logic twice, it'd be nice to keep track of what packages have been created. Not only nice, it'll be required since we're using the Project Deployment Model! Line 19 is where we actually stamp out a specific package. Look at that second parameter, out customOutput - that is the mechanism for our template to send information back to the caller. In our case, we'll add the package name to our ever growing list of packages. In line 28, we then run back through our list of packages and build out the project's definition. And that's about it. We've already talked about the GetExcelDriverData method. The GetDriverData method provides a simple abstraction between where I actually get metadata and how the packages are built. You can see a commented out reference to a GetStaticDriverData method, which I used during development to test boundary conditions. Who knows, maybe I will pull from Azure Tables next...

<#@ assembly name="..\..\EPPlus.dll" #>
<#@ import namespace="OfficeOpenXml" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
    Dictionary<string, List<string>> dasData = new Dictionary<string, List<string>>();
    List<string> packageList = new List<string>();
    string templateName = "inc_TruncAndReloadPackageParameter.biml";
    string projectName = "SHADOW_IT_DataSnapshot";

    // Get our meta data    
    dasData = GetDriverData();
#>
    <Packages>
<#

    dynamic customOutput;
    foreach(var key in dasData.Keys)
    {
        WriteLine(CallBimlScriptWithOutput(templateName, out customOutput, dasData[key][0], dasData[key][1], dasData[key][2], dasData[key][3], dasData[key][4], dasData[key][5], dasData[key][6]));
        packageList.Add(customOutput.PackageName);
    }
#>
    </Packages>
    <Projects>
        <PackageProject Name="<#= projectName #>">
            <Packages>
<#
        foreach(var key in packageList)
        {
#>
            <Package PackageName="<#= key #>" />
<#
        }
#>            
            </Packages>
            <Connections>
                <Connection ConnectionName="WWI_DB" />
                <Connection ConnectionName="WWI_DW" />
            </Connections>
        </PackageProject>
    </Projects>
    <Connections>
        <OleDbConnection Name="WWI_DB" ConnectionString="Data Source=.\DEV2017;Initial Catalog=WWI_DW;Provider=SQLNCLI11.1;Integrated Security=SSPI;Application Intent=READONLY;ConnectionTimeout=0;" CreateInProject="true" />
        <OleDbConnection Name="WWI_DW" ConnectionString="Data Source=.\DEV2017;Initial Catalog=WWI_DB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Application Intent=READONLY;" CreateInProject="true" />
    </Connections>
</Biml>

<#+

    /// Get data from Excel worksheet
    public Dictionary<string, List<string>> GetExcelDriverData(string sourceFile)
    {
        Dictionary<string, List<string>> d = new Dictionary<string, List<string>>();
        System.IO.FileInfo fi = new System.IO.FileInfo(sourceFile);
        using (ExcelPackage ep = new ExcelPackage(fi))
        {
            ExcelWorkbook wb = ep.Workbook;
            ExcelWorksheet ws = wb.Worksheets.First();
            if (ws != null)
            {
                // 1 based array to 7, inclusive
                for (int i = ws.Dimension.Start.Row+1; i < ws.Dimension.End.Row+1; i++)
                {
                    List<string> row = new List<string>() { ws.Cells[i, 1].Value.ToString()
                    ,   ws.Cells[i, 2].Value.ToString()
                    ,   ws.Cells[i, 3].Value.ToString()
                    ,   ws.Cells[i, 4].Value.ToString()
                    ,   ws.Cells[i, 5].Value.ToString()
                    ,   ws.Cells[i, 6].Value.ToString()
                    ,   ws.Cells[i, 7].Value.ToString()
                    };
                    
                    d[ws.Cells[i, 7].Value.ToString()] = row;
                }
            }
        }
        
        return d;
    }

    public Dictionary<string, List<string>> GetDriverData()
    {
        string sourceFile= @"C:\Users\billinkc\Documents\ShadowIt_DataSnap.xlsx";
        return GetExcelDriverData(sourceFile);
        //return GetStaticDriverData();
    }
#>

And that's how we can use EPPlus to consume metadata stored in Excel to generate many packages with Biml. Let me know if this helps or if you have questions about how to get this running. It's good stuff, I can't get enough of it.

Monday, October 8, 2018

Reading Excel files without Excel


A common problem working with Excel data is Excel itself. Working with it programmatically requires an installation of Office, with the resulting license cost, and once everything is set up, you're still working with COM objects, which present their own set of challenges. If only there was a better way.

Enter the better way: EPPlus. This is an open source library that wraps the OpenXml library, which allows you to simply reference a DLL. No more installation hassles, no more licensing expense (LGPL), just a simple reference you can package with your solutions.

Let's look at an example. Here's a simple spreadsheet with a header row and a row's worth of data.

For each row, after the header, I'll read the 7 columns into a list and then, since I assume the last column, BusinessFriendlyName, is unique, I'll use that as the key for my return dictionary.

using OfficeOpenXml;
...
    /// Get data from Excel worksheet
    public Dictionary<string, List<string>> GetExcelDriverData(string sourceFile)
    {
        Dictionary<string, List<string>> d = new Dictionary<string, List<string>>();
        System.IO.FileInfo fi = new System.IO.FileInfo(sourceFile);
        using (ExcelPackage ep = new ExcelPackage(fi))
        {
            ExcelWorkbook wb = ep.Workbook;
            ExcelWorksheet ws = wb.Worksheets.First();
            if (ws != null)
            {
                // Rows and columns are 1-based; skip the header row and read columns 1-7
                for (int i = ws.Dimension.Start.Row+1; i < ws.Dimension.End.Row+1; i++)
                {
                    List<string> row = new List<string>() { ws.Cells[i, 1].Value.ToString()
                    ,   ws.Cells[i, 2].Value.ToString()
                    ,   ws.Cells[i, 3].Value.ToString()
                    ,   ws.Cells[i, 4].Value.ToString()
                    ,   ws.Cells[i, 5].Value.ToString()
                    ,   ws.Cells[i, 6].Value.ToString()
                    ,   ws.Cells[i, 7].Value.ToString()
                    };
                    
                    d[ws.Cells[i, 7].Value.ToString()] = row;
                }
            }
        }
        
        return d;
    }

It's as easy as that. There are plenty of more clever implementations out there but I wanted to demonstrate a quick and easy method to read Excel from your .NET code.

Tuesday, August 14, 2018

A date dimension for SQL Server


The most common table you will find in a data warehouse will be the date dimension. There is no "right" implementation beyond what the customer needs to solve their business problem. I'm posting a date dimension for SQL Server that I generally find useful as a starting point in the hopes that I quit losing it. Perhaps you'll find it useful or can use the approach to build one more tailored to your environment.

As the comments indicate, this will create a DW schema and a table named DimDate, and then populate the date dimension from 1900-01-01 to 2079-06-06, endpoints inclusive. I also patch in 9999-12-31 as a well known "unknown" date value. Sure, it's odd to have an incomplete year - this is your opportunity to tune the supplied code ;)

-- At the conclusion of this script, there will be
-- A schema named DW
-- A table named DW.DimDate
-- DW.DimDate will be populated with all the days between 1900-01-01 and 2079-06-06 (inclusive)
--   and the sentinel date of 9999-12-31

IF NOT EXISTS
(
    SELECT * FROM sys.schemas AS S WHERE S.name = 'DW'
)
BEGIN
    EXECUTE('CREATE SCHEMA DW AUTHORIZATION dbo;');
END
GO
IF NOT EXISTS
(
    SELECT * FROM sys.schemas AS S INNER JOIN sys.tables AS T ON T.schema_id = S.schema_id
    WHERE S.name = 'DW' AND T.name = 'DimDate'
)
BEGIN
    CREATE TABLE DW.DimDate
    (
        DateSK int NOT NULL
    ,   FullDate date NOT NULL
    ,   CalendarYear int NOT NULL
    ,   CalendarYearText char(4) NOT NULL
    ,   CalendarMonth int NOT NULL
    ,   CalendarMonthText varchar(12) NOT NULL
    ,   CalendarDay int NOT NULL
    ,   CalendarDayText char(2) NOT NULL
    ,   CONSTRAINT PK_DW_DimDate
            PRIMARY KEY CLUSTERED
            (
                DateSK ASC
            )
            WITH (ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE)
    ,   CONSTRAINT UQ_DW_DimDate UNIQUE (FullDate)
    );
END
GO
WITH 
    -- Define the start and the terminal value
    BOOKENDS(FirstDate, LastDate) AS (SELECT DATEFROMPARTS(1900,1,1), DATEFROMPARTS(9999,12,31))
    -- Itzik Ben-Gan rapid number generator
    -- T4 builds 65536 rows. Need more - follow the pattern to T5
    --  Need fewer rows, add a TOP below
,    T0 AS 
(
    -- 2
    SELECT 1 AS n
    UNION ALL SELECT 1
)
,    T1 AS
(
    -- 2*2 => 4
    SELECT 1 AS n
    FROM
        T0
        CROSS APPLY T0 AS TX
)
,    T2 AS 
(
    -- 4*4 => 16
    SELECT 1 AS n
    FROM
        T1
        CROSS APPLY T1 AS TX
)
,    T3 AS 
(
    -- 16*16 => 256
    SELECT 1 AS n
    FROM
        T2
        CROSS APPLY T2 AS TX
)
,    T4 AS
(
    -- 256*256 => 65536
    -- or approx 179 years
    SELECT 1 AS n
    FROM
        T3
        CROSS APPLY T3 AS TX
)
,    T5 AS
(
    -- 65536*65536 => basically infinity
    SELECT 1 AS n
    FROM
        T4
        CROSS APPLY T4 AS TX
)
    -- Assume we now have enough numbers for our purpose
,    NUMBERS AS
(
    -- Add a SELECT TOP (N) here if you need fewer rows
    SELECT
        CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int) -1 AS number
    FROM
        T4
    UNION 
    -- Build End of time date
    -- Get an N value of 2958463 for
    -- 9999-12-31 assuming start date of 1900-01-01
    SELECT
        ABS(DATEDIFF(DAY, BE.LastDate, BE.FirstDate))
    FROM
        BOOKENDS AS BE
)
, DATES AS
(
SELECT
    PARTS.DateSk
,   FD.FullDate
,   PARTS.CalendarYear
,   PARTS.CalendarYearText
,   PARTS.CalendarMonth
,   PARTS.CalendarMonthText
,   PARTS.CalendarDay
,   PARTS.CalendarDayText
FROM
    NUMBERS AS N
    CROSS APPLY
    (
        SELECT
            DATEADD(DAY, N.number, BE.FirstDate) AS FullDate
        FROM
            BOOKENDS AS BE
    )FD
    CROSS APPLY
    (
        SELECT
            CAST(CONVERT(char(8), FD.FullDate, 112) AS int) AS DateSk
        ,   DATEPART(YEAR, FD.FullDate) AS [CalendarYear] 
        ,   DATENAME(YEAR, FD.FullDate) AS [CalendarYearText]
        ,   DATEPART(MONTH, FD.FullDate) AS [CalendarMonth]
        ,   DATENAME(MONTH, FD.FullDate) AS [CalendarMonthText]
        ,   DATEPART(DAY, FD.FullDate)  AS [CalendarDay]
        ,   DATENAME(DAY, FD.FullDate) AS [CalendarDayText]

    )PARTS
)
INSERT INTO
    DW.DimDate
(
    DateSK
,   FullDate
,   CalendarYear
,   CalendarYearText
,   CalendarMonth
,   CalendarMonthText
,   CalendarDay
,   CalendarDayText
)
SELECT
    D.DateSk
,   D.FullDate
,   D.CalendarYear
,   D.CalendarYearText
,   D.CalendarMonth
,   D.CalendarMonthText
,   D.CalendarDay
,   D.CalendarDayText
FROM
    DATES AS D
WHERE NOT EXISTS
(
    SELECT * FROM DW.DimDate AS DD
    WHERE DD.DateSK = D.DateSk
);
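If you want to sanity check the load, the endpoints and row count are easy to verify:

-- Expect 1900-01-01, the 9999-12-31 sentinel, and 65537 total rows
SELECT
    MIN(DD.FullDate) AS FirstDate
,   MAX(DD.FullDate) AS LastDate
,   COUNT_BIG(*) AS TotalRows
FROM
    DW.DimDate AS DD;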