World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Wednesday, July 20, 2016

Biml Transformer: Update a variable value

Biml

In preparation for Biml Hero training, I thought it would be a good idea to understand Biml Transformers. I have read about them many times but never really dug in to understand them, much less found a situation where I needed one. This post covers my first attempt at using one.

Use case: updating a variable value

Assume you have an SSIS variable in your Biml files whose value you need to update - the server died and you need a new server name patched in. You could do a search and replace in your biml, or apply a configuration once you emit and deploy the SSIS package, but let's try the transformer.

Source biml

Let's use the following simple biml as our package. It's three sequence containers connected serially and a single variable, ServerName, with a value of SQLDEV.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="OriginalPackage_LocalMerge" ConstraintMode="Linear">
            <Variables>
                <Variable DataType="String" Name="ServerName">SQLDEV</Variable>
            </Variables>
            <Tasks>
                <Container Name="SEQ 1" />
                <Container Name="SEQ 2" />
                <Container Name="SEQ 3" />
            </Tasks>
        </Package>
    </Packages>
</Biml>

Transformer biml

Our transformer is simple. We've specified "LocalMerge" as we only want to fix the one thing. That one thing is an SSIS Variable named "ServerName".

What is going to happen is that we will redeclare the variable, this time specifying a value of "SQLQA". Additionally, I'm going to add a Description annotation to my Variable that preserves the original value. The TargetNode object has a lot of power to it, as we'll see over this series of posts on Biml Transformers.

<#@ target type="Variable" mergemode="LocalMerge" #>
<#
    string variableName = "ServerName";
    if (TargetNode.Name == variableName)
    {
#>
    <Variable DataType="String" Name="ServerName">SQLQA<Annotations><Annotation AnnotationType="Description">Value was <#=  TargetNode.Value#></Annotation></Annotations></Variable>
<#
    }
#>

So what's happening?

I think it's important to understand how this stuff works, otherwise you might not get the results you expect. The Biml compiler is going to take our source biml; if I had code nuggets in there, those would be expanded first. Once all the "normal" biml is complete, our transformers swoop in and are executed. This allows the transformers to work with the finalized objects, metadata set and all, prior to rendering actual SSIS packages.

Result

It doesn't look that amazing, I admit.

<Variable DataType="String" Name="ServerName">SQLQA
    <Annotations>
        <Annotation AnnotationType="Description">Value was SQLDEV</Annotation>
    </Annotations>
</Variable>
But conceptually, wouldn't it be handy to be able to selectively modify bits of a package? Someone didn't name their Tasks or Components well? You could have a transformer fix that. Someone forgot to add your logging/auditing/framework code? You could have a transformer fix that too!

Start thinking in Biml! Most of my answers on StackOverflow have biml in them simply because it makes describing ETL so much easier.

Caveats

Transformers require a license for Mist. They don't work in BIDS Helper, BimlExpress, or BimlOnline.

Friday, July 15, 2016

Building SSIS packages using the Biml object model

Programmatically building SSIS packages via the Biml Object Model

I thought it might be fun to try and figure out how to use the Biml API to construct SSIS packages. This post is the first in an occasional series as I explore and find neat new things.

Getting Started

The most important precursor to doing this is that you will need a licensed installation of Mist. Full stop. The assemblies we're going to use have licensing checks built into them, and you cannot use the assemblies shipped with BidsHelper or BimlExpress as they're hardwired to those specific applications.

We're going to use two classes: AstRootNode and AstPackageNode.

Ast, what is that? Abstract Syntax Tree - it's a compiler theory thing.

AstRootNode? The root node is the <Biml /> tag. It contains all the collections in your biml declaration.

AstPackageNode? This is an instance of an SSIS package.

using Varigence.Languages.Biml;
using Varigence.Languages.Biml.Task;
using Varigence.Languages.Biml.Transformation;

...

AstRootNode arn = new AstRootNode(null);
AstPackageNode pkg = new AstPackageNode(arn);
arn.Packages.Add(pkg);

pkg.Name = "HelloWorld";

Now what?
You have two choices: you can get the Biml

Console.WriteLine(arn.GetBiml());

which results in

<Biml >
    <Packages>
        <Package Name="HelloWorld" />
    </Packages>
</Biml>

Or you can get the Xml

Console.WriteLine(arn.EmitAllXml());

which looks like

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="HelloWorld" />
    </Packages>
</Biml>

In this example, they're nearly identical, except that the XML emission includes the xmlns namespace in the Biml declaration while the GetBiml call just returns Biml. Interestingly enough, if either is fed through BimlExpress, both will pass validation.
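If you want to hang on to what you've generated, a minimal follow-on sketch (my addition; the output path is purely illustrative) is to write the emitted XML to a .biml file that a Mist or BimlExpress project could then pick up:

// Persist the generated Biml so it can be validated or compiled elsewhere.
// The target path below is illustrative only.
System.IO.File.WriteAllText(@"C:\ssisdata\Biml\HelloWorld.biml", arn.EmitAllXml());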

Wednesday, May 25, 2016

SSIS - What is the name of the file

What is the name of the current file

It's common to not know the exact name of the file you're loading. You often have to apply some logic - a file mask, date calculation logic, etc. - to determine what file needs to be loaded. In SSIS, we often use a ForEach File Enumerator to accomplish this, but an Expression Task or even an Execute SQL Task can be used to retrieve/set a file name. Today, I'm going to show you two different mechanisms for identifying the current file name.

Did you know that in SSIS, the Flat File Source exposes an advanced property called FileNameColumnName? It shows in the Properties window for the Flat File Source.

There are two different click paths for setting the FileNameColumnName property. The first is to right click on the Flat File Source and select the "Show Advanced Editor" option. There, navigate to the Component Properties tab and set the FileNameColumnName property.


The second is a combination of the Properties window and the Flat File Source itself. Select the Flat File Source and go to the Properties window. There you specify the FileNameColumnName property but notice that the Flat File Source itself is put into a Warning state. To fix that, we need to double click on the component and view the Columns tab. You'll notice the name we specified in the Properties window is now set in the Columns tab and the warning goes away.


Cool story, bro

That's cool and all, but it has two downsides. The first is due to me being a dumb American: the file name that is added to the data flow is DT_WSTR (unicode/nvarchar) with a length of 260. Defaulting to that is awesome for internationalization, except the systems I work in never have nvarchar defined, so now I need a Data Conversion component to change the supplied name into a non-unicode version. That's an irritant, but since I know what the pattern is, I can live with it.

The real downside with this approach is that it only works for the Flat File Source. The Excel, Raw File, and XML sources do not expose a FileNameColumnName property. That is a problem in my book because when I'm automating, I'd have to have one set of source patterns for flat files and a different one for everything else.

A better approach

So, as much as I like the built in solution, my pattern is to use a Derived Column to inject the file name into the Data Flow. I have a variable called CurrentFileName in all my packages. That contains the design-time path for my Flat File Connection Manager (or Excel). My Connection Manager will then have the ConnectionString/ExcelFilePath property assigned to be @[User::CurrentFileName]. This positions me for success because all I need to do is ensure that whatever mechanism I am using to determine my source file correctly populates that variable. In this post, a ForEach File Enumerator will handle that.

Within my Data Flow Task, I will add a Derived Column Transformation that adds my package variable into the data flow as a new column. Here, I am specifying it will be of data type DT_STR with a length of 130.

Biml

What would a post be without some Biml to illustrate the point?
I use the following as my source file.

Col1|Col2
1|2
2|3
3|4

This Biml will generate an SSIS package that has a Data Flow with a Flat File Source, a Derived Column and a Row Count. What you're interested in is seeing how we either specify the value for the FileNameColumnName in our FlatFileSource tag or enrich our data flow by adding it in our Derived Column component.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection FilePath="C:\ssisdata\SO\Input\CurrentFileName_0.txt" FileFormat="FFF_CurrentFileName" Name="FFCM" />
    </Connections>
    <FileFormats>
        <FlatFileFormat Name="FFF_CurrentFileName" IsUnicode="false" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true">
            <Columns>
                <Column Name="Col1" DataType="Int32" Delimiter="|" />
                <Column Name="Col2" DataType="Int32" Delimiter="CRLF" />
            </Columns>
        </FlatFileFormat>
    </FileFormats>
    <Packages>
        <Package Name="CurrentFileName">
            <Connections>
                <Connection ConnectionName="FFCM">
                    <Expressions>
                        <Expression ExternalProperty="ConnectionString">@[User::CurrentFileName]</Expression>
                    </Expressions>
                </Connection>
            </Connections>
            <Variables>
                <Variable DataType="String" Name="CurrentFileName">C:\ssisdata\SO\Input\CurrentFileName_0.txt</Variable>
                <Variable DataType="Int32" Name="RowCountSource">0</Variable>
            </Variables>
            <Tasks>
                <ForEachFileLoop 
                    Folder="C:\ssisdata\SO\Input" 
                    FileSpecification="CurrentFileName*.txt" 
                    Name="FELC Shred txt">
                    <VariableMappings>
                        <VariableMapping VariableName="User.CurrentFileName" Name="0" />
                    </VariableMappings>
                    <Tasks>
                        <Dataflow Name="DFT Import data">
                            <Transformations>
                                <FlatFileSource ConnectionName="FFCM" Name="FFCM Pull data" FileNameColumnName="CurrentFileNameSource" />
                                <DerivedColumns Name="DER Add CurrentFileName">
                                    <Columns>
                                        <Column DataType="AnsiString" Name="CurrentFileName" Length="130">@[User::CurrentFileName]</Column>
                                    </Columns>
                                </DerivedColumns>
                                <RowCount VariableName="User.RowCountSource" Name="CNT Source Data" />
                            </Transformations>
                        </Dataflow>
                    </Tasks>
                </ForEachFileLoop>
            </Tasks>
        </Package>
    </Packages>
</Biml>

If all goes well, when you run your package you should see something like the following

Friday, April 22, 2016

Biml SSIS Foreach Nodelist Container (aka Shred XML in SSIS)

Biml SSIS Foreach Nodelist Container (aka Shred XML in SSIS)

Every time I look at my ForEach NodeList Enumerator post, I struggle to remember how to do half of it. Plus, as I discovered today, I am inconsistent between my images. The Control Flow shows an XML structure of

<Files>
    <File>Foo.txt</File>
    <File>Bar.txt</File>
    <File>Blee.txt</File>
</Files>
but the text inside the Foreach loop would be for parsing an XML tree in this format
<Files>
    <FileName>Foo.txt</FileName>
    <FileName>Bar.txt</FileName>
    <FileName>Blee.txt</FileName>
</Files>

The resulting package looks like this - a Foreach Enumerator that shreds the XML. We assign the shredded value into our variable CurrentNode. We pass that as an argument into a script task that does nothing but print the value as an Information event.

Configuration remains comparable to the previous post with the exception of using Variables.


Running the package generates Output like the following

Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Foo.txt
Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Bar.txt
Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Blee.txt

Biml

The following Biml will create a Foreach NodeList enumerator that shreds an XML recordset. We pass the current node variable into our script task, which echoes the value back.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Task_ForEachNodeListLoop">
            <Variables>
                <Variable DataType="String" Name="CurrentNode"></Variable>
                <Variable DataType="String" Name="SourceXML"><![CDATA[<Files><File>Foo.txt</File><File>Bar.txt</File><File>Blee.txt</File></Files>]]></Variable>
                <Variable DataType="String" Name="OuterXPath"><![CDATA[/Files/File]]></Variable>
                <Variable DataType="String" Name="InnerXPath"><![CDATA[.]]></Variable>
            </Variables>
            <Tasks>
                <ForEachNodeListLoop 
                    Name="FENLL Shred XML"
                    EnumerationType="ElementCollection"
                    InnerElementType="NodeText"
                    >
                    <VariableInput VariableName="User.SourceXML" />
                    <VariableOuterXPath VariableName="User.OuterXPath" />
                    <VariableInnerXPath VariableName="User.InnerXPath" />
                    <VariableMappings>
                        <VariableMapping VariableName="User.CurrentNode" Name="0" />
                    </VariableMappings>
                    <Tasks>
                        <Script ProjectCoreName="ST_EchoBack" Name="SCR Echo Back">
                            <ScriptTaskProjectReference ScriptTaskProjectName="ST_EchoBack" />
                        </Script>            
                    </Tasks>
                </ForEachNodeListLoop>
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_EchoBack" Name="ST_EchoBack" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <Variable Namespace="User" VariableName="CurrentNode" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_EchoBack
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            bool fireAgain = false;
            string message = "{0}::{1} : {2}";
            foreach (var item in Dts.Variables)
            {
                Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}                </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

Thursday, April 21, 2016

Biml Script Task - Test for Echo

Biml Script Task

To aid in debugging, it's helpful to have a "flight recorder" running to show you the state of variables. When I was first learning to program, the debugger I used was a lot of PRINT statements. "Verify your inputs before you assume anything" is a distillation of my experience debugging.

While some favor using MessageBox, I hate finding the popup window, closing it and then promptly forgetting what value was displayed. In SSIS, I favor raising events, FireInformation specifically, to emit the value of variables. This approach allows me to see the values in both the Progress/Execution Results tab as well as the Output window.

The syntax is very simple: whatever you want recorded, you pass in as the description argument. In this snippet, I'm going to build a string with three items in it - the variable's Namespace, Name and Value.

    bool fireAgain = false;
    string message = "{0}::{1} : {2}";
    var item = Dts.Variables["System::MachineName"];
    Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
When that code runs, it will pull the @[System::MachineName] variable out of the SSIS Variables collection and assign it to item. We'll then fire an information message off with the bits of data we care about.

It seems silly, doesn't it, to print the variable's namespace and name when we asked for it by name from the collection. Of course it is; a nicer, more generic snippet would be

    bool fireAgain = false;
    string message = "{0}::{1} : {2}";
    foreach (var item in Dts.Variables)
    {
        Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
    }
There, that's much better: however many Variables we pass into the ReadOnly or ReadWrite collection will all get echoed back to the log and Output window.

Biml

The following is almost turnkey; you simply need to list all the variables you want emitted to the log where it says "List all the variables you are interested in tracking". Follow the pattern and all the Variables you list will get tracked.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="so_36759813">
            <Variables>
                <Variable DataType="String" Name="echoBack">mixed</Variable>
            </Variables>
            <Tasks>
                <Script ProjectCoreName="ST_EchoBack" Name="SCR Echo Back">
                    <ScriptTaskProjectReference ScriptTaskProjectName="ST_EchoBack" />
                </Script>            
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_EchoBack" Name="ST_EchoBack" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <!-- List all the variables you are interested in tracking -->
                <Variable Namespace="System" VariableName="MachineName" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_EchoBack
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            bool fireAgain = false;
            string message = "{0}::{1} : {2}";
            foreach (var item in Dts.Variables)
            {
                Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}                </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

We could make this better by using some BimlScript to inspect the package and add all the variables in scope to our ReadWrite list, but that's deeper than we're going to go in this post.

Now, if you'll excuse me, it's time to spin Test For Echo.

Monday, April 4, 2016

ETL file processing pattern

ETL file processing pattern

In this post, I'd like to talk about something I always feel silly mentioning because it seems so elementary, yet plenty of people stumble on it. The lesson is simple: be consistent. For file processing, the consistency I crave is knowing where my files are going to be for processing. This post will have examples in SSIS, but whether you're using a custom Python script, Informatica, or Pentaho Kettle, the concept remains true: have your files in a consistent location.

You can tell when I've been somewhere because I really like the following file structure for ETL processing.

root node

All ETL processing will use a common root node/directory. I call it SSISData to make it fairly obvious what is in there, but call it what you will. On my dev machine, this is usually sitting right off C:. On servers though, oh hell no! The C drive is reserved for the OS. Instead, I work with the DBAs to determine where this will actually be located. Heck, they could make it a UNC path and my processing won't care, because I ensure that root location is an externally configurable "thing." Whatever you're using for ETL, I'm certain it will support the concept of configuration. Make sure the base node is configurable.

A nice thing about anchoring all your file processing to a head location is that if you are at an organization that is judicious with handing out file permissions, you don't have to create a permission request for each source of data. Get your security team to sign off on the ETL process having full control over a directory structure starting here. The number of SSIS-related permissions issues I answer on StackOverflow is silly.

Disaster recovery! If you need to stand up a new processing site to simulate your current production environment, it'd be really convenient to only have to pull in one directory tree and say "we're ready."

I find it useful to make the root folder a network share as well so that whatever team is responsible for supporting the ETL process can immediately open files without having to RDP into a box just to check data. Make the share read-only because SOX, SAS70, etc.

subject area

Immediately under my root node, I have a subject area. If you're just beginning your ETL, this is a step people usually skip. "We'll commingle all the files in one folder, it's easier." Or they don't even think about it because we gotta get this done.

Please, create subject areas to segment your files. Let's look at some reasons why you are likely going to want some isolation. "Data.csv" - great, is that the Sales data or the Employee data? You may not know until you open it up because you're just receiving data with no control over what the original file is called. If you had work areas for your files, you'd be able to direct the data to be delivered to the correct location with no chance of one file clobbering another.

And while we're talking about processes you can't control, let's talk about how Marketing is just going to copy the file into the folder whenever it's ready. Creating a folder structure by subject area will allow you to better control folder permissions. Remember how I said to open a share? If the process is that Marketing copies the file, give them write-only access to the folder they need to deliver to. If everyone copies files to a common folder, how many curious eyes will want to open Salaries_2016.xlsx? Folders make a fine mechanism for separation of sensitive data.

Input

The first folder under a given subject area is called Input. If my process needs to find data, it need only look in the Input folder. That's really about it: source data goes here.

Output

This is usually the least used of my folders, but I ensure every subject area has an "Output" folder. This way, I always know where I'm going to write output files. Output files might be immediately swept off to an FTP server or some other process consumes them, but this folder is where I put the data and where I can control access for external consumers of my data. I've been in places where developers made the great agreement of "you supply the data as CSV and we'll generate an Excel file when we're done." Except the people picking up the files weren't terribly computer savvy and didn't have their file extensions turned on... Yeah, so have an Output folder and dump your data there.

Archive

This is my kitchen utility drawer. I throw everything in here when I'm done with it. For inbound files, the pattern looks something like
foreach source file
  1. process data
  2. move file to archive folder (possibly with date processed on the name)

Outbound processing is identical. The core process completes and generates files. A new process fires off and delivers those files.
foreach outbound file

  1. deliver file
  2. move file to archive folder (possibly with date processed on the name)

Two things to call out with this approach. The first is if you rename the files to have a date stamp on them. That's great for having a trail of when you actually processed the data. For example, we'd get in BigFile.csv on New Year's Day but, due to year end processes running long, we didn't actually load the file until January second. Thus, when it gets archived, we might tack on a processed date like BigFile_2016-01-02.csv. On January 5th, bad things happen and we have to restore the database to January 1st. ETL processing is no problem, you just copy those files back into the Input folder and... oh, we expect the file to be named BigFile.csv exactly. Now you have to manipulate the file name before you can reprocess data. That's a pain. Or, if the process accepts a file mask, you'll end up with BigFile_2016-01-02_2016-01-05.csv in the Archive folder because now we have processed the file twice.

The second thing is to use your library methods for renaming files. Don't assume everything from the first period to the end of the file name is the file extension. Don't assume the file extension is 3 characters.

Do not archive your files with any date stamp that isn't in the form of yyyy-mm-dd. Month names sort horribly. When I'm looking for files, I'm looking by year, then month, then day. Do use a delimiter between year, month and day. I know, yyyymmdd is nicer in TSQL and such but for whatever reason, I find it harder to mentally parse in a file name.
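To make that concrete, here's a minimal C# sketch of the archive step under the assumptions above (the class, method, and parameter names are mine, not part of any framework): it leans on the framework's Path methods rather than hand-rolled string splitting and stamps the archived name with yyyy-MM-dd.

using System;
using System.IO;

public static class FileArchiver
{
    // Moves a processed file into the archive folder, appending the processing date.
    // Path.GetFileNameWithoutExtension/GetExtension cope with names containing
    // multiple periods and with extensions that aren't exactly three characters.
    public static void Archive(string sourceFile, string archiveFolder, DateTime processedDate)
    {
        string baseName = Path.GetFileNameWithoutExtension(sourceFile);
        string extension = Path.GetExtension(sourceFile); // includes the leading "."
        string archiveName = string.Format("{0}_{1:yyyy-MM-dd}{2}", baseName, processedDate, extension);

        File.Move(sourceFile, Path.Combine(archiveFolder, archiveName));
    }
}

Called with BigFile.csv, the subject area's Archive folder, and 2016-01-02, it lands BigFile_2016-01-02.csv in the archive, matching the naming described above.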

SSIS file pattern

The following Biml expresses two different implementations of the same pattern. The first uses project and package level parameters in addition to SSIS variables. The project level parameter expresses the root node. The package level parameter defines the Subject Area. Maybe Subject Area gets promoted to a project parameter, based on the size and scope of your work.

Within each package, we'll then use expressions to build a FolderBase, which is RootNode + SubjectArea. We'll then use expressions to define FolderArchive, FolderInput, and FolderOutput. I name them in this manner because I want them to sort into the same area in my Variables window. If you really like to get clever, define a namespace for your variables beyond User, like "template" or "framework."

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Projects>
        <PackageProject Name="POC">
            <Parameters>
                <Parameter DataType="String" Name="FolderRoot">C:\ssisdata</Parameter>
            </Parameters>
            <Packages>
                <Package PackageName="FileProcessingParam"></Package>
            </Packages>
        </PackageProject>
    </Projects>
    <Packages>
        <Package Name="FileProcessingParam">
            <Parameters>
                <Parameter DataType="String" Name="SubjectArea">Sales</Parameter>
            </Parameters>
            <Variables>
                <Variable DataType="String" Name="FolderBase" EvaluateAsExpression="true">@[$Project::FolderRoot] + "\\" + @[$Package::SubjectArea]</Variable>
                <Variable DataType="String" Name="FolderArchive" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Archive"</Variable>
                <Variable DataType="String" Name="FolderInput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Input"</Variable>
                <Variable DataType="String" Name="FolderOutput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Output"</Variable>
            </Variables>
        </Package>
        <Package Name="FileProcessingClassic">
            <Variables>
                <Variable DataType="String" Name="FolderRoot">C:\ssisdata</Variable>
                <Variable DataType="String" Name="SubjectArea">Sales</Variable>
                <Variable DataType="String" Name="FolderBase" EvaluateAsExpression="true">@[User::FolderRoot] + "\\" + @[User::SubjectArea]</Variable>
                <Variable DataType="String" Name="FolderArchive" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Archive"</Variable>
                <Variable DataType="String" Name="FolderInput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Input"</Variable>
                <Variable DataType="String" Name="FolderOutput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Output"</Variable>
            </Variables>
        </Package>
    </Packages>
</Biml>
Using the above Biml generates two SSIS packages: FileProcessingParam, which uses parameters as part of the 2012+ Project Deployment Model.

FileProcessingClassic is an approach that will work across all versions of SSIS from 2005 to 2016 whether you use the Project Deployment Model or the Package Deployment Model.

Take away

Create a consistent environment for all of your file based processing. It'll reduce the number of decisions junior developers need to make. It will lower the complexity of recreating your environment and all of its permissions as projects migrate through environments. It'll ensure everyone knows where the data should be in the event of an emergency or when someone is on vacation. Finally, it simplifies your code because all I need to do to be successful is ensure the file lands in the right spot.

Monday, February 15, 2016

SSIS Conditional Processing by day

I'm working with a client that has different business rules based on the day data is processed. On Monday, they generate a test balance for their accounting process. On Wednesday, the data is hardened, and on Friday they compute the final balances. Physically, it was implemented like this. So, what's the problem? The problem is the precedence constraints. This is the constraint for the Friday branch:

DATEPART("DW", (DT_DBTIMESTAMP)GETDATE()) == 6

For those that don't read SSIS Expressions, we start inside the parentheses:
  1. Get the results of GETDATE()
  2. Cast that to a DT_DBTIMESTAMP
  3. Determine the day of the week, DW, for our expression
  4. Compare the results of all of that to 6
Do you see the problem? Really, there are two, but the one I'm focused on is the use of GETDATE to determine which branch of logic is executed. Today is Monday and I need to test the logic that runs on Friday. Yes, I can run these steps in isolation, and given that I'm not updating the logic that fiddles with the branches, my change shouldn't have an adverse effect, but by golly, that sucks from a testing perspective. It's also really hard to develop unit tests when your input data is the server date. What are you going to do, allocate 5 to 7 days for testing or change the server clock? I believe the answer is No and OH HELL NAH!

This isn't just an SSIS thing, either. I've seen the above logic in TSQL as well. If you pin your logic to getdate/current_timestamp calls, then your testing is going to be painful.

How do I fix this?

This is best left as an exercise to the reader based on their specific scenario, but in general, I'd favor having a step establish a reference date for processing. In SSIS, it could be as simple as a Variable that is set to a value when the package begins and that you can then override for testing purposes through a SET call to dtexec. Or you could populate it from a query against a table. Or make it a Package/Project parameter and have the caller specify the day of the week. For the TSQL domain, the same mechanics apply - initialize a variable and perform all your tests based on that authoritative date. Provide the ability to specify the date as a parameter to the procedure.
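As a sketch of the SSIS flavor of that idea (my illustration, not the client's implementation): a Script Task at the start of the control flow only assigns a hypothetical User::ReferenceDate variable when the caller hasn't already supplied one, and the precedence constraints then test DATEPART("DW", @[User::ReferenceDate]) instead of GETDATE().

    // Inside a Script Task's Main(), with User::ReferenceDate (DateTime) in ReadWriteVariables.
    // Assumption: the variable's design-time default is the sentinel date 1900-01-01.
    DateTime referenceDate = (DateTime)Dts.Variables["User::ReferenceDate"].Value;
    if (referenceDate.Year == 1900)
    {
        // Nobody overrode it (via dtexec, a SQL Agent job step, or a test harness),
        // so fall back to the server date.
        Dts.Variables["User::ReferenceDate"].Value = DateTime.Today;
    }

To test the Friday branch on a Monday, you just pass in a Friday date and never touch the server clock.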

Something, anything really, just take a moment and ask yourself - How am I going to test this?

But, what was the other problem?
Looks around... We are not alone. There's a whole other country called Not-The-United-States, or something like that - geography was never my strong suit - and damned if their first day of the week isn't Sunday. It doesn't even have to be a different country; someone might have set the server to use a different starting day for the week (assuming TSQL).
SET LANGUAGE ENGLISH;
DECLARE
    -- 2016 February the 15th
    @SourceDate date = '20160215'

SELECT 
    @@LANGUAGE AS CurrentLanguage
,   @@DATEFIRST AS CurrentDateFirst
,   DATEPART(dw, @SourceDate) AS MondayDW;

SET LANGUAGE FRENCH;

SELECT 
    @@LANGUAGE AS CurrentLanguageFrench
,   @@DATEFIRST AS CurrentDateFirstFrench
,   DATEPART(dw, @SourceDate) AS MondayDWFrench;
That's not going to affect me though, right? I mean sure, we're moving toward a georedundant trans-bipolar-echolocation-Azure-Amazon cloud computing inframastructure but surely our computers will always be set to deal with my home country's default local, right?