World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.


Wednesday, May 25, 2016

SSIS - What is the name of the file


It's common to not know the exact name of the file you're loading. You often have to apply some logic - a file mask, date calculation, etc. - to determine which file needs to be loaded. In SSIS, we often use a ForEach File Enumerator to accomplish this, but an Expression Task or even an Execute SQL Task can be used to retrieve/set a file name. Today, I'm going to show you two different mechanisms for identifying the current file name.

Did you know that in SSIS, the Flat File Source exposes an advanced property called FileNameColumnName? It shows up in the Properties window when you select the Flat File Source in your data flow.

There are two different click paths for setting the FileNameColumnName property. The first is to right-click on the Flat File Source and select the "Show Advanced Editor" option. There, navigate to the Component Properties tab and set the FileNameColumnName property.


The second is a combination of the Properties window and the Flat File Source itself. Select the Flat File Source and go to the Properties window. There you specify the FileNameColumnName property, but notice that the Flat File Source itself is put into a Warning state. To fix that, double-click the component and view the Columns tab. You'll see the name we specified in the Properties window is now set in the Columns tab, and the warning goes away.


Cool story, bro

That's cool and all, but it has two downsides. The first is due to me being a dumb American: the file name that gets added to the data flow is DT_WSTR (Unicode/nvarchar) with a length of 260. Defaulting to Unicode is awesome for internationalization, except the systems I work in never have nvarchar defined, so now I need a Data Conversion component to change the supplied name into a non-Unicode version. That's an irritant, but since I know what the pattern is, I can live with it.
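
If a separate Data Conversion component feels like clutter, a Derived Column with an explicit cast accomplishes the same thing. A one-line sketch, assuming you named the file name column CurrentFileNameSource and that code page 1252 matches your destination (both of those are my assumptions, not requirements):

    (DT_STR,260,1252)CurrentFileNameSource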

The real downside with this approach is that it only works for the Flat File Source. The Excel, Raw File, and XML sources do not expose a FileNameColumnName property. That is a problem in my book because when I'm automating, I'd have to have one set of source patterns for flat files and a different one for everything else.

A better approach

So, as much as I like the built-in solution, my pattern is to use a Derived Column to inject the file name into the Data Flow. I have a variable called CurrentFileName in all my packages. That contains the design-time path for my Flat File Connection Manager (or Excel). My Connection Manager will then have its ConnectionString/ExcelFilePath property assigned via an expression to @[User::CurrentFileName]. This positions me for success because all I need to do is ensure that whatever mechanism I am using to determine my source file correctly populates that variable. In this post, a ForEach File Enumerator will handle that.

Within my Data Flow Task, I will add a Derived Column Transformation that adds my package variable into the data flow as a new column. Here, I am specifying it will be of data type DT_STR with a length of 130.
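
The expression in that Derived Column is nothing more than a cast of the variable; a sketch, again assuming code page 1252:

    (DT_STR,130,1252)@[User::CurrentFileName]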

Biml

What would a post be without some Biml to illustrate the point?
I use the following as my source file.

Col1|Col2
1|2
2|3
3|4

This Biml will generate an SSIS package that has a Data Flow with a Flat File Source, a Derived Column and a Row Count. What you're interested in is seeing how we either specify the value for the FileNameColumnName in our FlatFileSource tag or enrich our data flow by adding it in our Derived Column component.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection FilePath="C:\ssisdata\SO\Input\CurrentFileName_0.txt" FileFormat="FFF_CurrentFileName" Name="FFCM" />
    </Connections>
    <FileFormats>
        <FlatFileFormat Name="FFF_CurrentFileName" IsUnicode="false" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true">
            <Columns>
                <Column Name="Col1" DataType="Int32" Delimiter="|" />
                <Column Name="Col2" DataType="Int32" Delimiter="CRLF" />
            </Columns>
        </FlatFileFormat>
    </FileFormats>
    <Packages>
        <Package Name="CurrentFileName">
            <Connections>
                <Connection ConnectionName="FFCM">
                    <Expressions>
                        <Expression ExternalProperty="ConnectionString">@[User::CurrentFileName]</Expression>
                    </Expressions>
                </Connection>
            </Connections>
            <Variables>
                <Variable DataType="String" Name="CurrentFileName">C:\ssisdata\SO\Input\CurrentFileName_0.txt</Variable>
                <Variable DataType="Int32" Name="RowCountSource">0</Variable>
            </Variables>
            <Tasks>
                <ForEachFileLoop 
                    Folder="C:\ssisdata\SO\Input" 
                    FileSpecification="CurrentFileName*.txt" 
                    Name="FELC Shred txt">
                    <VariableMappings>
                        <VariableMapping VariableName="User.CurrentFileName" Name="0" />
                    </VariableMappings>
                    <Tasks>
                        <Dataflow Name="DFT Import data">
                            <Transformations>
                                <FlatFileSource ConnectionName="FFCM" Name="FFCM Pull data" FileNameColumnName="CurrentFileNameSource" />
                                <DerivedColumns Name="DER Add CurrentFileName">
                                    <Columns>
                                        <Column DataType="AnsiString" Name="CurrentFileName" Length="130">@[User::CurrentFileName]</Column>
                                    </Columns>
                                </DerivedColumns>
                                <RowCount VariableName="User.RowCountSource" Name="CNT Source Data" />
                            </Transformations>
                        </Dataflow>
                    </Tasks>
                </ForEachFileLoop>
            </Tasks>
        </Package>
    </Packages>
</Biml>

If all goes well, when you run your package you should see something like the following

Friday, April 22, 2016

Biml SSIS Foreach Nodelist Container (aka Shred XML in SSIS)


Every time I look at my ForEach NodeList Enumerator post, I struggle to remember how to do half of it. Plus, as I discovered today, I am inconsistent between my images. The Control Flow shows an XML structure of

<Files>
    <File>Foo.txt</File>
    <File>Bar.txt</File>
    <File>Blee.txt</File>
</Files>
but the configuration inside the Foreach loop is set up to parse an XML tree in this format
<Files>
    <FileName>Foo.txt</FileName>
    <FileName>Bar.txt</FileName>
    <FileName>Blee.txt</FileName>
</Files>

The resulting package looks like this - a Foreach Enumerator that shreds the XML. We assign the shredded value into our variable CurrentNode. We pass that as an argument into a script task that does nothing but print the value as an Information event.

Configuration remains comparable to the previous post with the exception of using Variables.


Running the package generates Output like the following

Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Foo.txt
Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Bar.txt
Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Blee.txt

Biml

The following Biml will create a Foreach NodeList enumerator that shreds an XML document. We pass the current node variable into our Script Task, which echoes the value back.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Task_ForEachNodeListLoop">
            <Variables>
                <Variable DataType="String" Name="CurrentNode"></Variable>
                <Variable DataType="String" Name="SourceXML"><![CDATA[<Files><File>Foo.txt</File><File>Bar.txt</File><File>Blee.txt</File></Files>]]></Variable>
                <Variable DataType="String" Name="OuterXPath"><![CDATA[/Files/File]]></Variable>
                <Variable DataType="String" Name="InnerXPath"><![CDATA[.]]></Variable>
            </Variables>
            <Tasks>
                <ForEachNodeListLoop 
                    Name="FENLL Shred XML"
                    EnumerationType="ElementCollection"
                    InnerElementType="NodeText"
                    >
                    <VariableInput VariableName="User.SourceXML" />
                    <VariableOuterXPath VariableName="User.OuterXPath" />
                    <VariableInnerXPath VariableName="User.InnerXPath" />
                    <VariableMappings>
                        <VariableMapping VariableName="User.CurrentNode" Name="0" />
                    </VariableMappings>
                    <Tasks>
                        <Script ProjectCoreName="ST_EchoBack" Name="SCR Echo Back">
                            <ScriptTaskProjectReference ScriptTaskProjectName="ST_EchoBack" />
                        </Script>            
                    </Tasks>
                </ForEachNodeListLoop>
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_EchoBack" Name="ST_EchoBack" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <Variable Namespace="User" VariableName="CurrentNode" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_EchoBack
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            bool fireAgain = false;
            string message = "{0}::{1} : {2}";
            foreach (var item in Dts.Variables)
            {
                Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}                </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

Thursday, April 21, 2016

Biml Script Task - Test for Echo


To aid in debugging, it's helpful to have a "flight recorder" running to show you the state of variables. When I was first learning to program, the debugger I used was a lot of PRINT statements. "Verify your inputs before you assume anything" is a distillation of my experience debugging.

While some favor using MessageBox, I hate finding the popup window, closing it and then promptly forgetting what value was displayed. In SSIS, I favor raising events, FireInformation specifically, to emit the value of variables. This approach allows me to see the values in both the Progress/Execution Results tab as well as the Output window.

The syntax is very simple: whatever you want recorded, you pass in as the description argument. In this snippet, I'm going to build a string with three items in it - the variable's Namespace, Name and Value.

    bool fireAgain = false;
    string message = "{0}::{1} : {2}";
    var item = Dts.Variables["System::MachineName"];
    Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
When that code runs, it will pull the @[System::MachineName] variable out of the SSIS Variables collection and assign it to item. We'll then fire an information message off with the bits of data we care about.

It seems silly, doesn't it, to print the variable's namespace and name when we asked for it by name from the collection? Of course it is. A nicer, more generic snippet would be

    bool fireAgain = false;
    string message = "{0}::{1} : {2}";
    foreach (var item in Dts.Variables)
    {
        Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
    }
There, that's much better - however many Variables we pass into the ReadOnly or ReadWrite collection will all get echoed back to the log and Output window.

Biml

The following is almost turnkey; you simply need to specify all the variables you want emitted in the log where it says "List all the variables you are interested in tracking". Follow the pattern and all the Variables you list will get tracked.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="so_36759813">
            <Variables>
                <Variable DataType="String" Name="echoBack">mixed</Variable>
            </Variables>
            <Tasks>
                <Script ProjectCoreName="ST_EchoBack" Name="SCR Echo Back">
                    <ScriptTaskProjectReference ScriptTaskProjectName="ST_EchoBack" />
                </Script>            
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_EchoBack" Name="ST_EchoBack" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <!-- List all the variables you are interested in tracking -->
                <Variable Namespace="System" VariableName="MachineName" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_EchoBack
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            bool fireAgain = false;
            string message = "{0}::{1} : {2}";
            foreach (var item in Dts.Variables)
            {
                Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}                </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

We could make this better by using some BimlScript to inspect the package and add all the variables in scope to our ReadOnly list, but that's deeper than we're going to go in this post.

Now, if you'll excuse me, it's time to spin Test For Echo.

Monday, April 4, 2016

ETL file processing pattern


In this post, I'd like to talk about something I always feel silly mentioning because it seems so elementary, yet plenty of people stumble on it. The lesson is simple: be consistent. For file processing, the consistency I crave is knowing where my files are going to be for processing. This post will have examples in SSIS, but whether you're using a custom Python script, Informatica, or Pentaho Kettle, the concept remains true: have your files in a consistent location.

You can tell when I've been somewhere because I really like the following file structure for ETL processing.

root node

All ETL processing will use a common root node/directory. I call it SSISData to make it fairly obvious what is in there but call it as you will. On my dev machine, this is usually sitting right off C:. On servers though, oh hell no! The C drive is reserved for the OS. Instead, I work with the DBAs to determine where this will actually be located. Heck, they could make it a UNC path and my processing won't care because I ensure that root location is an externally configurable "thing." Whatever you're using for ETL, I'm certain it will support the concept of configuration. Make sure the base node is configurable.

A nice thing about anchoring all your file processing to a head location is that if you are at an organization that is judicious about handing out file permissions, you don't have to create a permission request for each source of data. Get your security team to sign off on the ETL process having full control over a directory structure starting here. The number of SSIS-related permissions issues I answer on StackOverflow is silly.

Disaster recovery! If you need to stand up a new processing site to simulate your current production environment, it'd be really convenient to only have to pull in one directory tree and say "we're ready."

I find it useful to make the root folder a network share as well so that whatever team is responsible for supporting the ETL process can immediately open files without having to RDP into a box just to check data. Make the share read-only because SOX, SAS70, etc.

subject area

Immediately under my root node, I have a subject area. If you're just beginning your ETL, this is a step people usually skip. "We'll commingle all the files in one folder, it's easier." Or they don't even think about it because we gotta get this done.

Please, create subject areas to segment your files. Let's look at some reasons why you are likely going to want some isolation. "Data.csv" - great, is that the Sales data or the Employee data? You may not know until you open it up because you're just receiving data with no control over what the original file is called. If you had work areas for your files, you'd be able to direct the data to be delivered to the correct location with no chance of one file clobbering another.

And while we're talking about processes you can't control, let's talk about how Marketing is just going to copy the file into the folder whenever it's ready. Creating a folder structure by subject area will allow you to better control folder permissions. Remember how I said to open a share? If the process is that Marketing copies the file, give them write-only access to the folder they need to deliver to. If everyone copies files to a common folder, how many curious eyes will want to open Salaries_2016.xlsx? Folders make a fine mechanism for separation of sensitive data.

Input

The first folder under a given subject area is called Input. If my process needs to find data, it need only look in the Input folder. That's really about it: source data goes here.

Output

This is usually the least used of my folders, but I ensure every subject area has an Output folder. This way, I always know where I'm going to write output files. Output files might be immediately swept off to an FTP server, or some other process consumes them, but this folder is where I put the data and where I can control access for external consumers of my data. I've been in places where developers made the great agreement of "you supply the data as CSV and we'll generate an Excel file when we're done" - except the people picking up the files weren't terribly computer savvy and didn't have their file extensions turned on... Yeah, so have an Output folder and dump your data there.

Archive

This is my kitchen utility drawer. I throw everything in here when I'm done with it. For inbound files, the pattern looks something like
foreach source file
  1. process data
  2. move file to archive folder (possibly with date processed on the name)

Outbound processing is identical. The core process completes and generates files. A new process fires off and delivers those files.
foreach outbound file
  1. deliver file
  2. move file to archive folder (possibly with date processed on the name)

Two things to call out with this approach. The first is if you rename the files to have a date stamp on them. That's great for having a trail of when you actually processed the data. For example, we'd get BigFile.csv in on New Year's Day, but due to year-end processes running long, we didn't actually load the file until January 2nd. Thus, when it gets archived, we might tack on a processed date like BigFile_2016-01-02.csv. On January 5th, bad things happen and we have to restore the database to January 1st. ETL reprocessing is no problem - you just copy those files back into the Input folder and... oh, we expect the file to be named BigFile.csv exactly. Now you have to manipulate the file name before you can reprocess the data. That's a pain. Or, if the process accepts a file mask, you'll end up with BigFile_2016-01-02_2016-01-05.csv in the Archive folder because now we have processed the file twice.

The second thing is to use your library methods for renaming files. Don't assume everything from the first period to the end of the file name is the file extension. Don't assume the file extension is 3 characters.

Do not archive your files with any date stamp that isn't in the form of yyyy-mm-dd. Month names sort horribly. When I'm looking for files, I'm looking by year, then month, then day. Do use a delimiter between year, month and day. I know, yyyymmdd is nicer in TSQL and such but for whatever reason, I find it harder to mentally parse in a file name.
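
To make that concrete, here's a minimal C# sketch of the archive-rename step. The Archiver class and single-file scope are my own illustration, not a prescribed implementation; the point is that Path.GetFileNameWithoutExtension and Path.GetExtension split the name correctly no matter how many periods it contains or how long the extension is.

using System;
using System.IO;

public static class Archiver
{
    // Move a processed file into the archive folder, tacking the
    // processed date onto the name in yyyy-MM-dd form.
    public static string Archive(string sourceFile, string archiveFolder, DateTime processedDate)
    {
        // Library methods, not "split on the first period"
        string name = Path.GetFileNameWithoutExtension(sourceFile);
        string extension = Path.GetExtension(sourceFile); // includes the dot, e.g. ".csv"

        string stamped = string.Format("{0}_{1:yyyy-MM-dd}{2}", name, processedDate, extension);
        string target = Path.Combine(archiveFolder, stamped);

        // Assumes the archive folder exists; File.Move throws if the target already exists
        File.Move(sourceFile, target);
        return target;
    }
}

With that, BigFile.csv processed on 2016-01-02 lands in the archive as BigFile_2016-01-02.csv and sorts correctly by year, then month, then day.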

SSIS file pattern

The following Biml expresses two different implementations of the same pattern. The first uses project and package level parameters in addition to SSIS variables. The project level parameter expresses the root node; the package level parameter defines the subject area. Maybe SubjectArea gets promoted to a project parameter, based on the size and scope of your work.

Within each package, we'll then use expressions to build a FolderBase, which is RootNode + SubjectArea. We'll then use expressions to define FolderArchive, FolderInput, and FolderOutput. I name them in this manner because I want them to sort into the same area in my Variables window. If you really like to get clever, define a namespace for your variables beyond User, like "template" or "framework."

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Projects>
        <PackageProject Name="POC">
            <Parameters>
                <Parameter DataType="String" Name="FolderRoot">C:\ssisdata</Parameter>
            </Parameters>
            <Packages>
                <Package PackageName="FileProcessingParam"></Package>
            </Packages>
        </PackageProject>
    </Projects>
    <Packages>
        <Package Name="FileProcessingParam">
            <Parameters>
                <Parameter DataType="String" Name="SubjectArea">Sales</Parameter>
            </Parameters>
            <Variables>
                <Variable DataType="String" Name="FolderBase" EvaluateAsExpression="true">@[$Project::FolderRoot] + "\\" + @[$Package::SubjectArea]</Variable>
                <Variable DataType="String" Name="FolderArchive" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Archive"</Variable>
                <Variable DataType="String" Name="FolderInput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Input"</Variable>
                <Variable DataType="String" Name="FolderOutput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Output"</Variable>
            </Variables>
        </Package>
        <Package Name="FileProcessingClassic">
            <Variables>
                <Variable DataType="String" Name="FolderRoot">C:\ssisdata</Variable>
                <Variable DataType="String" Name="SubjectArea">Sales</Variable>
                <Variable DataType="String" Name="FolderBase" EvaluateAsExpression="true">@[User::FolderRoot] + "\\" + @[User::SubjectArea]</Variable>
                <Variable DataType="String" Name="FolderArchive" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Archive"</Variable>
                <Variable DataType="String" Name="FolderInput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Input"</Variable>
                <Variable DataType="String" Name="FolderOutput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Output"</Variable>
            </Variables>
        </Package>
    </Packages>
</Biml>
Using the above Biml generates two SSIS packages: FileProcessingParam, which uses parameters as part of the 2012+ Project Deployment Model.

FileProcessingClassic is an approach that will work across all versions of SSIS from 2005 to 2016 whether you use the Project Deployment Model or the Package Deployment Model.

Take away

Create a consistent environment for all of your file based processing. It'll reduce the number of decisions junior developers need to make. It will lower the complexity of recreating your environment and all of its permissions as projects migrate through environments. It'll ensure everyone knows where the data should be in the event of an emergency or someone is on vacation. Finally, it simplifies your code because all I need to do to be successful is ensure the file lands in the right spot.

Monday, February 15, 2016

SSIS Conditional Processing by day

I'm working with a client where they have different business rules based on the day the data is processed. On Monday, they generate a test balance for their accounting process. On Wednesday, the data is hardened, and on Friday they compute the final balances. Physically, it was implemented like this. So, what's the problem? The problem is the precedence constraints. This is the constraint for the Friday branch: DATEPART("DW", (DT_DBTIMESTAMP)GETDATE()) == 6. For those that don't read SSIS expressions, we start inside the parentheses:
  1. Get the results of GETDATE()
  2. Cast that to a DT_DBTIMESTAMP
  3. Determine the day of the week, DW, for our expression
  4. Compare the results of all of that to 6
Do you see the problem? Really, there are two, but the one I'm focused on is the use of GETDATE to determine which branch of logic is executed. Today is Monday and I need to test the logic that runs on Friday. Yes, I can run these steps in isolation, and given that I'm not updating the logic that fiddles with the branches, my change shouldn't have an adverse effect, but by golly, that sucks from a testing perspective. It's also really hard to develop unit tests when your input data is the server date. What are you going to do, allocate 5 to 7 days for testing, or change the server clock? I believe the answers are No and OH HELL NAH!

This isn't just an SSIS thing, either. I've seen the above logic in TSQL as well. If you pin your logic to getdate/current_timestamp calls, then your testing is going to be painful.

How do I fix this?

This is best left as an exercise for the reader based on their specific scenario, but in general, I'd favor having a step establish a reference date for processing. In SSIS, it could be as simple as a Variable that is pegged to a value when the package begins, which you could then override for testing purposes through a /SET call to dtexec. Or you could populate it from a query to a table. Or use a Package/Project Parameter and have the caller specify the day of the week. For the TSQL domain, the same mechanics apply - initialize a variable and perform all your tests against that authoritative date, and provide the ability to specify the date as a parameter to the procedure.
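
As a sketch of the T-SQL flavor - dbo.ComputeBalances and its Friday logic are made-up names for illustration - the whole pattern is a defaulted parameter:

CREATE PROCEDURE dbo.ComputeBalances
(
    -- Tests pass whatever date they need; production omits it and gets today
    @ReferenceDate date = NULL
)
AS
BEGIN
    SET NOCOUNT ON;

    -- Establish the authoritative date once, up front
    SET @ReferenceDate = COALESCE(@ReferenceDate, CAST(GETDATE() AS date));

    IF DATEPART(WEEKDAY, @ReferenceDate) = 6
    BEGIN
        -- Friday logic goes here (note this comparison still has the
        -- day-of-week problem described below)
        PRINT 'Computing final balances';
    END
END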

Something, anything really, just take a moment and ask yourself - How am I going to test this?

But, what was the other problem?
Looks around... We are not alone. There's a whole other country called Not-The-United-States, or something like that - geography was never my strong suit - and damned if their week doesn't start on Monday instead of Sunday. It doesn't even have to be a different country; someone might have set the server to use a different first day of the week (assuming TSQL).
SET LANGUAGE ENGLISH;
DECLARE
    -- 2016 February the 15th
    @SourceDate date = '20160215'

SELECT 
    @@LANGUAGE AS CurrentLanguage
,   @@DATEFIRST AS CurrentDateFirst
,   DATEPART(dw, @SourceDate) AS MondayDW;

SET LANGUAGE FRENCH;

SELECT 
    @@LANGUAGE AS CurrentLanguageFrench
,   @@DATEFIRST AS CurrentDateFirstFrench
,   DATEPART(dw, @SourceDate) AS MondayDWFrench;
That's not going to affect me though, right? I mean sure, we're moving toward a georedundant trans-bipolar-echolocation-Azure-Amazon cloud computing infrastructure, but surely our computers will always be set to deal with my home country's default locale, right?
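
For what it's worth, a standard trick (my addition here, not part of the client's implementation) makes the day-of-week math immune to the server's setting by folding @@DATEFIRST back into the calculation:

SET LANGUAGE FRENCH;
DECLARE
    @SourceDate date = '20160215';  -- a Monday

-- (DATEPART + @@DATEFIRST) % 7 yields the same number no matter what
-- DATEFIRST is: Sunday = 1, Monday = 2, ... Friday = 6, Saturday = 0
SELECT (DATEPART(dw, @SourceDate) + @@DATEFIRST) % 7 AS StableDW;

Rerun it under SET LANGUAGE ENGLISH and StableDW stays 2.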

Monday, October 26, 2015

Biml - Script Component Source


What is the Biml to create an SSIS Script Component source? This is a very simplistic demo, but you'll see the magic is distilled to two sections. The first is where we define the output buffer - the OutputBuffers element in the Biml below. In this case, I specify it is DemoOutput and then provide a Columns collection with a single column, SourceColumn.

The second bit of magic is in CreateNewOutputRows, in main.cs. There I use the buffer I defined above to inject a single row into it with a value of "Demo". Nothing fancy, everything is static from a Biml perspective, but I needed to know the syntax before I could try something a little more advanced.

Biml Demo Script Component Source

Using this is a simple matter of adding a new Biml file into an existing SSIS project and pasting the following code. What results from right-clicking on the file and selecting Generate New SSIS package will be a single SSIS package, BasicScriptComponentSource, with a Data Flow Task, "DFT Demo Source Component".

The data flow "DFT Demo Source Component" consists of our new Script Component, SCR Demo Source, and a Derived Column, DER Placeholder, so you can attach a data viewer if need be.

Use the following Biml to generate your package and feel free to tell me in the comments how you adapted it to solve a "real" problem.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <ScriptProjects>
        <ScriptComponentProject Name="SC_Demo">
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.DTSPipelineWrap" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.DTSRuntimeWrap" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.PipelineHost" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.TxScript" />
                <AssemblyReference AssemblyPath="System.dll" />
                <AssemblyReference AssemblyPath="System.AddIn.dll" />
                <AssemblyReference AssemblyPath="System.Data.dll" />
                <AssemblyReference AssemblyPath="System.Xml.dll" />
            </AssemblyReferences>
            <OutputBuffers>
                <!--    
                Define what your buffer is called and what it looks like
                Must set IsSynchronous as false. Otherwise it is a transformation
                (one row enters, one row leaves) and not a source.
                -->
                <OutputBuffer Name="DemoOutput" IsSynchronous="false">
                    <Columns>
                        <Column Name="SourceColumn" DataType="String" Length="50" /> 
                    </Columns>                    
                </OutputBuffer>                                 
            </OutputBuffers>
            <Files>
                <File Path="Properties\AssemblyInfo.cs">
using System.Reflection;
using System.Runtime.CompilerServices;
[assembly: AssemblyTitle("SC_Demo")]
[assembly: AssemblyDescription("Demonstrate Script Component as source")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("billinkc")]
[assembly: AssemblyProduct("SC_Demo")]
[assembly: AssemblyCopyright("Copyright @ 2015")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
[assembly: AssemblyVersion("1.0.*")]
</File>
<File Path="main.cs">
<![CDATA[
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

    /// &lt;summary&gt;
    /// Demonstrate how to generate a Script Component Source in SSIS
    /// &lt;/summary&gt;
    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
    public class ScriptMain : UserComponent
    {

        public override void CreateNewOutputRows()
        {
            DemoOutputBuffer.AddRow();
            DemoOutputBuffer.SourceColumn = "Demo";
        }
    }
]]>
                </File>
            </Files>
        </ScriptComponentProject>
    </ScriptProjects>

    <Packages>
        <Package Name="BasicScriptComponentSource" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="DFT Demo Source Component">
                    <Transformations>
                        <ScriptComponentSource Name="SCR Demo Source">
                            <ScriptComponentProjectReference
                                ScriptComponentProjectName="SC_Demo">
                                
                            </ScriptComponentProjectReference>
                        </ScriptComponentSource>
                        <DerivedColumns Name="DER Placeholder" />
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>    
</Biml>

Tuesday, September 22, 2015

SSISDB Delete all packages and environments

SSISDB tear down script

For my Summit 2015 presentation, 2014 SSIS Project Deployment Model: Deployment and Maintenance, I needed to revise my SSISDB tear down script. When I first built it, it removed all the projects and then removed all the folders. That was great, but as I've noted elsewhere, a folder can also contain Environments, and those too need to be accounted for. Otherwise, the catalog.delete_folder operation will fail because the folder is not empty.

Running the following code will remove everything in your SSISDB. This is the nuclear option so be certain you really want to clean house. You can uncomment the WHERE clause and selectively remove folders for a tactical nuclear option.

How it works is simple: I query catalog.folders to get a list of folders, then look in catalog.projects to find all the projects contained within each folder and delete those. I then repeat the process but look in catalog.environments to identify and remove all the SSIS environments. With the folder emptied, it can finally be dropped.

USE [SSISDB]
GO

DECLARE
    @folder_name nvarchar(128)
,   @project_name nvarchar(128)
,   @environment_name nvarchar(128);

DECLARE Csr CURSOR
READ_ONLY FOR 
SELECT
    CF.name AS folder_name
FROM
    catalog.folders AS CF
--WHERE
--    CF.name IN ('');
;

OPEN Csr;
FETCH NEXT FROM Csr INTO
    @folder_name;
WHILE (@@fetch_status <> -1)
BEGIN
    IF (@@fetch_status <> -2)
    BEGIN

        -------------------------------------------------------------
        -- Drop any projects
        -------------------------------------------------------------
        DECLARE FCsr CURSOR
        READ_ONLY FOR 
        SELECT
            CP.name AS project_name 
        FROM
            catalog.projects AS CP
            INNER JOIN
                catalog.folders AS CF
                ON CF.folder_id = CP.folder_id
        WHERE
            CF.name = @folder_name;

        OPEN FCsr;
        FETCH NEXT FROM FCsr INTO
            @project_name;
        WHILE(@@FETCH_STATUS = 0)
        BEGIN
            EXECUTE catalog.delete_project
                @folder_name
            ,   @project_name;

            FETCH NEXT FROM FCsr INTO
                @project_name;
        END
        CLOSE FCsr;
        DEALLOCATE FCsr;

        -------------------------------------------------------------
        -- Drop any environments
        -------------------------------------------------------------
        DECLARE ECsr CURSOR
        READ_ONLY FOR 
        SELECT
            E.name AS environment_name 
        FROM
            catalog.environments AS E 
            INNER JOIN
                catalog.folders AS CF
                ON CF.folder_id = E.folder_id
        WHERE
            CF.name = @folder_name;

        OPEN ECsr;
        FETCH NEXT FROM ECsr INTO
            @environment_name;
        WHILE(@@FETCH_STATUS = 0)
        BEGIN
            EXECUTE catalog.delete_environment             
                @folder_name
            ,   @environment_name;

            FETCH NEXT FROM ECsr INTO
                @environment_name;
        END
        CLOSE ECsr;
        DEALLOCATE ECsr;

        -------------------------------------------------------------
        -- Finally, remove the folder
        -------------------------------------------------------------
        EXECUTE [catalog].[delete_folder]
            @folder_name;

    END
    FETCH NEXT FROM Csr INTO
        @folder_name;

END

CLOSE Csr;
DEALLOCATE Csr;

Caveat

The one thing I haven't investigated yet is cross-folder dependencies. Imagine folders Configurations and Projects. Configurations has an Environment called Settings. Projects has a project called AWShoppingCart, which has a reference to the environment Settings. I expect I will be able to delete the Configurations folder and the environment just fine, and it will just leave the project AWShoppingCart broken until I reconfigure it. But the environment delete operation could just as easily fail if the reference count is non-zero.
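
If you want to check before pushing the button, the catalog.environment_references view shows which projects reference which environments. A quick sketch (untested against the cross-folder scenario above):

SELECT
    CF.name AS folder_name
,   CP.name AS project_name
,   ER.reference_type   -- R = relative (same folder), A = absolute
,   ER.environment_folder_name
,   ER.environment_name
FROM
    catalog.environment_references AS ER
    INNER JOIN
        catalog.projects AS CP
        ON CP.project_id = ER.project_id
    INNER JOIN
        catalog.folders AS CF
        ON CF.folder_id = CP.folder_id;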