World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.


Friday, April 22, 2016

Biml SSIS Foreach Nodelist Container (aka Shred XML in SSIS)

Every time I look at my ForEach NodeList Enumerator post, I struggle to remember how to do half of it. Plus, as I discovered today, I am inconsistent between my images. The Control Flow shows an XML structure of

<Files>
    <File>Foo.txt</File>
    <File>Bar.txt</File>
    <File>Blee.txt</File>
</Files>
but the text inside the Foreach loop would be for parsing an XML tree in this format
<Files>
    <FileName>Foo.txt</FileName>
    <FileName>Bar.txt</FileName>
    <FileName>Blee.txt</FileName>
</Files>

The resulting package looks like this - a Foreach Enumerator that shreds the XML. We assign the shredded value into our variable CurrentNode. We pass that as an argument into a script task that does nothing but print the value as an Information event.

Configuration remains comparable to the previous post with the exception of using Variables.


Running the package generates Output like the following

Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Foo.txt
Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Bar.txt
Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Blee.txt

Biml

The following Biml will create a Foreach NodeList enumerator that shreds an XML document. We pass the current node variable into our Script Task, which echoes the value back.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Task_ForEachNodeListLoop">
            <Variables>
                <Variable DataType="String" Name="CurrentNode"></Variable>
                <Variable DataType="String" Name="SourceXML"><![CDATA[<Files><File>Foo.txt</File><File>Bar.txt</File><File>Blee.txt</File></Files>]]></Variable>
                <Variable DataType="String" Name="OuterXPath"><![CDATA[/Files/File]]></Variable>
                <Variable DataType="String" Name="InnerXPath"><![CDATA[.]]></Variable>
            </Variables>
            <Tasks>
                <ForEachNodeListLoop 
                    Name="FENLL Shred XML"
                    EnumerationType="ElementCollection"
                    InnerElementType="NodeText"
                    >
                    <VariableInput VariableName="User.SourceXML" />
                    <VariableOuterXPath VariableName="User.OuterXPath" />
                    <VariableInnerXPath VariableName="User.InnerXPath" />
                    <VariableMappings>
                        <VariableMapping VariableName="User.CurrentNode" Name="0" />
                    </VariableMappings>
                    <Tasks>
                        <Script ProjectCoreName="ST_EchoBack" Name="SCR Echo Back">
                            <ScriptTaskProjectReference ScriptTaskProjectName="ST_EchoBack" />
                        </Script>            
                    </Tasks>
                </ForEachNodeListLoop>
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_EchoBack" Name="ST_EchoBack" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <Variable Namespace="User" VariableName="CurrentNode" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_EchoBack
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            bool fireAgain = false;
            string message = "{0}::{1} : {2}";
            foreach (var item in Dts.Variables)
            {
                Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}                </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

Thursday, April 21, 2016

Biml Script Task - Test for Echo

Biml Script Task

To aid in debugging, it's helpful to have a "flight recorder" running to show you the state of variables. When I was first learning to program, the debugger I used was a lot of PRINT statements. "Verify your inputs before you assume anything" is a distillation of my experience debugging.

While some favor using MessageBox, I hate finding the popup window, closing it and then promptly forgetting what value was displayed. In SSIS, I favor raising events, FireInformation specifically, to emit the value of variables. This approach allows me to see the values in both the Progress/Execution Results tab as well as the Output window.

The syntax is very simple: whatever you want recorded, you pass in as the description argument. In this snippet, I'm going to build a string with three items in it - the variable's Namespace, Name and Value.

    bool fireAgain = false;
    string message = "{0}::{1} : {2}";
    var item = Dts.Variables["System::MachineName"];
    Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
When that code runs, it will pull the @[System::MachineName] variable out of the SSIS Variables collection and assign it to item. We'll then fire an information message off with the bits of data we care about.

It seems silly, doesn't it, to print the variable's namespace and name when we asked for it by name from the collection. Of course it is; a nicer, more generic snippet would be

    bool fireAgain = false;
    string message = "{0}::{1} : {2}";
    foreach (var item in Dts.Variables)
    {
        Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
    }
There, that's much better - however many Variables we pass in to the ReadOnly or ReadWrite collection will all get echoed back to the log and output window.

Biml

The following is almost turnkey: you simply need to specify all the variables you want emitted in the log where it says "List all the variables you are interested in tracking". Follow the pattern and all the Variables you list will get tracked.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="so_36759813">
            <Variables>
                <Variable DataType="String" Name="echoBack">mixed</Variable>
            </Variables>
            <Tasks>
                <Script ProjectCoreName="ST_EchoBack" Name="SCR Echo Back">
                    <ScriptTaskProjectReference ScriptTaskProjectName="ST_EchoBack" />
                </Script>            
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_EchoBack" Name="ST_EchoBack" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <!-- List all the variables you are interested in tracking -->
                <Variable Namespace="System" VariableName="MachineName" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_EchoBack
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            bool fireAgain = false;
            string message = "{0}::{1} : {2}";
            foreach (var item in Dts.Variables)
            {
                Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}                </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

We could make this better by using some BimlScript to inspect the package and add all the variables in scope in our ReadWrite list but that's deeper than we're going to go in this post.

Now, if you'll excuse me, it's time to spin Test For Echo.

Monday, April 4, 2016

ETL file processing pattern

In this post, I'd like to talk about something I always feel silly mentioning because it seems so elementary, yet plenty of people stumble on it. The lesson is simple: be consistent. For file processing, the consistency I crave is knowing where my files are going to be for processing. This post will have examples in SSIS but whether you're using a custom Python script, Informatica, or Pentaho Kettle, the concept remains true: have your files in a consistent location.

You can tell when I've been somewhere because I really like the following file structure for ETL processing.

root node

All ETL processing will use a common root node/directory. I call it SSISData to make it fairly obvious what is in there but call it as you will. On my dev machine, this is usually sitting right off C:. On servers though, oh hell no! The C drive is reserved for the OS. Instead, I work with the DBAs to determine where this will actually be located. Heck, they could make it a UNC path and my processing won't care because I ensure that root location is an externally configurable "thing." Whatever you're using for ETL, I'm certain it will support the concept of configuration. Make sure the base node is configurable.

A nice thing about anchoring all your file processing to a head location is that if you are at an organization that is judicious with handing out file permissions, you don't have to create a permission request for each source of data. Get your security team to sign off on the ETL process having full control to a directory structure starting here. The number of SSIS related permissions issues I answer on StackOverflow is silly.

Disaster recovery! If you need to stand up a new processing site to simulate your current production environment, it'd be really convenient to only have to pull in one directory tree and say "we're ready."

I find it useful to make the root folder a network share as well so that whatever team is responsible for supporting the ETL process can immediately open files without having to RDP into a box just to check data. Make the share read-only because SOX, SAS70, etc.

subject area

Immediately under my root node, I have a subject area. If you're just beginning your ETL, this is a place people usually skip. "We'll commingle all the files in one folder, it's easier." Or, they don't even think about it because we gotta get this done.

Please, create subject areas to segment your files. Let's look at some reasons why you are likely going to want some isolation. "Data.csv" - great, is that the Sales data or the Employee data? You may not know until you open it up because you're just receiving data with no control over what the original file is called. If you had work areas for your files, you'd be able to direct the data to be delivered to the correct location with no chance of one file clobbering another.

And while we're talking about processes you can't control, let's talk about how Marketing is just going to copy the file into the folder whenever it's ready. Creating a folder structure by subject area will allow you to better control folder permissions. Remember how I said to open a share? If the process is that Marketing copies the file, give them write-only access to the folder they need to deliver to. If everyone copies files to a common folder, how many curious eyes will want to open Salaries_2016.xlsx? Folders make a fine mechanism for separation of sensitive data.

Input

The first folder under a given subject area is called Input. If my process needs to find data, it need only look in the input folder. That's really about it, source data goes here.

Output

This is usually the least used of my folders but I ensure every subject area has an "Output" folder. This way, I always know where I'm going to write output files to. Output files might be immediately swept off to an FTP server or some other process consumes them, but this folder is where I put the data and where I can control access for external consumers of my data. I've been in places where developers made the great agreement of "you supply the data as CSV and we'll generate an Excel file when we're done." Except the people picking up the files weren't terribly computer savvy and didn't have their file extensions turned on... Yeah, so have an output folder and dump your data there.

Archive

This is my kitchen utility drawer. I throw everything in here when I'm done with it. For inbound files, the pattern looks something like
foreach source file
  1. process data
  2. move file to archive folder (possibly with date processed on the name)

Outbound processing is identical. The core process completes and generates files. A new process fires off and delivers those files.
foreach outbound file

  1. deliver file
  2. move file to archive folder (possibly with date processed on the name)

Two things to call out with this approach. The first is if you rename the files to have a date stamp on them. That's great for having a trail of when you actually processed the data. For example, we'd get BigFile.csv in on New Year's Day but due to year-end processes running long, we didn't actually load the file until January 2nd. Thus, when it gets archived, we might tack on a processed date like BigFile_2016-01-02.csv. On January 5th, bad things happen and we have to restore the database to January 1st. ETL processing is no problem, you just copy those files back into the Input folder and - oh, we expect the file to be named BigFile.csv exactly. Now you have to manipulate the file name before you can reprocess data. That's a pain. Or, if the process accepts a file mask, you'll end up with BigFile_2016-01-02_2016-01-05.csv in the Archive folder because now we have processed the file twice.

The second thing is to use your library methods for renaming files. Don't assume everything from the first period to the end of the file name is the file extension. Don't assume the file extension is 3 characters.

Do not archive your files with any date stamp that isn't in the form of yyyy-mm-dd. Month names sort horribly. When I'm looking for files, I'm looking by year, then month, then day. Do use a delimiter between year, month and day. I know, yyyymmdd is nicer in TSQL and such but for whatever reason, I find it harder to mentally parse in a file name.

SSIS file pattern

The following Biml expresses two different implementations of the same pattern. The first uses project and package level parameters in addition to SSIS variables. The Project level parameter expresses the root node. The package level parameter defines the Subject Area. Maybe Subject Area is promoted to a project parameter, based on the size and scope of your work.

Within each package, we'll then use expressions to build a FolderBase, which is FolderRoot + SubjectArea. We'll then use expressions to define FolderArchive, FolderInput, and FolderOutput. I name them in this manner because I want them to sort into the same area in my Variables window. If you really like to get clever, define a namespace for your variables beyond User, like "template" or "framework."

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Projects>
        <PackageProject Name="POC">
            <Parameters>
                <Parameter DataType="String" Name="FolderRoot">C:\ssisdata</Parameter>
            </Parameters>
            <Packages>
                <Package PackageName="FileProcessingParam"></Package>
            </Packages>
        </PackageProject>
    </Projects>
    <Packages>
        <Package Name="FileProcessingParam">
            <Parameters>
                <Parameter DataType="String" Name="SubjectArea">Sales</Parameter>
            </Parameters>
            <Variables>
                <Variable DataType="String" Name="FolderBase" EvaluateAsExpression="true">@[$Project::FolderRoot] + "\\" + @[$Package::SubjectArea]</Variable>
                <Variable DataType="String" Name="FolderArchive" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Archive"</Variable>
                <Variable DataType="String" Name="FolderInput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Input"</Variable>
                <Variable DataType="String" Name="FolderOutput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Output"</Variable>
            </Variables>
        </Package>
        <Package Name="FileProcessingClassic">
            <Variables>
                <Variable DataType="String" Name="FolderRoot">C:\ssisdata</Variable>
                <Variable DataType="String" Name="SubjectArea">Sales</Variable>
                <Variable DataType="String" Name="FolderBase" EvaluateAsExpression="true">@[User::FolderRoot] + "\\" + @[User::SubjectArea]</Variable>
                <Variable DataType="String" Name="FolderArchive" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Archive"</Variable>
                <Variable DataType="String" Name="FolderInput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Input"</Variable>
                <Variable DataType="String" Name="FolderOutput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Output"</Variable>
            </Variables>
        </Package>
    </Packages>
</Biml>
Using the above Biml generates two SSIS packages: FileProcessingParam which uses parameters as part of the 2012+ Project Deployment Model.

FileProcessingClassic is an approach that will work across all versions of SSIS from 2005 to 2016 whether you use the Project Deployment Model or the Package Deployment Model.

Take away

Create a consistent environment for all of your file based processing. It'll reduce the number of decisions junior developers need to make. It will lower the complexity of recreating your environment and all of its permissions as projects migrate through environments. It'll ensure everyone knows where the data should be in the event of an emergency or someone is on vacation. Finally, it simplifies your code because all I need to do to be successful is ensure the file lands in the right spot.

Monday, February 15, 2016

SSIS Conditional Processing by day

I'm working with a client where they have different business rules based on the day data is processed. On Monday, they generate a test balance for their accounting process. On Wednesday, the data is hardened, and on Friday they compute the final balances. Physically, it was implemented like this. So, what's the problem? The problem is the precedence constraints. This is the constraint for the Friday branch: DATEPART("DW", (DT_DBTIMESTAMP)GETDATE()) == 6. For those that don't read SSIS Expressions, we start inside the parentheses:
  1. Get the results of GETDATE()
  2. Cast that to a DT_DBTIMESTAMP
  3. Determine the day of the week, DW, for our expression
  4. Compare the results of all of that to 6
Do you see the problem? Really, there are two, but the one I'm focused on is the use of GETDATE to determine which branch of logic is executed. Today is Monday and I need to test the logic that runs on Friday. Yes, I can run these steps in isolation and, given that I'm not updating the logic that fiddles with the branches, my change shouldn't have an adverse effect, but by golly, that sucks from a testing perspective. It's also really hard to develop unit tests when your input data is the server date. What are you going to do, allocate 5 to 7 days for testing or change the server clock? I believe the answers are No and OH HELL NAH!

This isn't just an SSIS thing, either. I've seen the above logic in TSQL as well. If you pin your logic to getdate/current_timestamp calls, then your testing is going to be painful.

How do I fix this?

This is best left as an exercise to the reader based on their specific scenario but in general, I'd favor having a step establish a reference date for processing. In SSIS, it could be as simple as a Variable that is pegged to a value when the package begins that you could then override for testing purposes through a SET call to dtexec. Or you could be populating that from a query to a table. Or a Package/Project Variable and have the caller specify the day of the week. For the TSQL domain, the same mechanics could apply - initialize a variable and perform all your tests based on that authoritative date. Provide the ability to specify the date as a parameter to the procedure.
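
Here's a minimal T-SQL sketch of that parameter approach; dbo.ComputeBalances and its Friday logic are hypothetical, the point is defaulting to the server date only when the caller doesn't supply one.

CREATE PROCEDURE dbo.ComputeBalances
    -- Defaults to NULL so production callers can simply omit it
    @ReferenceDate date = NULL
AS
BEGIN
    SET NOCOUNT ON;

    -- Fall back to the server clock only when no date was supplied
    SET @ReferenceDate = COALESCE(@ReferenceDate, CAST(GETDATE() AS date));

    -- Note: this day-of-week check is still @@DATEFIRST-dependent (see below)
    IF DATEPART(WEEKDAY, @ReferenceDate) = 6
    BEGIN
        -- Friday logic: compute the final balances
        PRINT N'Final balance processing';
    END
END

Testing the Friday branch on a Monday is then just EXECUTE dbo.ComputeBalances @ReferenceDate = '20160219';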

Something, anything really, just take a moment and ask yourself - How am I going to test this?

But, what was the other problem?
Looks around... We are not alone. There's a whole other country called Not-The-United-States, or something like that - geography was never my strong suit - and their first day of the week isn't Sunday. It doesn't even have to be a different country; someone might have set the server to use a different first day of the week (assuming TSQL).
SET LANGUAGE ENGLISH;
DECLARE
    -- 2016 February the 15th
    @SourceDate date = '20160215'

SELECT 
    @@LANGUAGE AS CurrentLanguage
,   @@DATEFIRST AS CurrentDateFirst
,   DATEPART(dw, @SourceDate) AS MondayDW;

SET LANGUAGE FRENCH;

SELECT 
    @@LANGUAGE AS CurrentLanguageFrench
,   @@DATEFIRST AS CurrentDateFirstFrench
,   DATEPART(dw, @SourceDate) AS MondayDWFrench;
That's not going to affect me though, right? I mean sure, we're moving toward a georedundant trans-bipolar-echolocation-Azure-Amazon cloud computing inframastructure but surely our computers will always be set to deal with my home country's default locale, right?
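
If you'd rather not bet on the server's language settings, one hedged option in T-SQL is to count days from date 0 (1900-01-01, a Monday); the modulo-7 result doesn't change with SET LANGUAGE or SET DATEFIRST.

DECLARE
    -- 2016 February the 15th, a Monday
    @SourceDate date = '20160215';

SELECT
    -- 0 = Monday ... 4 = Friday, regardless of @@DATEFIRST or @@LANGUAGE
    DATEDIFF(DAY, 0, @SourceDate) % 7 AS DaysSinceMonday;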

Monday, October 26, 2015

Biml - Script Component Source

What is the Biml to create an SSIS Script Component Source? This is a very simplistic demo but you'll see the magic is distilled to two sections - the first is where we define the output buffer. In this case, I specify it is DemoOutput and then provide a columns collection with a single column, SourceColumn.

The second set of magic is in the CreateNewOutputRows method. There I use the buffer defined above to inject a single row into it with a value of "Demo". Nothing fancy, everything is static from a Biml perspective, but I needed to know the syntax before I could try something a little more advanced.

Biml Demo Script Component Source

Using this is a simple matter of adding a new Biml file into an existing SSIS project and pasting the following code. What results from right-clicking on the file and selecting Generate New SSIS package will be a single SSIS package, BasicScriptComponentSource, with a Data Flow Task, "DFT Demo Source Component".

The data flow "DFT Demo Source Component" consists of our new Script Component, SCR Demo Source, and a Derived Column, DER Placeholder, so you can attach a data viewer if need be.

Use the following Biml to generate your package and feel free to tell me in the comments how you adapted it to solve a "real" problem.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <ScriptProjects>
        <ScriptComponentProject Name="SC_Demo">
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.DTSPipelineWrap" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.DTSRuntimeWrap" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.PipelineHost" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.TxScript" />
                <AssemblyReference AssemblyPath="System.dll" />
                <AssemblyReference AssemblyPath="System.AddIn.dll" />
                <AssemblyReference AssemblyPath="System.Data.dll" />
                <AssemblyReference AssemblyPath="System.Xml.dll" />
            </AssemblyReferences>
            <OutputBuffers>
                <!--    
                Define what your buffer is called and what it looks like
                Must set IsSynchronous as false. Otherwise it is a transformation
                (one row enters, one row leaves) and not a source.
                -->
                <OutputBuffer Name="DemoOutput" IsSynchronous="false">
                    <Columns>
                        <Column Name="SourceColumn" DataType="String" Length="50" /> 
                    </Columns>                    
                </OutputBuffer>                                 
            </OutputBuffers>
            <Files>
                <File Path="Properties\AssemblyInfo.cs">
using System.Reflection;
using System.Runtime.CompilerServices;
[assembly: AssemblyTitle("SC_Demo")]
[assembly: AssemblyDescription("Demonstrate Script Component as source")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("billinkc")]
[assembly: AssemblyProduct("SC_Demo")]
[assembly: AssemblyCopyright("Copyright @ 2015")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
[assembly: AssemblyVersion("1.0.*")]
</File>
<File Path="main.cs">
<![CDATA[
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

    /// <summary>
    /// Demonstrate how to generate a Script Component Source in SSIS
    /// </summary>
    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
    public class ScriptMain : UserComponent
    {

        public override void CreateNewOutputRows()
        {
            DemoOutputBuffer.AddRow();
            DemoOutputBuffer.SourceColumn = "Demo";
        }
    }
]]>
                </File>
            </Files>
        </ScriptComponentProject>
    </ScriptProjects>

    <Packages>
        <Package Name="BasicScriptComponentSource" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="DFT Demo Source Component">
                    <Transformations>
                        <ScriptComponentSource Name="SCR Demo Source">
                            <ScriptComponentProjectReference
                                ScriptComponentProjectName="SC_Demo">
                                
                            </ScriptComponentProjectReference>
                        </ScriptComponentSource>
                        <DerivedColumns Name="DER Placeholder" />
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>    
</Biml>

Tuesday, September 22, 2015

SSISDB Delete all packages and environments

SSISDB tear down script

For my Summit 2015 presentation, 2014 SSIS Project Deployment Model: Deployment and Maintenance, I needed to revise my SSISDB tear down script. When I first built it, it removed all the projects and then removed all the folders. Which was great, but as I've noted elsewhere, a folder can also contain Environments and those too need to be accounted for. Otherwise, the catalog.delete_folder operation will fail because the folder is not empty.

Running the following code will remove everything in your SSISDB. This is the nuclear option so be certain you really want to clean house. You can uncomment the WHERE clause and selectively remove folders for a tactical nuclear option.

How it works is simple: I query catalog.folders to get a list of folders and then look in catalog.projects to find all the projects contained within the folder and delete those. I then repeat the process but look in catalog.environments to identify and remove all the SSIS environments.

USE [SSISDB]
GO

DECLARE
    @folder_name nvarchar(128)
,   @project_name nvarchar(128)
,   @environment_name nvarchar(128);

DECLARE Csr CURSOR
READ_ONLY FOR 
SELECT
    CF.name AS folder_name
FROM
    catalog.folders AS CF
--WHERE
--    CF.name IN ('');
;

OPEN Csr;
FETCH NEXT FROM Csr INTO
    @folder_name;
WHILE (@@fetch_status <> -1)
BEGIN
    IF (@@fetch_status <> -2)
    BEGIN

        -------------------------------------------------------------
        -- Drop any projects
        -------------------------------------------------------------
        DECLARE FCsr CURSOR
        READ_ONLY FOR 
        SELECT
            CP.name AS project_name 
        FROM
            catalog.projects AS CP
            INNER JOIN
                catalog.folders AS CF
                ON CF.folder_id = CP.folder_id
        WHERE
            CF.name = @folder_name;

        OPEN FCsr;
        FETCH NEXT FROM FCsr INTO
            @project_name;
        WHILE(@@FETCH_STATUS = 0)
        BEGIN
            EXECUTE catalog.delete_project
                @folder_name
            ,   @project_name;

            FETCH NEXT FROM FCsr INTO
                @project_name;
        END
        CLOSE FCsr;
        DEALLOCATE FCsr;

        -------------------------------------------------------------
        -- Drop any environments
        -------------------------------------------------------------
        DECLARE ECsr CURSOR
        READ_ONLY FOR 
        SELECT
            E.name AS environment_name 
        FROM
            catalog.environments AS E 
            INNER JOIN
                catalog.folders AS CF
                ON CF.folder_id = E.folder_id
        WHERE
            CF.name = @folder_name;

        OPEN ECsr;
        FETCH NEXT FROM ECsr INTO
            @environment_name;
        WHILE(@@FETCH_STATUS = 0)
        BEGIN
            EXECUTE catalog.delete_environment             
                @folder_name
            ,   @environment_name;

            FETCH NEXT FROM ECsr INTO
                @environment_name;
        END
        CLOSE ECsr;
        DEALLOCATE ECsr;

        -------------------------------------------------------------
        -- Finally, remove the folder
        -------------------------------------------------------------
        EXECUTE [catalog].[delete_folder]
            @folder_name;

    END
    FETCH NEXT FROM Csr INTO
        @folder_name;

END

CLOSE Csr;
DEALLOCATE Csr;

Caveat

The one thing I haven't investigated yet is cross-folder dependencies. Imagine folders Configurations and Projects. Configurations has an Environment called Settings. Projects has a project called AWShoppingCart which has a reference to the environment Settings. I expect I will be able to delete the Configurations folder and the environment just fine, and it will just leave the project AWShoppingCart broken until I reconfigure it. But the environment delete operation could just as easily fail if its reference count is non-zero.
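
If you want to check for those dependencies before you pull the trigger, a hedged sketch like the following lists which projects reference which environments via catalog.environment_references (reference_type R is relative, A is absolute; environment_folder_name is NULL for relative references).

SELECT
    CF.name AS project_folder
,   CP.name AS project_name
,   ER.reference_type
,   ER.environment_folder_name
,   ER.environment_name
FROM
    catalog.environment_references AS ER
    INNER JOIN
        catalog.projects AS CP
        ON CP.project_id = ER.project_id
    INNER JOIN
        catalog.folders AS CF
        ON CF.folder_id = CP.folder_id;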

Monday, September 21, 2015

Testing SQL Server Alerts

a.k.a. why the heck aren't my Alerts alerting?

I'm learning Power BI by building a sample dashboard for database operational analytics. One of the metrics we want to track is whether any SQL Alerts have fired in the reporting time frame.

Seems reasonable enough, so I defined a few alerts on my machine and did some dumb things that should fire off the alerts. And they didn't. At first, I thought I must be looking in the wrong place but I watched Profiler and the SSMS GUI was calling "EXECUTE msdb.dbo.sp_help_alert", which at its core uses "msdb.dbo.sysalerts". All of that looked right but by golly the last_occurrence_date fields all showed zeros.
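
For reference, a quick way to see the same data sp_help_alert surfaces is to query msdb.dbo.sysalerts directly - a minimal sketch:

SELECT
    SA.name
,   SA.severity
,   SA.last_occurrence_date
,   SA.last_occurrence_time
,   SA.occurrence_count
FROM
    msdb.dbo.sysalerts AS SA;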

I took to twitter asking what else I could do to invoke errors and y'all had some great ideas

  • https://twitter.com/crummel4/status/645764451724095489
  • https://twitter.com/SirFisch/status/645763600116617216
but the one that kept coming up was RAISERROR. There I can specify whatever severity I'd like, so here's my alert tester for severity 15
DECLARE 
    @DBID int
,   @DBNAME nvarchar(128)
,   @severity int;

SELECT 
    @DBID = DB_ID()
,   @DBNAME = DB_NAME()
,   @severity = 15;

RAISERROR
(
    N'The current database ID is:%d, the database name is: %s.'
,   @severity -- Severity.
,   1 -- State.
,   @DBID -- First substitution argument.
,   @DBNAME
);
I run it, verify SSMS shows The current database ID is:23, the database name is: tempdb. Msg 50000, Level 15, State 1, Line 29 and as expected sp_help_alert shows ... 0 for occurrence date and time? What the heck? I raised an error of the appropriate severity and had SQL Alerts defined for that severity - it's peanut butter and chocolate, they go together, but this was more like mayonnaise and peanut butter.

A quick trip around the interblag raised some interesting possibilities. Maybe I needed to restart SQL Agent. Maybe I was tracking the wrong thing. There were some other suggestions in links I have since closed, but none of these bore fruit.

Resolution

I scaled up my RAISERROR call to enumerate all the severity levels just to get something to work, and it wasn't until I hit 19 that I found my mistake: error severity levels greater than 18 can only be specified by members of the sysadmin role, using the WITH LOG option.

Once I tacked on the "WITH LOG" option, my alerts began firing.

DECLARE 
    @DBID int
,   @DBNAME nvarchar(128)
,   @severity int;

SELECT 
    @DBID = DB_ID()
,   @DBNAME = DB_NAME();

DECLARE Csr CURSOR
FOR
-- Sev 20 and above breaks the current connection
SELECT TOP 20
    CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS int) AS Severity
FROM
    sys.objects AS O;

OPEN Csr;

FETCH NEXT FROM Csr
INTO @severity;

WHILE(@@FETCH_STATUS = 0)
BEGIN

    RAISERROR
    (
        N'The current database ID is:%d, the database name is: %s.'
    ,   @severity -- Severity.
    ,   1 -- State.
    ,   @DBID -- First substitution argument.
    ,   @DBNAME
    )
    WITH LOG;

    FETCH NEXT FROM Csr
    INTO @severity;
END

CLOSE Csr;
DEALLOCATE Csr;

Resources

Glenn Berry and Brent Ozar Unlimited both have great posts with Alerts you should have turned on for all of your servers.