World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Loading...

Monday, June 29, 2015

Biml - Unpivot transformation

Biml - Unpivot transformation

I had cause today to use the Unpivot transformation in SSIS. My source database was still in 2000 compatibility mode, don't laugh, so I couldn't use the PIVOT operator and I was too lazy to remember the CASE approach. My client records whether a customer uses a particular type of conveyance. For simplicity sake, we'll limit this to just whether they own a car or truck. Some customers might own both, only one or none. Part of my project is to normalize this data into a more sane data structure.

Source data

The following table approximates the data but there are many more bit fields to be had.
CustomerNameOwnsCarOwnsTruck
Customer 111
Customer 210
Customer 310
Customer 410
Customer 510
Customer 610
Customer 711
Customer 810
Customer 910
Customer 1010
Customer 1110
Customer 1210
Customer 1301
Customer 1400
Customer 1500
Customer 1600
Customer 1700
Customer 1800

SSIS Package

The package is rather simple - we have the above source data fed into an Unpivot component and then we have a Derived Column serving as an anchor point for a data viewer.

Unpivot

To no great surprise to anyone who's worked with Biml, the code is not complex. We need to provide specifics about how the pivot key column should work and then the detailed mapping of what we want to do with our columns. Here we're going to keep our CustomerName column but we want to merge OwnsCar and OwnsTruck columns into a single new column called SourceValue. The PivotKeyValue we supply will be the values associated to our pivot. Since we specified an Ansi string of length 20, the values we supply of Car and Truck must map into that domain.

Unpivot Biml

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection ConnectionString="Provider=SQLOLEDB;Data Source=localhost\dev2014;Integrated Security=SSPI;Initial Catalog=tempdb" Name="CM_OLE" />
    </Connections>
        <Packages>
        <Package ConstraintMode="Linear" Name="Component_Unpivot">
            <Variables>
                <Variable Name="QuerySource" DataType="String">
                    <![CDATA[SELECT
    'Customer ' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS varchar(2)) AS CustomerName
,   *
FROM
    (
        VALUES
            (CAST(1 AS bit))
        ,   (CAST(1 AS bit))
        ,   (CAST(0 AS bit))
    ) S(OwnsCar)
    CROSS APPLY
    (
        VALUES
            (CAST(1 AS bit))
        ,   (CAST(0 AS bit))
        ,   (CAST(0 AS bit))
        ,   (CAST(0 AS bit))
        ,   (CAST(0 AS bit))
        ,   (CAST(0 AS bit))
    ) F(OwnsTruck);
]]></Variable>
            </Variables>
            <Tasks>
                <Dataflow Name="DFT Unpviot">
                    <Transformations>
                        <OleDbSource 
                            ConnectionName="CM_OLE" 
                            Name="OLESRC Unpivot Source">
                            <VariableInput VariableName="User.QuerySource" />
                        </OleDbSource>

                        <Unpivot
                            Name="UPV Vehicle types"
                            PivotKeyValueColumnName="Vehicle"
                            PivotKeyValueColumnDataType="AnsiString"
                            PivotKeyValueColumnCodePage="1252"
                            PivotKeyValueColumnLength="20"
                            AutoPassThrough="false"
                            >
                            <Columns>
                                <Column SourceColumn="CustomerName" IsUsed="true" />
                                <Column SourceColumn="OwnsCar" TargetColumn="SourceValue" PivotKeyValue="Car" />
                                <Column SourceColumn="OwnsTruck" TargetColumn="SourceValue" PivotKeyValue="Truck" />
                            </Columns>
                        </Unpivot>

                        <DerivedColumns Name="DER Placeholder" />
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

Monday, May 4, 2015

Biml breaking change - ServerExecutionID is repeated within scope

This is a brief post but I wanted to note that with the 1.7 release of BIDS Helper and 4.0 release of Mist, Biml code you found from around the Internet that dealt with the Project Deployment Model may no longer build. Specifically, you might run into

The name ServerExecutionID is repeated within scope. Eliminate duplicate names in scope

In your Variables collection, your Biml likely has an entry like

<Variable Name="ServerExecutionID" DataType="Int64" IncludeInDebugDump="Include" Namespace="System">0</Variable>

The approach of explicitly declaring the ServerExecutionID was required in BIDS Helper 1.6.6 and prior to be able to access that System scoped Variable. With the newer release and the ability to natively emit 2014+ packages, this is no longer a requirement. Not only is it not a requirement, it breaks validation with the above error message.

The resolution is simple, delete that declaration.

Saturday, May 2, 2015

Biml - Adding external assemblies

This blog post is coming live to you from SQL Saturday Dallas BI Edition. We were discussing horrible things you can do in SSIS and I mentioned how you can do anything that the .NET library supports. That got me wondering about whether I could do the same awful things within Biml. The answer is yes, assuming you supply sufficient force.

As a simple demonstration, I wanted to pop up a message box whenever I generated or checked my biml for errors. However, the MessageBox method lives in an assembly that isn't loaded by default. And to get that assembly to import, I needed to add it as a reference for the biml compiler. The last two lines of this show me adding the assembly and then importing the namespace.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <#
    MessageBox.Show("This works");
    #>
</Biml>
<#@ import namespace="System.Windows.Forms" #>
<#@ assembly name= "C:\Windows\Microsoft.NET\Framework\v4.0.30319\System.Windows.Forms.dll" #>

Thursday, April 30, 2015

BIDS Helper 1.7.0 subtle improvement

BIDS Helper 1.7.0.0

On March 17th, a new version of BIDS Helper was released and one of the compelling features in it was the ability to emit 2014 packages natively via Biml. Tucked away in there though, is a usability feature that rocks. The validation reporter no longer pops up in the terrible window it had.

If I Check Biml for Errors, I get the usual pop up window although it's cleaner looking than the 1.6 and earlier versions but the real awesome sauce is in the Output window (View, Output Ctrl+Alt+O)

Biml Validation Items

How awesome is this? These errors, I can use a search engine on them! The picture, not so much. Here's some sample output.

  • Validating BIML
  • 1/1 Emitting Project ErrorCodeTest.dtproj.
  • 1/1 Emitting Package ErrorCodeTest.
  • C:\sandbox\POC_2013\POC_2013\BimlScript.biml(13,26) : Error 0 : Component OLE_SRC dbo_ErrorCodes of type AstOleDbSourceNode in Dataflow DFT Test has duplicate column reference with name 'ErrorCode' on ouput 'Error'. This may be due to the 'ErrorCode' and 'ErrorColumn' columns that are automatically added to error output paths by SSIS. Ensure that all column names are unique.
  • C:\sandbox\POC_2013\POC_2013\BimlScript.biml(13,26) : Error 0 : Component OLE_SRC dbo_ErrorCodes of type AstOleDbSourceNode in Dataflow DFT Test has duplicate column reference with name 'ErrorColumn' on ouput 'Error'. This may be due to the 'ErrorCode' and 'ErrorColumn' columns that are automatically added to error output paths by SSIS. Ensure that all column names are unique.
  • EmitSsis. There were errors during compilation. See compiler output for more information.

Thank you and excellent work on this release to Varigence and the devs on the BIDS Helper project.

Tuesday, April 21, 2015

Biml - separate dataflows into separate packages

I have an upcoming client with overly complicated SSIS packages. One package alone has 80 dataflow tasks, very few of which are related to each other. You wouldn't believe, but they have maintenance issues with this package! Especially with regard to concurrent editing of an SSIS package which doesn't exist. To simplify their solution, part of my proposal is that we take the massive, monolithic SSIS package and break it out into many teeny-tiny SSIS packages that just focus on a single task. This will allow for one developer to change the sales order detail dataflow task while another can safely edit the sales employee data flow task as they'll be contained in completely separate packages. I'll then build a master package that ensures proper invocation of the subpackages.

To get started, I'm going to use Mist to reverse engineer my packages into Biml. Let's assume it generates the following package.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="SourcePackage">
            <Tasks>
                <Dataflow Name="DFT 1" />
                <Dataflow Name="DFT 2" />
                <ExecuteProcess Name="EPT Do not include" Executable="C:\Windows\System32\cmd.exe" />
                <Container Name="SEQ A">
                    <Tasks>
                        <Dataflow Name="DFT A1" />
                    </Tasks>
                </Container>
            </Tasks>
        </Package>
        <Package Name="SourcePackage2">
            <Tasks>
                <Dataflow Name="DFT 1" />
                <Dataflow Name="DFT 2" />
                <ExecuteProcess Name="EPT Do not include" Executable="C:\Windows\System32\cmd.exe" />
                <Container Name="SEQ A">
                    <Tasks>
                        <Dataflow Name="DFT A1" />
                    </Tasks>
                </Container>
            </Tasks>
        </Package>
    </Packages>
</Biml>
I only want the data flow tasks extracted from this. You can also see that I have data flow tasks encapsulated into containers. That's a pain to enumerate through in the .NET object model but Biml gives us lovely LINQ shortcuts. The first thing I want to do is find all of dataflow tasks. RootNode.AllDefinedSuccessors().OfType() generates an enumerable list.

For each of the items in our list, we want to do something. Here I'm creating a new package with the name of the original package + the data flow task. I assume this will be unique but it's not guaranteed. Within my Tasks collection, I simply call the GetBiml method which preserves the definition of my data flow.

<#@ template language="C#" tier="2" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
    <# 
    foreach(var t in RootNode.AllDefinedSuccessors().OfType<AstDataflowTaskNode>())
    {
    #>
        <Package Name="<#=t.FirstThisOrParent<AstPackageNode>() #> - <#=t.Name #>">
            <Tasks>
                <#=t.GetBiml() #>
            </Tasks>
        </Package>
    <#
    }
    #>        
    </Packages>
</Biml>
If you're in BIDS Helper, simply highlight both files and click generate SSIS packages. Out comes 8 SSIS packages, the 2 originals and then one for each data flow.

Think about the possibilities this opens up for you! Maybe I'd also like to add in an auditing framework (I would). Maybe I want to then turn on logging and configuration since this client is on 2008. That's just as easy. Oh Biml, I <3 you.

Wednesday, April 1, 2015

Never trust the SSMS GUI

There's a group of SQL Server professionals that try to help on StackOverflow and DBA.StackExchange.com. One of the most telling symptoms of a bad question is people using the GUI to administer their SQL Server instances. "I clicked here, here, here and bad thing happened". What did you actually do? Do you know?

It's ok, you don't have to be ultra hardcore and memorize every bit of syntax or spend time looking through Books Online to find the specific command you're looking. You can use the GUI to administer your machine. Just don't click the OK button. Instead, look up.

That beautiful Script button is the one you're looking for. That generates all the DDL that clicking OK would have done but now you have something you can inspect. "Trust, but verify" as someone once said. Not only can you inspect it, you can share those commands with someone else.

In today's example, I learned the GUI was silently swallowing the error. I had created an operator for myself but misspelled my last name. I correct my name, clicked OK and ... it never changed. Weird, I must have clicked Cancel. This time, I ensured I clicked that big OK button.

I hit refresh on the Operators list and what the heck? Why am I still "Bill Fellow?" I demand an "S"! Once more unto the GUI and this time, I clicked Script Action to new query window.

EXEC msdb.dbo.sp_update_operator @name=N'Bill Fellows', 
        @enabled=1, 
        @pager_days=0, 
        @email_address=N'BillinKC@world.domination', 
        @pager_address=N'', 
        @netsend_address=N''
Perfect, this is what I want. I'm updating the operator...
Msg 14262, Level 16, State 1, Procedure sp_update_operator, Line 61
The specified @name ('Bill Fellows') does not exist.
Or not. I could have been spent time yelling at the SQL Server for being "dumb" or posted some question with exact reproduction and if someone was inclined, they could have verified that yes, it doesn't appear to rename. But look what happens when you actually run the command yourself. It tells you exactly why it's not working.

Asking questions is good. We should all endeavor to ask questions of everything. But even better is go find the answer.

Happy scripting.

Thursday, March 26, 2015

Biml SSIS ErrorCode and ErrorColumn

Did you know that ErrorCode and ErrorColumn are "reserved" column names in an SSIS Data Flow? I've been doing this for ten years now and I had never run into this until this week. My client's application has an ErrorCodes table defined like this
USE [tempdb]
GO
CREATE TABLE [dbo].[ErrorCodes]
(
    [ErrorCodeId] [uniqueidentifier] NOT NULL
,   [ErrorCode] [nvarchar](3) NOT NULL
,   [ErrorText] [nvarchar](50) NOT NULL
,   [CreateDate] [datetime] NOT NULL
,   [CreatedBy] [uniqueidentifier] NULL
,   [ModifiedDate] [datetime] NULL
,   [ModifiedBy] [uniqueidentifier] NULL
,   [ModuleReference] [uniqueidentifier] NULL
,   [RowStatusId] [uniqueidentifier] NULL
,   CONSTRAINT [PK_dboErrorCode] PRIMARY KEY CLUSTERED ([ErrorCodeId] ASC)
);
We needed to replicate the data out of the application tables so naturally, I had described the pattern in Biml and let it run. Until it blew up on me. The 1.7.0 release of BIDS Helper uses the new and improved Biml engine and it actually reported an error on emission. Contrast that with the 1.6.0 release which happily emitted the DTSX. This of course is documented in the Release Notes for Mist 4.0 Update 1

BimlScript Errors/Warnings

  • Added an error for duplicate dataflow column nodes. This usually arises with "reserved" column names "ErrorCode" and "ErrorText" (sic)

That should actually be "ErrorColumn". There's an ErrorDescription that is added in the OnError event and I've sent an email along to Varigence to get that corrected but I'm not seeing an ErrorText anywhere.

Reproduction

The following Biml is sufficient for you to see the error generated (and to show off the new error reporting functionality in BIDS Helper 1.7.0)
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;" />
    </Connections>
    <Packages>
        <Package ConstraintMode="Linear" Name="ErrorCodeTest">
            <Variables>
                <Variable DataType="String" Name="SimulateBadTable"><![CDATA[SELECT * FROM (VALUES (NULL, NULL)) D(ErrorCode, ErrorColumn); ]]></Variable>
            </Variables>
            <Tasks>
                <Dataflow Name="DFT Test">
                    <Transformations>
                        <OleDbSource ConnectionName="CM_OLE" Name="OLE_SRC dbo_ErrorCodes">
                            <VariableInput VariableName="User.SimulateBadTable"></VariableInput>
                        </OleDbSource>
                        <DerivedColumns Name="Do Nothing"></DerivedColumns>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>
Component OLE_SRC dbo_ErrorCodes of type AstOleDbSourceNode in Dataflow DFT Test has duplicate column reference with name 'ErrorColumn' on ouput 'Error'. This may be due to the 'ErrorCode' and 'ErrorColumn' columns that are automatically added to error output paths by SSIS. Ensure that all column names are unique.

Known issue

This is apparently a known issue based on this Connect Item SSIS - ErrorCode column in source table causes duplicate ErrorCode to be introduced but it's marked as Closed - Fixed. I'm not seeing the fixed part with SQL Server 2014

For those Bimling, it's also logged over on the Varigence forums but they simply reference the Connect item.

What to do

I don't know yet (sadly, this excites me). My lazy pattern was using the table selector for my OLE DB Source but for this one table, I'd need to explicitly grab the column list and alias the ErrorCode column as something else. What I'd rather do is rename the Error path's columns but that does not seem possible.