World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Thursday, February 23, 2017

Biml Database Inspection

Biml Database Inspection

Importing tables via Biml

I've mentioned how using Biml to reverse engineer a very large database was the only option and there is plenty of great material in the community about how to do this but one thing I kept stumbling over was the using the import methods to build the above Biml always seemed to fail somewhere along the way. I assumed it was just me not understanding how it works. But, today someone else got bit with the same stumbling block so I wanted to talk through the basics of how the modeling works within Biml and subsequent posts will show the source of the issue and a work around.

Preface

Biml allows you to define the tables, views, and constraints in your database. Let's look at a minimal viable table definition for dbo.AWBuildVersion from AdventureWorks2014. Ready?

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Adventureworks" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=AdventureWorks2014;Provider=SQLNCLI11;Integrated Security=SSPI;" />
    </Connections>
    <Databases>
        <Database ConnectionName="Adventureworks" Name="AdventureWorks2014" />
    </Databases>
    <Schemas>
        <Schema Name="dbo" DatabaseName="AdventureWorks2014" />
    </Schemas>
    <Tables>
        <Table Name="AWBuildVersion" SchemaName="AdventureWorks2014.dbo">
            <Columns>
                <Column Name="SystemInformationID" DataType="Byte" IdentityIncrement="1" />
                <Column Name="Database Version" DataType="String" Length="25" />
                <Column Name="VersionDate" DataType="DateTime" />
                <Column Name="ModifiedDate" DataType="DateTime" />
            </Columns>
            <Keys>
                <PrimaryKey Name="PK_AWBuildVersion_SystemInformationID" Clustered="false">
                    <Columns>
                        <Column ColumnName="SystemInformationID" />
                    </Columns>
                </PrimaryKey>
            </Keys>
            <Indexes>
            </Indexes>
        </Table>
    </Tables>
</Biml>

Wow, that's a lot! Let's break it down.

Connections

Our Connections collection has a single entity in it, an OLE DB Connection named Adventureworks (remember, all of this is case sensitive so this Adventureworks is a different beast from AdventureWorks, ADVENTUREWOKRS, etc). This provides enough information to make a database connection. Of note, we have the server and catalog/database name defined in there. Depending on the type of connection used will determine the specific name used i.e. Initial Catalog & Data Source; Server & Database, etc. Look at ConnectionStrings.com if you are really wanting to see how rich (horrible) this becomes.

Databases

A Database (AstDatabaseNode) requires a Name and a ConnectionName. We certainly know the connection since we just defined it in the previous section and so here I'm naming the Database AdventureWorks2014. This just happens to align with the value specified in Initial Catalog but use whatever is natural. Do not name it after the environment though, please. There is nothing worse than talking about an entity named "DevSalesDb" which is referencing the production database but named after the location it was built from.

Schemas

A Schema (AstSchemaNode) requires a Name and a DatabaseName (see above). Since I'm after a table in the dbo schema, I just specify it as the name.

Tables

Finally, the Table (AstTableNode) which requires a Name and a SchemaName. Where have we seen this pattern? However, look at the value of the SchemaName. We have to qualify the schema with the database because we could have two Schema entities in dbo that point to different Database entities.

Once inside the Table entity, we can define our columns, keys, indices as our heart desires.

Wrap up

An amusing side note, if you're using Mist/BimlStudio to import the Schema and Table, the wizard renders all of this correctly, there only seems to be a defect in how I'm scripting the above entities.

Wednesday, November 9, 2016

Playing audio via Biml

Playing audio via Biml

How often do you need to play audio while you're compiling your Biml packages? Never? Really? Huh, just me then. Very well, chalk this blog post as one to show you that you really can do *anything* in Biml that you can do in C#.

When I first learned how I can play audio in .NET, I would hook the Windows Media Player dll and use that. The first thing I then did was create an SSIS package that had a script task which played the A-Team theme song while it ran. That was useless but a fun demo. Fast forward to using Biml and I could not for the life of me get the Windows Media Player to correctly embed in a Biml Script Task. I suspect it's something to do with the COM bindings that Biml doesn't yet support. Does this mean you shouldn't use Biml - Hell no. It just means I've wandered far into a corner case that doesn't yet have support.

Hours before going on the stage for my Summit 2016 presentation, I took another crack at finding a way to play music via .NET and discovered the System.Media.SoundPlayer class and I was ecstatic.

Tada!

You understand this code, it's not hard. I create a string variable to hold the path to my sound file. I picked a sound file in a well known location. I prefaced my string with the @ symbol to avoid having to escape the default windows path separator.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
string sourceFile = string.Empty;
sourceFile = @"C:\Windows\Media\tada.wav";
System.Media.SoundPlayer player = new System.Media.SoundPlayer(sourceFile);
player.Play();
#>
</Biml>

SSIS package that plays music

Using the above knowledge, we can also have an SSIS package with a script task to play an arbitrary media file

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="SoundPlayer" >
            <Variables>
                <Variable Name="AudioPath" DataType="String">http://www.moviewavs.com/0053148414/WAVS/Movies/Star_Wars/imperial.wav</Variable>
            </Variables>
            <Tasks>
                <Script ProjectCoreName="ST_PlayAudio" Name="SCR Echo Back">
                    <ScriptTaskProjectReference ScriptTaskProjectName="ST_PlayAudio" />
                </Script>            
            </Tasks>
        </Package>
    </Packages>
<ScriptProjects>
    <ScriptTaskProject ProjectCoreName="ST_PlayAudio" Name="ST_PlayAudio" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <!-- List all the variables you are interested in tracking -->
                <Variable Namespace="User" VariableName="AudioPath" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_PlayAudio
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            string sourceFile = string.Empty;
            sourceFile = Dts.Variables[0].Value.ToString();
            System.Media.SoundPlayer player = new System.Media.SoundPlayer(sourceFile);
            player.Play();
            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}                </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

Now, you could marry the two Biml snippets together so that you get audio playing while you build an SSIS package that plays audio, Dawg.

Friday, October 28, 2016

What packages still use Configuration?

What packages still use Configurations?

I'm sitting in Tim Mitchell's excellent "Deep Dive into the SSISDB session" and someone asked how they can figure out what packages use the classic deployment model's Configuration option.

Create an SSIS package. Add a Variable to your package called FolderSource and assign it the path to your SSIS packages. Add a Script Task to the package and then add @[User::FolderSource] to the ReadOnly parameters.

Double click the script, assuming C#, and when it opens up, use the following script as your Main

public void Main()
{
    // Assign the SSIS Variable's value to our local variable
    string sourceFolder = Dts.Variables["FolderSource"].Value.ToString();
    Microsoft.SqlServer.Dts.Runtime.Application app = new Microsoft.SqlServer.Dts.Runtime.Application();
    string message = "Package {0} uses configuration {1}";
    bool fireAgain = false;
    Package pkg = null;
    foreach (string packagePath in System.IO.Directory.GetFiles(sourceFolder, "*.dtsx", System.IO.SearchOption.AllDirectories))
    {
    try
    {
        pkg = app.LoadPackage(packagePath, null);
        // EnableConfigurations is a boolean specifying whether you have checked the first button
        if (pkg.EnableConfigurations)
        {
        Dts.Events.FireInformation(0, "Configuration Finder", string.Format(message, packagePath, string.Empty), string.Empty, 0, ref fireAgain);

        // This will expose all the configurations that are being used
        // because you could have specified different configuration mechanisms
        foreach (Configuration config in pkg.Configurations)
        {
            Dts.Events.FireInformation(0, "Configuration Details", string.Format(message, packagePath, config.ConfigurationType), string.Empty, 0, ref fireAgain);
        }
        }
    }
    catch (Exception ex)
    {
        Dts.Events.FireWarning(0, "Config finder", packagePath, string.Empty, 0);
        Dts.Events.FireWarning(0, "Config finder", ex.ToString(), string.Empty, 0);

    }
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}

Save and close the package and hit F5.

How cool is that, we're using an SSIS package to inspect the rest of our packages. Now, if you store your packages in the MSDB, the above changes ever so slightly. We'd need to provide a connection string to the database and then change our first foreach loop to enumerate through all the packages in the MSDB. Perhaps that'll be a followup post.

Tuesday, October 18, 2016

Debugging Biml

Debugging Biml

At this point, I don't even know who to give credit for on this tip/trick as I've seen it from so many luminaries in the field. This mostly applies to BimlScript debugging within the context of BIDS Helper/BimlExpress.

Using tooling is always a trade-off between time/frustration and monetary cost. BIDS Helper/BimlExpress are free so you're prioritizing cost over all others. And that's ok, there's no judgement here. I know what it's like to be in places where you can't buy the tools you really need. One of the hard parts about debugging the expanded Biml from BimlScript is you can't see the intermediate or flat Biml. You've got your Metadata, Biml and BimlScript and a lot of imagination to think through how the code is being generated and where it might be going wrong. That's tough. Even at this point where I've been working with it for four years, I can still spend hours trying to track down just where the heck things went wrong. SPOILER ALERT It's the metadata, it's always the metadata (except when it's not). I end up with NULLs where I don't expect it or some goofball put the wrong values in a field. But how can you get to a place where you can see the result? That's what this post is about.

It's a trivial bit of code but it's important. You need to add a single Biml file to your project and whenever you want to see the expanded Biml, prior to it being translated into SSIS packages, right click on the file and you'll get all that Biml dumped to a file. This recipe calls for N steps.

WritAll.biml

Right click on your project and add a Biml file called WriteAll.biml. Or whatever makes sense to you. I like WriteAll because it will generally sort to the bottom of my list of files alphabetically and that's about as often as I hope to use it.

Tiering

The first thing we need to do is ensure that the tier of this BimlScript file is greater than any other asset in the project. We will do that through the directive of template tier="N" where N is a sufficiently large number to ensure we don't have any natural tiers greater than it.

I'll also take this as an opportunity to impart a lesson learned from writing Apple Basic many, many years ago. Do not use one as the step value for line numbers or tiers in this case. Instead, give yourself some breathing room and count by 10s because sure as you're breathing, you'll discover you need to insert something between 2 and 3 and you can't add 2.5, much less 2.25. The same lesson goes with Tiers. Tier 0 is flat biml. Tier is biml script that doesn't specify its tier. After that you're in control of your destiny.

WriteAllText

The .NET library offers a method called WriteAllText. This is the easiest method to write all the text to a file. It takes two arguments: the contents and the file name. If the file exists, it's going to overwrite it. If it doesn't exist, it will create it. Piece of pie!

Path.Combine

WriteAllText needs a path - where should we put it? I'm lazy and want to put our debugging file into a location everyone has on their computer. I can't tell you what that location will be because it's going to be different for everyone but it's guaranteed to exist. It's the %userprofile% location. On my work laptop, it's C:\Users\BillFellows. On my home computer, it's C:\users\bfellows At the governmental agency, my home directory was actually on a network somewhere so it was just H:\ All you have to do is open up windows explorer and type %userprofile% and that's where we'll write this file.

If you are ever putting paths together through string building, please stop. It's a pain to deal with escaping the path separators, \, and it can be difficult to be consistent as some will build a path with a trailing slash and others won't. Stop trying to figure out that logic and use Path.Combine

We'll combine the special path location with a file name, Debug.biml and get a perfectly valid path for our output file. If you don't want overkill, then just make a hardcoded path.

GetBiml

Every object in the Biml universe supports the GetBiml method. What's amazingly powerful about this function is that it has the ability to call the GetBiml method on all the items under it. You don't have to worry about how many packages exist and how many Tasks and Variables and Events exist under them. Just call the appropriate parent level GetBiml method and object inheritance takes care of the rest.

RootNode

The RootNode is the base of everything in Biml so by calling its GetBiml method, you'll get the Biml for all the derived objects within the project. Eureka! That's what we wanted! And since we won't call this until everything else has completed, via tiering property, we will get our flattened Biml

WriteAll.biml

Putting all that together, we get a file that looks like this

<#@ template tier="999"#>
<#
System.IO.File.WriteAllText(System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "Debug.biml"), RootNode.GetBiml());
#>
If I want to see what's being built in ComplexBimlScript, I simply multiselect it and WriteAllText and I'll get a Debug.biml file. From there, I generally open Debug.biml in a separate SSIS project and Check Biml For Errors and it's much easier to zip to the error. Then it's a matter of trying where that bad code is generated back to the correct bit of Biml.

Closing thoughts

If you get some really weird error going on inside your BimlScript, this debug file will appear to be an empty Biml tag. In that case, it's probably your metadata so start breaking your solution down until it's working and then gradually add complexity back into it.

p.s.

An alternative thought on tearing your code apart until you find it works would be to use this WriteAllText approach but do it per tier. That would allow you to inspect the compilation at ever step in the process to discern where things went wrong.

Thursday, October 6, 2016

UNION removes duplicates

UNION removes duplicates

When you need to combine two sets of data together, we use the UNION operator. That comes in two flavors: UNION and UNION ALL. The default is to remove duplicates between the two sets whereas UNION ALL does no filtering.

Pop quiz! Given the following sets A and B

What's the result of SELECT * FROM A UNION SELECT * FROM B;

Piece of cake, we start with everything in A and get the values in B that aren't in A.


So we're looking at 1, 5, 9 7, 3, 3, 2, 3

Except of course that's not what is actually happening. UNION is actually going to smash both sets of data together and then take the distinct results. Or it does a distinct within each result set, smashes them together and takes one last pass to remove duplicates. I don't know or care about the actual mechanics, what I care about is the final outcome.

We actually end up with a result of 1, 5, 9, 7, 3, 2. In the fifteen years I've been writing SQL statements, I don't think I ever realized that behavior of the final result set being distinct. I thought it was purely an intra set dedupe process.

I thought wrong

Tuesday, October 4, 2016

Generating a workload for AdventureWorks

Just a quick note, if you need to generate some database activity, Jonathan Keyhayias has the The AdventureWorks2008R2 Books Online Random Workload Generator that works "fine enough" for AdventureWorks2014.

Run AdventureWorks BOL Workload.ps1

I modified line 5 to point to my server and line 21 to point to AdventureWorks2014.

AdventureWorks BOL Workload.sql

Despite the PowerShell script setting the database context, the accompanying .sql file has explicit USE statements but a quick search and replace for AdventureWorks2008R2 -> AdventureWorks2014 had me up and running.

Thank you to Jonathan for the handy script. Now if you'll excuse me, I have query activity to capture.

Thursday, September 29, 2016

Resolving the Biml project level connection manager issue

Biml connection manager ids broken

Well, it's not quite that dire but it sure can seem like it. You build out N packages that use project level connection managers and all is well and good until you open them up and they're all angry with red Xs in them. Or as this person encountered, the OLE DB Source defaulted to the first connection manager and table it found.

Root cause

I say root cause without looking at source code because the guys at Varigence are way smarter than I can hope to be. I can however look at cause and effect and mitigate as I can. What I see as happening is that the packages that get generated in round 1 have their connection manager ids (ugly guids inside the XML) set and those match the Project level Connection Manager and all is good. You then generate more packages and depending on whether you overwrite the existing Connection Managers, will determine whether you break the existing packages or the new ones. Sophie's choice, eh?

The good thing, is that there's a fairly simple approach to solving this issue. For your project level connection managers, assign an explicit GUID and then reference that same guid in your packages. This approach will require tiering but it at least "works on my computer".

Environments.biml

Since we'll provide no explicit tier and there no script in there, this will be tier 0. If you need to dynamically define the values in your environment file, just be sure it is a lower tier than the subsequent files. In this file, we simply enumerate our Connections. You'll see I have two connections: one project level, one not.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <AdoNetConnection 
            Name="CM_ADO_DB" 
            ConnectionString="Data Source=localhost\dev2014;Integrated Security=SSPI;Connect Timeout=30;Database=msdb;" 
            Provider="SQL"  />
        <OleDbConnection 
            Name="CM_OLE" 
            ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI11.0;Integrated Security=SSPI;" 
            CreateInProject="true" />
    </Connections>
</Biml>

Projects.biml

There is no attribute in the Connections collection to assign a guid. It's simply not there. If you want to associate an Id with an instance of a Connection your choices are the Project node and the Package node. Since we're dealing with project level connection managers, we best cover both bases to ensure Ids synchronize across our project. If you wish, you could have embedded this Projects node in with the Connections but then you'd have to statically set these Ids. I feel like showing off so we'll go dynamic.

To start, I define a list of static GUID values in the beginning of my file. Realistically, we have these values in a table and we didn't go with "known" values. The important thing is that we will always map a guid to a named connection manager. If you change a connection manager's definition from being project level to non, or vice versa, this will result in the IDs shifting and you'll see the same symptoms as above.

I use Biml to inspect itself, thus the need for tiering, and for all the Connection managers I find that satisfy the criteria of CreateInProject == true, I want to define them within my Connections collection inside my PackageProject node.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <#
    // Build a repeatable list of GUIDs
    List<Guid> staticGuids = new List<Guid>();
    staticGuids.Add(Guid.Parse("DEADBEEF-DEAD-BEEF-DEAD-BEEFDEADBEEF"));
    staticGuids.Add(Guid.Parse("DEADBEEF-DEAD-BEEF-DEAD-BEEF8BADF00D"));
    staticGuids.Add(Guid.Parse("DEADBEEF-DEAD-BEEF-DEAD-D15EA5EBAD00"));
    staticGuids.Add(Guid.Parse("FEEDFACE-DEAD-BEEF-FACE-FEED00000000"));
    #>
    <Projects>
        <PackageProject Name="ConnectionManagerIssue">
            <Connections>
                <# 
                    Guid g = new Guid();
                    // Only generate a Connection node for project level connections
                    foreach (var item in RootNode.Connections.Where(x => x.CreateInProject))  
                    {
                        // Pop the first element so we don't repeat
                        g = staticGuids[0];
                        staticGuids.RemoveAt(0);
                #>
                    <Connection ConnectionName="<#=item.Name#>" Id="<#= g.ToString().ToUpper() #>" />
                <#
                }
                #>
            </Connections>
        </PackageProject>
    </Projects>
</Biml>

Packages.biml

We'll use much the same trick except that now we'll inspect the Projects node to find all the connection managers. By default, we'll only have the Project level ones defined there so it's a bit easier. Build the package as normal but at the bottom, stub in the Connections collection and then populate the connection managers with their Ids. Since I'm lazy, I'm going to just call GetBiml for all the connection managers I find in the Projects node collection since they have the same attributes of Name and Id.

<#@ template tier="20" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <# foreach(int index in Enumerable.Range(0,5)) {#>
        <Package Name="Fixed_0<#=index#>">
            <Variables>
                <Variable DataType="String" Name="QuerySource">SELECT 100 AS foo;</Variable>
                <Variable DataType="Int32" Name="RowCountSource">0</Variable>
            </Variables>
            <Tasks>
                <Dataflow Name="DFT Demo">
                    <Transformations>
                        <OleDbSource ConnectionName="CM_OLE" Name="DFT Source">
                            <VariableInput VariableName="User.QuerySource"></VariableInput>
                        </OleDbSource>
                        <RowCount VariableName="User.RowCountSource" Name="RC Source" />
                    </Transformations>
                </Dataflow>
            </Tasks>
            <Connections>
<#
    foreach(var x in RootNode.Projects.SelectMany(x => x.AllDefinedSuccessors()).OfType<AstConnectionReferenceBaseNode>())
    {
        WriteLine(x.GetBiml());
    }
#>
            </Connections>
        </Package>
        <# } #>
    </Packages>
</Biml>

Wrap up

I tack the Connection logic at the bottom of my packages because when Varigence gets the next release of BimlExpress out, I expect this will be resolved so I can just snip that unwanted code out.

Happy Biml'ing