A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Tuesday, October 9, 2018

Biml Excel Data Source without Excel

Biml Excel Meta Data Source without Excel

In the previous post, Reading Excel files without Excel, I showed some simple code to consume Excel without having Excel installed on your machine. How/Why would I use this - well look at the sample spreadsheet. That could be used quite nicely by a business analyst to generate SSIS packages. In fact, it is being used by a very savvy business analyst at one of my clients' shadow IT groups to identify the source data they'd like brought into their data mart. They are translating their mainframe data extracts into SQL equivalents and specifying where the data should land.

This is exciting for me as this team gets their data and knows the business problems they need to solve &emdash; they just didn't have all the tools to do so. They are supplying the data domain expertise and we are generating consistent packages that adhere to corporate standards (as well as defining the scheduling, alerting, etc). It's a good match.

My resources are quite simple: Excel Spreadsheet containing meta data, a driver program and a package template.

The template is your standard truncate and reload pattern with the target table being specified by a parameter. The client validates data by running processes in parallel so the existing mainframe process delivers data to the Billing table while ours delivers to a Billing_NEW table. Once they accept the new process, the target table becomes Billing and the NEW table is dropped. I decided the most native SSIS route would be use specify the target table in as a parameter. We originally have a boolean parameter indicating whether we were loading the new table or the production one but that was more logic and overhead that just specifying which table to load. I force their queries to be dirty reads as some of these queries can be rather messy.

<#@ template designerbimlpath="/Biml/Packages" #>
<#@ property name="schemaName" type="string" #>
<#@ property name="tableName" type="string" #>
<#@ property name="parameterName" type="string" #>
<#@ property name="sourceQuery" type="string" #>
<#@ property name="sourceConnectionName" type="string" #>
<#@ property name="targetConnectionName" type="string" #>
<#@ property name="businessFriendlyName" type="string" #>
<#
string packageName = string.Format("{0}_Load_{1}{2}", targetConnectionName.ToUpper(), businessFriendlyName, "");
CustomOutput.PackageName = packageName;
#>
        <Package Name="&lt;#= packageName #>" ConstraintMode="Linear">
            <Parameters>
                <Parameter Name="TargetTableName" DataType="String"><#= tableName #></Parameter>
            </Parameters>
            <Variables>
                <Variable Name="SchemaName" DataType="String"><#= schemaName#></Variable>
                <Variable Name="TableName" DataType="String" EvaluateAsExpression="true"><#= parameterName #></Variable>
                <Variable Name="QualifiedTableName" DataType="String" EvaluateAsExpression="true">&quot;[&quot; +   @[User::SchemaName] + &quot;].[&quot; + @[User::TableName]+ &quot;]&quot;</Variable>
                <Variable Name="QueryTruncate" DataType="String" EvaluateAsExpression="true">"TRUNCATE TABLE " + @[User::QualifiedTableName] + ";"</Variable>
            </Variables>
            <Tasks>
                  <ExecuteSQL Name="SQL Truncate Target" ConnectionName="&lt;#= targetConnectionName #>">
                    <VariableInput VariableName="User.QueryTruncate" />
                </ExecuteSQL>
                <Dataflow Name="DFT Load &lt;#= businessFriendlyName #>">
                    <Transformations>
                        <OleDbSource ConnectionName="&lt;#= sourceConnectionName #>" Name="OLESRC Query ">
                            <DirectInput><![CDATA[SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
<#= sourceQuery#>
]]>
                            </DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="OLEDST &lt;#= schemaName #>_<#= tableName#>" ConnectionName="<#= targetConnectionName #>">
                            <TableFromVariableOutput VariableName="User.QualifiedTableName" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>

My ProjectDriver.biml file is fairly straight forward. In line 1 I provide a relative path to my EPPlus.dll The ..\ indicates I would find the assembly one folder up - two folders actually since I have a single copy in my base Visual Studio folder. Line 2 specifies we need to bring in OfficeOpenXml library. In Line 5 I create a variable that will hold the metadata for my solution. Line 6 is kind of interesting. I let the template determine what the package name should be based on the supplied meta data. Rather than having to perform that logic twice, it'd be nice to keep track of what packages have been created. Not only nice, it'll be required since we're using the Project Deployment Model! Line 19 is where we actually stamp out a specific package and look at that second parameter out customOutput That is the mechanism for our template to send information back to the caller. In our case, we'll add the package name to our ever growing list of packages. In line 28, we then run back through our list of packages and build out the project's definition. And that's about it. We've already talked about the GetExcelDriverData method. The GetDriverData method provides a simple abstraction between where I actually get metadata and how the packages are built. You can see a commented out reference to a GetStaticDriverData method which I used during development to test boundary conditions. Who knows, maybe I will pull from Azure Tables next...

<#@ assembly name= "..\..\EPPlus.dll" #>
<#@ import namespace="OfficeOpenXml" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
    Dictionary<string, List&lt;string>> dasData = new Dictionary<string, List&lt;string>>();
    List<string> packageList = new List<string>();
    string templateName = "inc_TruncAndReloadPackageParameter.biml";
    string projectName = "SHADOW_IT_DataSnapshot";

    // Get our meta data    
    dasData = GetDriverData();
#>
    <Packages>
<#

    dynamic customOutput;
    foreach(var key in dasData.Keys)
    {
        WriteLine(CallBimlScriptWithOutput(templateName, out customOutput, dasData[key][0], dasData[key][1], dasData[key][2], dasData[key][3], dasData[key][4], dasData[key][5], dasData[key][6]));
        packageList.Add(customOutput.PackageName);
    }
#>
    </Packages>
    <Projects>
        <PackageProject Name="&lt;#= projectName #>">
            <Packages>
<#
        foreach(var key in packageList)
        {
#>
            <Package PackageName="&lt;#= key #>" />
<#
        }
#>            
            </Packages>
            <Connections>
                <Connection ConnectionName="WWI_DB" />
                <Connection ConnectionName="WWI_DW" />
            </Connections>
        </PackageProject>
    </Projects>
    <Connections>
        <OleDbConnection Name="WWI_DB" ConnectionString="Data Source=.\DEV2017;Initial Catalog=WWI_DW;Provider=SQLNCLI11.1;Integrated Security=SSPI;Application Intent=READONLY;ConnectionTimeout=0;" CreateInProject="true" />
        <OleDbConnection Name="WWI_DW" ConnectionString="Data Source=.\DEV2017;Initial Catalog=WWI_DB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Application Intent=READONLY;" CreateInProject="true" />
    </Connections>
</Biml>

<#+

    /// Get data from Excel worksheet
    public Dictionary<string, List<string>> GetExcelDriverData(string sourceFile)
    {
        Dictionary<string, List<string>> d = new Dictionary<string, List<string>>();
        System.IO.FileInfo fi = new System.IO.FileInfo(sourceFile);
        using (ExcelPackage ep = new ExcelPackage(fi))
        {
            ExcelWorkbook wb = ep.Workbook;
            ExcelWorksheet ws = wb.Worksheets.First();
            if (ws != null)
            {
                // 1 based array to 7, inclusive
                for (int i = ws.Dimension.Start.Row+1; i < ws.Dimension.End.Row+1; i++)
                {
                    List<string> row = new List<string>() { ws.Cells[i, 1].Value.ToString()
                    ,   ws.Cells[i, 2].Value.ToString()
                    ,   ws.Cells[i, 3].Value.ToString()
                    ,   ws.Cells[i, 4].Value.ToString()
                    ,   ws.Cells[i, 5].Value.ToString()
                    ,   ws.Cells[i, 6].Value.ToString()
                    ,   ws.Cells[i, 7].Value.ToString()
                    };
                    
                    d[ws.Cells[i, 7].Value.ToString()] = row;
                }
            }
        }
        
        return d;
    }

    public Dictionary<string, List<string>> GetDriverData()
    {
        string sourceFile= @"C:\Users\billinkc\Documents\ShadowIt_DataSnap.xlsx";
        return GetExcelDriverData(sourceFile);
        //return GetStaticDriverData();
    }
#>

And that's how we can use EPPlus to consume metadata stored in Excel to generate many packages with Biml. Let me know if this helps or if you have questions about how to get this running. It's good stuff, I can't get enough of it.

No comments: