A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Thursday, November 7, 2013

SSIS Loading Windows Azure Blob Storage

On StackOverflow today, someone asked if you could upload an image from an on-premises SQL Server to Windows Azure Blob Storage with SSIS. That's just the sort of crazy stuff I need to see if I can do so off I went.

I must give great credit to the Azure team and in particular, their tech writers. I was able to crib heavily from their how to and had a working demo in fairly short order. Tying it into SSIS was a bit more work just due to the eccentricities of the platform but hey, it works!

I'm reposting my answer here in case I need to remember how to do this.


What a fun question this was! I got to thread together a lot of pieces that I had never tried.

I first built out a simple console app based on the fine manual over on HOW TO: Blob Storage. Knowing that I had working code allowed me to adapt it for SSIS.

I created 3 SSIS Variables at the Package level. AccountName, AccountKey and ContainerName. They are all data type String. These provide credentials + the folder where my uploaded data will reside.

data flow and variables

Data Flow

The general look of your data flow is rather simple. A data source to a Script Component that will act as a Destination. You will need two columns: one provides a unique name for the blob and the other will be the binary bits.

My source is a trivial table. It has Country Names and their flag (stored as varbinary(max)) which you yourself can scrape from the CIA World Handbook if you're so inclined.

The Destination will be a bit of C#. Add a Script Component of type Destination.

On the Script tab, I have 3 ReadOnly Variables listed User::AccountKey,User::AccountName,User::ContainerName

On the Input Columns tab, I select CountryName and FlagImage.

The script itself follows. As noted in the How To, you will need to add a reference to Microsoft.WindowsAzure.Storage assembly before you can access the last 3 assemblies there.

using System;
using System.Data;
using System.IO;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

// Must add reference to Microsoft.WindowsAzure.Storage for this to work
// http://www.windowsazure.com/en-us/develop/net/how-to-guides/blob-storage/
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;

/// <summary>
/// Watch me load data to Azure from SSIS
/// </summary>
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    /// <summary>
    /// The storage account used
    /// </summary>
    private CloudStorageAccount storageAccount;

    /// <summary>
    /// An entity to work with the Blobs
    /// </summary>
    private CloudBlobClient blobClient;

    /// <summary>
    /// Blobs live in containers
    /// </summary>
    private CloudBlobContainer container;

    /// <summary>
    /// blockBlob instead of a pageBlob
    /// </summary>
    private CloudBlockBlob blockBlob;

    /// <summary>
    /// This method is called once, before rows begin to be processed in the data flow.
    ///
    /// You can remove this method if you don't need to do anything here.
    /// </summary>
    public override void PreExecute()
    {
        base.PreExecute();
        string cs = string.Empty;
        string csTemplate = string.Empty;
        string accountName = string.Empty;
        string accountKey = string.Empty;
        string containerName = string.Empty;

        accountName = Variables.AccountName;
        accountKey = Variables.AccountKey;
        containerName = Variables.ContainerName;
        csTemplate = "DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}";
        cs = string.Format(csTemplate, accountName, accountKey);

        this.storageAccount = CloudStorageAccount.Parse(cs);
        this.blobClient = this.storageAccount.CreateCloudBlobClient();
        this.container = this.blobClient.GetContainerReference(containerName);
        this.container.CreateIfNotExists();
        this.container.SetPermissions(new BlobContainerPermissions { PublicAccess = BlobContainerPublicAccessType.Blob });

    }

    /// <summary>
    /// For each row passing through, upload to Azure
    /// </summary>
    /// <param name="Row">The row that is currently passing through the component</param>
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string blobName = string.Empty;

        using (MemoryStream memStream = new MemoryStream(Row.FlagImage.GetBlobData(0, (int)Row.FlagImage.Length)))
        {
            this.blockBlob = this.container.GetBlockBlobReference(Row.CountryName);
            this.blockBlob.UploadFromStream(memStream);
        }
    }

}

Global Assembly Cache (GAC)

Assemblies you wish to use within SSIS must reside in the GAC. Assemblies cannot go into the GAC unless they are signed. Fortunately, the Azure assemblies are signed so from a Visual Studio Command Prompt, type gacutil -if "C:\Program Files\Microsoft SDKs\Windows Azure\.NET SDK\v2.1\ref\Microsoft.WindowsAzure.Storage.dll" or the equivalent of where your version of that assembly exists

Load successful

And as proof, here's a shot from Azure Storage Explorer

blobs everywhere

No comments: