A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Wednesday, November 2, 2011

Using Excel in an SSIS lookup transformation

There was a question on StackOverflow: Excel Source as Lookup Transformation Connection asking how one can use Excel as a source in a lookup. As I worked on the answer, I figured it'd be a good thing to post here so I'm copying and pasting my work. The important thing is that this approach isn't restricted to just the lookup task. If you are having trouble using Excel in SSIS as a normal data source because it thinks the data types have changed, access it via the OLE DB driver.

As you have correctly discerned, you cannot use the Excel Connection Manager in a Lookup task, it only accepts the OLE DB connection manager. The "normal" Excel connection isn't an OLE DB type. The trick then, is to use an OLE DB connection manager with an Excel spreadsheet.

For this example, I have a spreadsheet with state codes and their full name and my source data only has the abbreviations flowing through. This walk through will wire up a lookup task against Excel to retrieve those values.

Keep these caveats in mind: Lookups are case sensitive, regardless of whether the source database (or file in this case) is case sensitive. Excel strings are always going to be interpreted as unicode/nvarchar.

Source Data

Given a simple file sitting at C:\tmp\LookupReference.xlsx that looks like the following
Simple key value pairs in Excel

Connection Manager

We must first establish an OLE DB Connection Manager. Instead of the default "Native OLE DB\SQL Server Native Client 10.0" change that to "Native OLE DB\Microsoft Office 12.0 Access Database Engine OLE DB Provider". In the Server or file name, locate the source file. Clicking Test Connection at this point will result in an error.

OLE DB Connection Manager Connection Tab

Here comes the "magic." Click the All tab and scroll to the top. In Extended Properties, add the indicated string "Excel 12.0;HDR=YES; IMEX=1;" This tells the provider that we are going to use Excel 12.0 (.xlsx format) with a header row and the IMEX 1 tells the driver there will be intermixed data.

OLE DB Connection Manager All Tab

Your package sould now look something like this. A connection manager with extended properties set and assumes a preexisting data flow

Control Flow

Data Flow

To simplify matters, I have a script source that generates 3 rows of data with state codes MO, KS and NE and sends them down the pipeline. Your source will obviously be different but the concept will remain the same. Sample code provided in the annotation.

In your lookup transformation, you will need to write a query against the spreadsheet. It's similar to a normal database query except your table is going to be Sheet1$ unless you have a named range in which your table would be MyRange Note the $ is required when referencing a sheet. Using the sample spreadsheet above, my query would be

SELECT
    S.StateCode
,   S.StateName
FROM 
    `Sheet1$` S

I map my Script task column StateCode to the reference query's StateCode column and check the StateName field as I want to add that to my data flow.

Data Flow

I've put a Data viewer after the Lookup task to verify my lookups worked

Data Viewer

Everything works and we're all happy.

Old Excel

If you are using a .xls file, you need to make the following changes. - In your Connection Manager, instead of the Office 12 provider, select the "Native OLE DB\Microsoft Jet 4.0 OLE DB Provider" - The Extended Properties become "EXCEL 8.0;HDR=Yes; IMEX=1;"

Notes for 2008+

A tip of the hat to Matt Masson (Blog|Twitter) for reminding me of the Cached Connection Manager. In particular, he points out that "you get weird behavior if you have multiple lookups against the same file." So, if you are on 2008+, look at using the CCM.

Cached Connection Manager Sequence Container

Before you need the cached data, load it into the cache like so

Cached Connection Manager Data Flow

The lookup basically remains the same. In the General tab, change the source to the cache connection manager and in the Connection tab, change to the CCM.

No comments: