There was a question on StackOverflow: Excel Source as Lookup Transformation Connection asking how one can use Excel as a source in a lookup. As I worked on the answer, I figured it'd be a good thing to post here so I'm copying and pasting my work. The important thing is that this approach isn't restricted to just the lookup task. If you are having trouble using Excel in SSIS as a normal data source because it thinks the data types have changed, access it via the OLE DB driver.
As you have correctly discerned, you cannot use the Excel Connection Manager in a Lookup task, it only accepts the OLE DB connection manager. The "normal" Excel connection isn't an OLE DB type. The trick then, is to use an OLE DB connection manager with an Excel spreadsheet.
For this example, I have a spreadsheet with state codes and their full name and my source data only has the abbreviations flowing through. This walk through will wire up a lookup task against Excel to retrieve those values.
Keep these caveats in mind: Lookups are case sensitive, regardless of whether the source database (or file in this case) is case sensitive. Excel strings are always going to be interpreted as unicode/nvarchar.
Source Data
Given a simple file sitting at C:\tmp\LookupReference.xlsx that looks like the following
Connection Manager
We must first establish an OLE DB Connection Manager. Instead of the default "Native OLE DB\SQL Server Native Client 10.0" change that to "Native OLE DB\Microsoft Office 12.0 Access Database Engine OLE DB Provider". In the Server or file name, locate the source file. Clicking Test Connection at this point will result in an error.
Here comes the "magic." Click the All tab and scroll to the top. In Extended Properties, add the indicated string "Excel 12.0;HDR=YES; IMEX=1;" This tells the provider that we are going to use Excel 12.0 (.xlsx format) with a header row and the IMEX 1 tells the driver there will be intermixed data.
Your package sould now look something like this. A connection manager with extended properties set and assumes a preexisting data flow
Data Flow
To simplify matters, I have a script source that generates 3 rows of data with state codes MO, KS and NE and sends them down the pipeline. Your source will obviously be different but the concept will remain the same. Sample code provided in the annotation.
In your lookup transformation, you will need to write a query against the spreadsheet. It's similar to a normal database query except your table is going to be Sheet1$
unless you have a named range in which your table would be MyRange
Note the $ is required when referencing a sheet. Using the sample spreadsheet above, my query would be
SELECT
S.StateCode
, S.StateName
FROM
`Sheet1$` S
I map my Script task column StateCode to the reference query's StateCode column and check the StateName field as I want to add that to my data flow.
I've put a Data viewer after the Lookup task to verify my lookups worked
Everything works and we're all happy.
Old Excel
If you are using a .xls file, you need to make the following changes. - In your Connection Manager, instead of the Office 12 provider, select the "Native OLE DB\Microsoft Jet 4.0 OLE DB Provider" - The Extended Properties become "EXCEL 8.0;HDR=Yes; IMEX=1;"
Notes for 2008+
A tip of the hat to Matt Masson (Blog|Twitter) for reminding me of the Cached Connection Manager. In particular, he points out that "you get weird behavior if you have multiple lookups against the same file." So, if you are on 2008+, look at using the CCM.
Before you need the cached data, load it into the cache like so
The lookup basically remains the same. In the General tab, change the source to the cache connection manager and in the Connection tab, change to the CCM.
No comments:
Post a Comment