What is the name of the current file
It's common to not know the exact name of the file you're loading. You often have to apply some logic - a file mask, date calculation logic, etc to determine what file needs to be loaded. In SSIS, we often use a ForEach File Enumerator to accomplish this but an Expression task or even an Execute SQL Task can be used to retrieve/set a file name. Today, I'm going to show you two different mechanisms for identifying the current file name.
Did you know that in SSIS, the Flat File Source exposes an Advanced property called FileNameColumnName. This shows in your Properties window for the Flat File Connection Manager
There are two different click paths for setting the FileNameColumnName property. The first is to right click on the Flat File Source and select the "Show Advanced Editor" option. There, navigate to Component Properties and you can set the FileNameColumnName property there.
The second is a combination of the Properties window and the Flat File Source itself. Select the Flat File Source and go to the Properties window. There you specify the FileNameColumnName property but notice, the Flat File Source itself is put into a Warning state. To fix that, we need to double click on the component and view the Columns tab. You'll notice the name we specified in the Properties window is now set in the Columns tab and the warning goes away.
Cool story, bro
That's cool and all, but it has two downsides. The first is due to me being a dumb American but the file name that is added to the data flow is DT_WSTR/unicode/nvarchar with a length of 260. That is awesome for internationalization to default to it. Except the systems I work in never have nvarchar defined so now I will need to use a Data Conversion task to change the supplied name into a non-unicode version. That's an irritant but since I know what the pattern is, I can live with it.
The real downside with this approach is that it only works for Flat File Source. Excel, Raw File Source, and XML sources do not expose the FileNameColumnName property. Now that is a problem in my book because when I'm automating, I'd have to have one set of source patterns for flat files and a different one for non-flat files.
A better approach
So, as much as I like the built in solution, my pattern is to use a Derived Column to inject the file name into the Data Flow. I have a variable called CurrentFileName in all my packages. That contains the design-time path for my Flat File Connection Manager (or Excel). My Connection Manager will then have the ConnectionString/ExcelFilePath property assigned to be @[User::CurrentFileName]. This positions me for success because all I need to do is ensure that whatever mechanism I am using to determine my source file correctly populates that variable. In this post, a ForEach File Enumerator will handle that.
Within my Data Flow Task, I will add a Derived Column Transformation that adds my package variable into the data flow as a new column. Here, I am specifying it will be of data type DT_STR with a length of 130.
Biml
What would a post be without some Biml to illustrate the point?
I use the following as my source file.
Col1|Col2 1|2 2|3 3|4
This Biml will generate an SSIS package that has a Data Flow with a Flat File Source, a Derived Column and a Row Count. What you're interested in is seeing how we either specify the value for the FileNameColumnName in our FlatFileSource tag or enrich our data flow by adding it in our Derived Column component.
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <FlatFileConnection FilePath="C:\ssisdata\SO\Input\CurrentFileName_0.txt" FileFormat="FFF_CurrentFileName" Name="FFCM" /> </Connections> <FileFormats> <FlatFileFormat Name="FFF_CurrentFileName" IsUnicode="false" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true"> <Columns> <Column Name="Col1" DataType="Int32" Delimiter="|" /> <Column Name="Col2" DataType="Int32" Delimiter="CRLF" /> </Columns> </FlatFileFormat> </FileFormats> <Packages> <Package Name="CurrentFileName"> <Connections> <Connection ConnectionName="FFCM"> <Expressions> <Expression ExternalProperty="ConnectionString">@[User::CurrentFileName]</Expression> </Expressions> </Connection> </Connections> <Variables> <Variable DataType="String" Name="CurrentFileName">C:\ssisdata\SO\Input\CurrentFileName_0.txt</Variable> <Variable DataType="Int32" Name="RowCountSource">0</Variable> </Variables> <Tasks> <ForEachFileLoop Folder="C:\ssisdata\SO\Input" FileSpecification="CurrentFileName*.txt" Name="FELC Shred txt"> <VariableMappings> <VariableMapping VariableName="User.CurrentFileName" Name="0" /> </VariableMappings> <Tasks> <Dataflow Name="DFT Import data"> <Transformations> <FlatFileSource ConnectionName="FFCM" Name="FFCM Pull data" FileNameColumnName="CurrentFileNameSource" /> <DerivedColumns Name="DER Add CurrentFileName"> <Columns> <Column DataType="AnsiString" Name="CurrentFileName" Length="130">@[User::CurrentFileName]</Column> </Columns> </DerivedColumns> <RowCount VariableName="User.RowCountSource" Name="CNT Source Data" /> </Transformations> </Dataflow> </Tasks> </ForEachFileLoop> </Tasks> </Package> </Packages> </Biml>
If all goes well, when you run you package you should see something like the following
1 comment:
I think that you don't see the FlatFileName column in the excel and other types of connection managers is explicit.
The reason that it exists is because the flat file connection source (Data Flow) can sit on top of both the flat file connection manager, and the MULTI Flat file connection manager.
If you click on connection managers in the connection manager section, and you say "new connection:" There is a connection type called "MULTIFLATFILE".
This version of the flat file connector allows you to avoid using the for each loop method of managing flat files, it allows you to provide a file mask as a connection string. Even though this isn't a "standard" way to do it in SSIS circles, this method would be VERY familiar to someone used to using Informatica, which has a similar mechanism.
Since you don't have ANY idea what file is processing, or the order it is processing in (when using MULTIFLATFILE), you would WANT to have the flat file column name provided as it is not already associated to any variable.
Since this mechanism isn't supported in the other file connection managers (excel etc...) having this as an added feature to the data flow object would not be needed.
Post a Comment