tag:blogger.com,1999:blog-4583365003930185416.post8866471095346838931..comments2024-01-29T01:48:06.546-06:00Comments on World of Whatever: ETL file processing patternBill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.comBlogger2125tag:blogger.com,1999:blog-4583365003930185416.post-1820528597110745842016-04-05T08:08:06.164-05:002016-04-05T08:08:06.164-05:00I often see some variations on this pattern:
1) th...I often see some variations on this pattern:<br />1) this pattern, but instead of altering the original file name to include a timestamp, put each file into a folder timestamped with the ETL start time ("SSISData\Subject Area\Archive\201604051504\input.csv")<br />2) the "Subject Area" and "Input/Output/Archive" folders turned around: "SSISData\Input\Subject Area"; "SSISData\Archive\201604051504\Subject Area\input.csv"<br />3) Another folder on the "Input/Output/Archive" level, called "In Progress". This is a folder users don't have access to, so you can't mess with the files the SSIS is processing. Also enhances the debug experience: after a failing ETL process, it's even more clear which files were being processed.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-4583365003930185416.post-39764952830943498022016-04-04T15:06:41.914-05:002016-04-04T15:06:41.914-05:00I like this approach as well, it seems like I have...I like this approach as well, it seems like I have worked with a guy who liked it.<br /><br />What I find interesting is the lack of ETL knowledge from a day to day level that is on the internet. We find lots of articles solving specific problems, but we rarely find ones solving the problem of how to do good release management, and good file management.<br /><br />There are some good articles, and some good information, but you never hear about any good books or definitive patterns for it that aren't ancient.<br /><br />Things work differently now than they did 20 years ago, but there is still this mind set around older technologies dictating the new information.<br /><br />Glad to see someone tackling something that "seems" so obvious, but you can really get bitten by when you go to other locations.<br /><br />The only thing that I have to note about this is that this CAN fall down as a general rule when you start to receive a LOT of files into a specific folder. The archive itself can become burdened with cruft. I haven't seen a method for solving this in any particularly way yet, but I imagine you could set up some archive process that moved things to dated folders if they were older than X months old.<br /><br />I had a problem with my last job where we had many files per day, all very large, and the actual sorting in the folder itself was slow to just bring the data up, it would have been useful to have another method, unfortunately there wouldn't have been one as easy as just leaving them in the same folder and waiting.Christopher Harschhttps://www.blogger.com/profile/15327387322815937997noreply@blogger.com