World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Loading...

Wednesday, May 22, 2013

Summit 2013, I'm speaking!

I'm in!

I might be a little stoked so please forgive this post but I've had a submission accepted for the 2013 Pass Summit. In the "BI Platform Architecture, Development & Administration" track, I'll be delivering a talk titled "Type more, click less: Programmatically build SSIS packages."

Gratuitous browser abuse

Marquee tag: I'm speaking!

Blink tag: I'm presenting at Summit 2013

Type more, click less: Programmatically build SSIS packages.

Why should you come to my session? Because you would love to automate some of the repetitive tasks you perform with SSIS.

You did know you didn't have to fire up BIDS/SSDT to make all your changes, right? In fact, did you know you can build complete packages without ever firing up BIDS? I wouldn't go that route unless I had a strong business case for doing so, but a little bit of up front work can yield an N% complete SSIS package.

Specific technologies I'll cover are BIMLScript, EzAPI and the base .NET library.

After attending this session, I want you to know what your options are and when it makes sense to use one approach instead of the other.

Monday, April 22, 2013

File in use by another process

There's nothing quite so delightful as getting error messages that don't provide enough information to resolve the base problem. Case in point, I went to run an SSIS package which in 2012 requires a rebuild of the .ispac and received the following error message.

System.IO.IOException: The process cannot access the file 'C:\Src\DataWarehouse\bin\Development\DataWarehouse.ispac' because it is being used by another process.

What is the other process? I've seen people resort to a reboot because they can't figure out what process has their grubby finger on the file but that is overkill. All you really need is the free utility "Process Explorer" from Microsoft/Sysinternals.

Download or simply run from the website (but you'll want to do download it and keep it as part of your troubleshooting kit) and once it pops up, under the Find menu, select "Find Handle or DLL..." Ctrl-F



In the resulting window, simply search for the file that is being reported as locked.



Here, I searched for the .ispac extension and discovered that a DtsDebugHost.exe still had the .ispac file open. In this specific example, that tells me I should check my Visual Studio instances as that'd be the process that launched DtsDebugHost. That was my instinct however and I had no instances of VS/SSDT/BIDS running so I chalked it up to a zombie process and just killed the process.

You can then simply double click on the line with the Process and observe that the underlying process list will find the offending PID (9388). In the resulting window, right click on the process line and select Kill

Saturday, April 13, 2013

SSIS Excel Source via Script

This is a quick and dirty post on how I deal with ugly Excel as a source in response to this StackOverflow question. Originally, I built this script to address some data I needed to import that was stored in Excel. It wasn't typical Excel data of rows and columns. No, it was a repeating report that was exported to Excel. There was no way in hell I was going to be able to make the Excel driver work with that data to get it imported. I recently had cause to bring this code out to parse a file that will be consistent with the layout but the column headers will change between sheets. It's a forecasting spreadsheet so tab 1 is people, table 2 is dollars, tab 3 is hours, etc.

Script Task

You will need to define your output columns and their data types and then go in and add 2 classes to the existing project which are described below.

Main

This is used in the Script Task's Main section. There is probably a more graceful way of doing this but it works. In my PreExecute, I instantiate a DataSet by calling the static method ReadExcelQuery. This fills a "raw" dataset, which is literally the data exactly out of Excel. I then pass the raw dataset into the ParseSample method and get a different dataset out. In this example, there were two sheets in the Excel Workbook that I needed to get the data out of. After the PreExecute has completed, my member variable parsedData is populated with a few data tables. There is a raw, which I find useful for seeing the data exactly as it was extracted from Excel. This was useful when I was running into dates which were really just integers (see the ConvertXlsdtToDateTime method). There will be a parsed data table which is populated from your specific Parse method. Finally, there is an Error data table which lists the row & column that didn't parse as expected.

Given the fun that is debugging in SSIS Script tasks, I strongly encourage you to create a .NET Console project. Inside that project's main, you can basically paste the PreExecute method's code in there and then work on getting your Parse logic squared away. Once that's done, then add them into the actual SSIS Script Task. At that point, you're simply wiring up the resultant dataset into the OuputBuffer.

   1:      using System;
   2:      using System.Data;
   3:      using System.Collections.Generic;
   4:      using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
   5:      using Microsoft.SqlServer.Dts.Runtime.Wrapper;
   6:   
   7:      /// <summary>
   8:      /// This is the class to which to add your code.  Do not change the name, attributes, or parent
   9:      /// of this class.
  10:      /// </summary>
  11:      [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
  12:      public class ScriptMain : UserComponent
  13:      {
  14:          DataSet parsedData;
  15:   
  16:          public override void PreExecute()
  17:          {
  18:              base.PreExecute();
  19:              this.parsedData = new DataSet();
  20:   
  21:              string fileName = this.Variables.CurrentFileName;
  22:              DataSet ds = null;
  23:              string worksheetName = string.Empty;
  24:              bool hasHeader = false;
  25:              DataSet excel = null;
  26:   
  27:              List<string> worksheets = new List<string>() { "Sheet1$", "Sheet2$"};
  28:              foreach (string worksheet in worksheets)
  29:              {
  30:                  excel = ExcelReader.ReadExcelQuery(fileName, worksheet, hasHeader);
  31:                  if (ds == null)
  32:                  {
  33:                      ds = ExcelParser.ParseSample(FileFormatter.Sample, excel, worksheet);
  34:                  }
  35:                  else
  36:                  {
  37:                      ds.Merge(ExcelParser.ParseSample(FileFormatter.Sample, excel, worksheet));
  38:                  }
  39:              }
  40:   
  41:              this.parsedData = ds.Copy();
  42:          }
  43:   
  44:          /// <summary>
  45:          /// Slog through parsedData and emit
  46:          /// </summary>
  47:          public override void CreateNewOutputRows()
  48:          {
  49:              bool pbCancel = false;
  50:              foreach (DataRow item in this.parsedData.Tables["Sample_TYPE"].Rows)
  51:              {
  52:                  try
  53:                  {
  54:                      SampleBuffer.AddRow();
  55:                      SampleBuffer.Company = item["Company"].ToString();
  56:                      SampleBuffer.Entity = item["Entity"].ToString();
  57:                      SampleBuffer.Function = item["Function"].ToString();
  58:   
  59:                  }
  60:                  catch (Exception)
  61:                  {
  62:   
  63:                      throw;
  64:                  }
  65:   
  66:              }
  67:   
  68:              foreach (DataRow item in this.parsedData.Tables["ERROR"].Rows)
  69:              {
  70:                  ErrorOutputBuffer.AddRow();
  71:                  ErrorOutputBuffer.entity = item["entity"].ToString();
  72:                  ErrorOutputBuffer.severity = item["severity"].ToString();
  73:                  ErrorOutputBuffer.messagecode = item["msg_cd"].ToString();
  74:                  ErrorOutputBuffer.messagetext = item["msg_tx"].ToString();
  75:                  ErrorOutputBuffer.value = item["value"].ToString();
  76:                  ComponentMetaData.FireError(0, "Sample parser", item["value"].ToString(), string.Empty, 0, out pbCancel);
  77:              }
  78:   
  79:          }
  80:   
  81:      }

ExcelParser class

Here is where the guts of your coding belongs. You will work through that raw dataset in a row by row fashion. It's not the greatest but it works. You will also need to define what your intended output looks like in the "MockDataTable" method.

   1:      using System;
   2:      using System.Collections.Generic;
   3:      using System.Data;
   4:      using System.Data.OleDb;
   5:      using System.Data.SqlClient;
   6:      using System.Diagnostics;
   7:      using System.Linq;
   8:      using System.Text;
   9:      using System.Text.RegularExpressions;
  10:   
  11:      /// <summary>
  12:      /// This class is concerned with parsing data from raw datasets
  13:      /// </summary>
  14:      public class ExcelParser
  15:      {
  16:          /// <summary>
  17:          /// Value translates to Class.MethodName, used for logging
  18:          /// </summary>
  19:          private static string logFileName = System.Reflection.MethodBase.GetCurrentMethod().DeclaringType.ToString();
  20:          
  21:          /// <summary>
  22:          /// Prevents a default instance of ExcelParser class from being created.
  23:          /// </summary>
  24:          private ExcelParser()
  25:          {
  26:          }
  27:   
  28:          /// <summary>
  29:          /// Parse some Excel file
  30:          /// </summary>
  31:          /// <param name="fileFormat"></param>
  32:          /// <param name="rawData"></param>
  33:          /// <returns></returns>
  34:          public static DataSet ParseSample(FileFormatter fileFormat, DataSet rawData, string workSheetName)
  35:          {
  36:              System.Data.DataSet ds = null;
  37:              System.Data.DataTable dt = null;
  38:              System.Data.DataTable errors = null;
  39:              System.Data.DataRow newRow = null;
  40:   
  41:              // monotomically increasing number
  42:              int rowNumber = 0;
  43:   
  44:              // Key to understand when a new group has been found
  45:              string currentEntity = string.Empty;
  46:              string current = string.Empty;
  47:              decimal someValue = 0;
  48:   
  49:              // Error reporting
  50:              string errorSeverity = string.Empty;
  51:              string errorCode = string.Empty;
  52:              string errorMessage = string.Empty;
  53:              string errorValue = string.Empty;
  54:   
  55:              errorSeverity = "fatal";
  56:              errorCode = "invalid data type";
  57:              errorMessage = "The supplied value does not match expected type";
  58:              errors = ExcelParser.MockErrorTable();
  59:   
  60:              ds = new DataSet();
  61:              dt = ExcelParser.MockDataTable(fileFormat);
  62:              int outRowOrdinal = 15;
  63:              int outYearOrdinal = 16;
  64:              int outTabNameOrdinal = 17;
  65:              bool keepRow = true;
  66:              int processingYear = 0;
  67:   
  68:              // Parsing logic goes here
  69:              try
  70:              {
  71:                  foreach (DataRow dr in rawData.Tables[0].Rows)
  72:                  {
  73:                      // Skip first 3 rows
  74:                      newRow = dt.NewRow();
  75:                      if (rowNumber < 2)
  76:                      {
  77:                          rowNumber++;
  78:                          continue;
  79:                      }
  80:                      
  81:                      // On the third row, we need to peel out the year
  82:                      if (rowNumber == 2)
  83:                      {
  84:                          // Jan 13 Forecast
  85:                          current = dr[3].ToString().Split()[1];
  86:                          processingYear = Int32.Parse(current) + 2000;
  87:                          rowNumber++;
  88:                          continue;
  89:                      }
  90:   
  91:                      // Keep track of the forecast amount as we only store rows which
  92:                      // are not zeroed out. 
  93:                      keepRow = false;
  94:   
  95:                      for (int i = 0; i < 15; i++)
  96:                      {
  97:                          current = dr[i].ToString();
  98:                          switch (i)
  99:                          {
 100:   
 101:                              case 0:
 102:                                  // Company
 103:                                  newRow[i] = current;
 104:                                  break;
 105:                              case 1:
 106:                                  // Department
 107:                                  if (!string.IsNullOrEmpty(current))
 108:                                  {
 109:                                      currentEntity = current.Trim();
 110:                                  }
 111:                                  newRow[i] = currentEntity;
 112:                                  break;
 113:                              case 2:
 114:                                  // Division
 115:   
 116:                                  newRow[i] = current.Trim();
 117:                                  break;
 118:                              default:
 119:                                  if (!Decimal.TryParse(current, out someValue))
 120:                                  {
 121:                                      errorValue = string.Format(
 122:                                          "Unable to convert {0} to an decimal. '{3}' ({1}{2})",
 123:                                          current,
 124:                                          ExcelParser.GetExcelColumnName(i),
 125:                                          rowNumber,
 126:                                          workSheetName);
 127:                                      errors.Rows.Add(new object[] { fileFormat.ToString(), errorSeverity, errorCode, errorMessage, errorValue });
 128:                                  }
 129:                                  else
 130:                                  {
 131:                                      newRow[i] = someValue;
 132:                                      if (someValue != 0)
 133:                                      {
 134:                                          keepRow = true;
 135:                                      }
 136:                                  }
 137:   
 138:                                  break;
 139:                          }
 140:                      }
 141:   
 142:                      newRow[outRowOrdinal] = rowNumber++;
 143:                      newRow[outTabNameOrdinal] = workSheetName.TrimEnd(new[] { '$' });
 144:                      newRow[outYearOrdinal] = processingYear;
 145:                      if (keepRow)
 146:                      {
 147:                          dt.Rows.Add(newRow);                        
 148:                      }
 149:                  }
 150:              }
 151:              catch (Exception ex)
 152:              {
 153:                  errorValue = string.Format(
 154:                              "{0} {4} '{3}' {1}{2}",
 155:                              logFileName,
 156:                              ExcelParser.GetExcelColumnName(0),
 157:                              rowNumber,
 158:                              workSheetName,
 159:                              ex.Message);
 160:                  //LogUtil.Error(logFileName, errorValue, ex);
 161:                  errors.Rows.Add(new object[] { fileFormat.ToString(), errorSeverity, "UnableToParse", "Unable to process file.  Contact development", errorValue });
 162:              }
 163:   
 164:              ds.Tables.Add(dt);
 165:              ds.Tables.Add(errors);
 166:   
 167:              //LogUtil.Debug(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType.ToString(), string.Format("{0} Leaving method", ExcelParser.logFileName));
 168:   
 169:              return ds;
 170:          }
 171:   
 172:          /// <summary>
 173:          /// Method to generate the format for an error table
 174:          /// </summary>
 175:          /// <returns>A template for error messages</returns>
 176:          public static DataTable MockErrorTable()
 177:          {
 178:              DataTable dt = null;
 179:              dt = new DataTable("ERROR");
 180:              dt.Columns.Add("entity", System.Type.GetType("System.String"));
 181:              dt.Columns.Add("severity", System.Type.GetType("System.String"));
 182:              dt.Columns.Add("msg_cd", System.Type.GetType("System.String"));
 183:              dt.Columns.Add("msg_tx", System.Type.GetType("System.String"));
 184:              dt.Columns.Add("value", System.Type.GetType("System.String"));
 185:              return dt;
 186:          }
 187:   
 188:          /// <summary>
 189:          /// Translate a number into its equivalent Excel column name.  e.g. 0 -> A; 2 -> C
 190:          /// </summary>
 191:          /// <remarks>http://stackoverflow.com/questions/181596/how-to-convert-a-column-number-eg-127-into-an-excel-column-eg-aa</remarks>
 192:          /// <param name="columnNumber">A zero based number that maps to an excel column ordinal</param>
 193:          /// <returns>A string of letters that are Excel column names</returns>
 194:          public static string GetExcelColumnName(int columnNumber)
 195:          {
 196:              int dividend = columnNumber + 1;
 197:              string columnName = String.Empty;
 198:              int modulo;
 199:   
 200:              while (dividend > 0)
 201:              {
 202:                  modulo = (dividend - 1) % 26;
 203:                  columnName = Convert.ToChar(65 + modulo).ToString() + columnName;
 204:                  dividend = (int)((dividend - modulo) / 26);
 205:              }
 206:   
 207:              return columnName;
 208:          }
 209:   
 210:          /// <summary>
 211:          /// Seriously?  For the loss
 212:          /// <see cref="http://www.debugging.com/bug/19252"></see>
 213:          /// </summary>
 214:          /// <param name="excelDate">Number of days since 1900-01-01</param>
 215:          /// <returns>The converted days to date</returns>
 216:          public static DateTime ConvertXlsdtToDateTime(int excelDate)
 217:          {
 218:              DateTime dt = new DateTime(1899, 12, 31);
 219:   
 220:              // adjust for 29 Feb 1900 which Excel considers a valid date
 221:              if (excelDate >= 60)
 222:              {
 223:                  excelDate--;
 224:              }
 225:   
 226:              return dt.AddDays(excelDate);
 227:          }
 228:   
 229:          /// <summary>
 230:          /// In an attempt to simplify the actual parsing code, I have moved the
 231:          /// table creation and column definition into this method.
 232:          /// </summary>
 233:          /// <param name="fileFormat">The appropriate fileformat to use</param>
 234:          /// <returns>A well formed datatable for the given input type</returns>
 235:          private static DataTable MockDataTable(FileFormatter fileFormat)
 236:          {
 237:              DataTable dt = null;
 238:   
 239:              switch (fileFormat)
 240:              {
 241:                  case FileFormatter.Sample:
 242:                      dt = new System.Data.DataTable("Sample_TYPE", "tableNamespace");
 243:                      dt.Columns.Add("Company", System.Type.GetType("System.String"));
 244:                      dt.Columns.Add("Entity", System.Type.GetType("System.String"));
 245:                      dt.Columns.Add("Function", System.Type.GetType("System.String"));
 246:                      break;
 247:                  default:
 248:                      dt = new System.Data.DataTable("Unknown", "tableNamespace");
 249:                      dt.Columns.Add("row_id", System.Type.GetType("System.Int32"));
 250:                      dt.Columns.Add("name", System.Type.GetType("System.String"));
 251:                      dt.Columns.Add("value", System.Type.GetType("System.Decimal"));
 252:                      break;
 253:              }
 254:   
 255:              return dt;
 256:          }
 257:      }    
 258:      
 259:      /// <summary>
 260:      /// The specific file formatters.  
 261:      /// </summary>
 262:      public enum FileFormatter
 263:      {
 264:          /// <summary>
 265:          /// Budgeting file
 266:          /// </summary>
 267:          Sample,
 268:      }

ExcelReader class

This class is responsible for querying columns out of an Excel as a dataset. A worksheet has a $ at the end of its name which isn't visible in Excel. This also works with named ranges which won't have a trailing $.

   1:      using System;
   2:      using System.Collections.Generic;
   3:      using System.Data;
   4:      using System.Data.OleDb;
   5:      using System.Data.SqlClient;
   6:      using System.Diagnostics;
   7:      using System.Linq;
   8:      using System.Text;
   9:      using System.Text.RegularExpressions;
  10:   
  11:      /// <summary>
  12:      /// This class is concerned with extracting data from non-tabular Excel 
  13:      /// documents.
  14:      /// </summary>
  15:      public class ExcelReader
  16:      {
  17:          /// <summary>
  18:          /// Value translates to Class.MethodName, used for logging
  19:          /// </summary>
  20:          private static string logFileName = System.Reflection.MethodBase.GetCurrentMethod().DeclaringType.ToString();
  21:   
  22:          /// <summary>
  23:          /// Template for an OLEDB provider to read Excel data
  24:          /// </summary>
  25:          private static string providerTemplate = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Excel 8.0;HDR={1};IMEX=1\"";
  26:   
  27:          /// <summary>
  28:          /// Template for an OLEDB provider to read Excel data
  29:          /// </summary>
  30:          private static string providerTemplateXLSX = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=\"{0}\";Extended Properties=\"Excel 12.0 Xml;HDR={1};IMEX=1\"";
  31:   
  32:          /// <summary>
  33:          /// The row id column will always be the last element in the unparsed data
  34:          /// </summary>
  35:          private static string rowIdColumnName = "row_id";
  36:   
  37:          /// <summary>
  38:          /// Prevents a default instance of the ExcelReader class from being created.
  39:          /// Static methods so constructor is hidden
  40:          /// </summary>
  41:          private ExcelReader()
  42:          {
  43:          }
  44:   
  45:          /// <summary>
  46:          /// Gets the row id column name
  47:          /// </summary>
  48:          public static string RowIdColumnName
  49:          {
  50:              get { return ExcelReader.rowIdColumnName; }
  51:          }
  52:   
  53:          /// <summary>
  54:          /// Utility method for pulling base excel data into a dataset
  55:          /// </summary>
  56:          /// <param name="filePath">Base folder for a file</param>
  57:          /// <param name="fileName">actual file name</param>
  58:          /// <param name="worksheetName">The name of the Excel worksheet</param>
  59:          /// <param name="hasHeader">Boolean indicating whether there is a header row</param>
  60:          /// <returns>A dataset representation of a spreadsheet</returns>
  61:          public static DataSet ReadExcelQuery(string filePath, string fileName, string worksheetName, bool hasHeader)
  62:          {
  63:              return ExcelReader.ReadExcelQuery(System.IO.Path.Combine(filePath, fileName), worksheetName, hasHeader);
  64:          }
  65:   
  66:          /// <summary>
  67:          /// Utility method for pulling base excel data into a dataset
  68:          /// </summary>
  69:          /// <param name="fileName">The filename to work against</param>
  70:          /// <param name="worksheetName">The name of the Excel worksheet</param>
  71:          /// <param name="hasHeader">Boolean indicating whether there is a header row</param>
  72:          /// <returns>Dataset with raw dataset</returns>
  73:          public static DataSet ReadExcelQuery(string fileName, string worksheetName, bool hasHeader)
  74:          {
  75:              DataSet dataSet = new DataSet();
  76:              string datatableName = string.Empty;
  77:              string excelConnectionString = string.Empty;
  78:              string query = string.Empty;
  79:              datatableName = "RawData";
  80:              query = string.Format("SELECT * FROM [{0}]", worksheetName);
  81:   
  82:              excelConnectionString = string.Format(ExcelReader.providerTemplate, fileName, hasHeader ? "YES" : "NO");
  83:   
  84:              //Console.WriteLine(string.Format("{0} Processing file {1}  Query {2}", logFileName, fileName, query));
  85:              //Console.WriteLine(string.Format("{0} excelConnectionString {1}", logFileName, fileName));
  86:              //Console.WriteLine(string.Format("{0} excelConnectionString {1}", logFileName, excelConnectionString));
  87:   
  88:              // Ensure file exists
  89:              if (!System.IO.File.Exists(fileName))
  90:              {
  91:                  //Console.WriteLine(string.Format("FileNotFound!  Expected fileName: {0}", logFileName, fileName),
  92:                  //    new System.IO.FileNotFoundException("FileNotFount exception", fileName));
  93:                  return dataSet;
  94:              }
  95:   
  96:              try
  97:              {
  98:                  OleDbDataAdapter oleDbDataAdapter = new OleDbDataAdapter(
  99:                      query, excelConnectionString);
 100:   
 101:                  dataSet.Tables.Add(datatableName);
 102:                  oleDbDataAdapter.Fill(dataSet.Tables[datatableName]);
 103:   
 104:                  // add a column in for record id
 105:                  DataColumn rowIdColumn = new DataColumn(ExcelReader.rowIdColumnName, System.Type.GetType("System.Int32"));
 106:   
 107:                  dataSet.Tables[datatableName].Columns.Add(rowIdColumn);
 108:   
 109:                  // populate record id
 110:                  for (int i = 0; i < dataSet.Tables[datatableName].Rows.Count; i++)
 111:                  {
 112:                      dataSet.Tables[datatableName].Rows[i][ExcelReader.rowIdColumnName] = i + 1;
 113:                      if (hasHeader)
 114:                      {
 115:                          // TODO:  fix this dreck
 116:                          // This section is all wrong and untested code
 117:                          if (i == 0)
 118:                          {
 119:                              i++;
 120:                          }
 121:   
 122:                          dataSet.Tables[datatableName].Rows[i][ExcelReader.rowIdColumnName] = i + 1;
 123:                      }
 124:                  }
 125:   
 126:                  //LogUtil.Info(logFileName, string.Format("Dataset created for file: {0}", fileName));
 127:                  return dataSet;
 128:              }
 129:              catch (System.IO.IOException ex)
 130:              {
 131:                  // Most likely the file is locked by another process
 132:                  //LogUtil.Error(logFileName, string.Format("Exclusive lock on file: {0} Error:  {1}", fileName, ex.ToString()));
 133:                  return new DataSet();
 134:              }
 135:              catch (System.Data.OleDb.OleDbException ex)
 136:              {
 137:                  // The tab name does not exist in the spreadsheet.  
 138:                  // Odds are decent they forgot to append a $ to the tab name
 139:                  //Console.Error.WriteLine(string.Format("Unable to find tab {2} in file {0} Error:  {1}", fileName, ex.ToString(), worksheetName));
 140:   
 141:                  if (!worksheetName.EndsWith("$"))
 142:                  {
 143:                      //LogUtil.Error(logFileName, string.Format("Retrying as tab {1}$ in file {0}", fileName, worksheetName));
 144:                      return ExcelReader.ReadExcelQuery(fileName, worksheetName + "$", hasHeader);
 145:                  }
 146:   
 147:                  return new DataSet();
 148:              }
 149:              catch (Exception ex)
 150:              {
 151:                  //LogUtil.Error(logFileName, string.Format("Reading inital dataset failed: {0} Error:  {1}", fileName, ex.ToString()));
 152:                  return new DataSet();
 153:              }
 154:          }
 155:      }

Friday, March 22, 2013

Sync to SharePoint Workspaces

Sync to SharePoint Workspace, an undiscovered jewel

I hate SharePoint. It's generally a pain in the ass. Today, I was pointed as a SharePoint site being used as a document library for a client engagement. Lots of documents in there as we've been working with this client for a few months. One of our folders contains images of our data models and even though they were proper image files, for whatever reason, the server wouldn't serve them as such. Instead, I'd get a dialog asking me what I'd like to do with each one of them. I could tell you in detail what I thought SP could do with them but then I'd need to flag my blog as mature content

Instead, I said there has to be a better way. I first turned to the old standby of data sheet view but that didn't work as my client didn't supported (we use a hosted SP site). Wait, no it was the Windows Explorer view that is so handy. Nope, that didn't work either. At this point, I'm thinking where's my SkyDrive download all button.

Sync to SharePoint Workspace

Ladies and Gentlemen, I give you the SharePoint "Download All" button.
Sync to SharePoint Workspace
I will also present my neophyte SnagIt skills. Look at that sawtooth crop!

From what I've been reading, this synch is actually closer to a SkyDrive/DropBox without the need for an explicit client installation- or at least it's already set up from installing Office. The dialog asked me what I was looking to sync and where it should store the content locally and after a few minutes of downloading, I had the entirety of our SharePoint library on my machine. How cool is that?

Why is this the best thing ever?

That might be a bit of hyperbole, but it is rather useful. Enough so that I might be able to dial back the hatred to a distaste. What's so useful? For starters, I hate the constant page refresh when I'm in a SharePoint library/list. Windows Explorer isn't going to do that to me. I don't have to worry about whether someone has configured IIS to send the right handler information along. Windows already knows how to handle these files. I don't have to go through a click dance because "this file came from the internet and might melt your face". Well, I'll probably need to indicate that the path I Sync to is trusted but I can handle that.

References

Friday, March 15, 2013

SSIS-Encrypt data while in transit

A coworker asked if I had any experience with encrypting data while in transit with SSIS. Their client specifies that PII data must be protected at rest and in flight. They enforce the data at rest through something, it's not my client so I don't know the particulars but we'll be pulling data like a social security number, SSN, out of the financial system and into the data warehouse and while that data is between places, our ETL needs to safeguard that data.

I had never had to deal with encryption but in SSIS, or any ADO.NET connection, we can encrypt the connection and everything flowing down the wire should be encrypted. Encrypting Connections to SQL Server It looks like there’s some setup involved but once it’s done, it should be a matter of simply updating the connection string from a usage perspective. Update 2013-05-09 After attending Steve Jones's talk on The Encryption Primer, he suggested using IPSec tunneling as it will be far easier to implement than setting up SSL certs.

The second option was to use the native, but new-to-me, TSQL function for encryption and decryption. The following example shows me using Encrypt/DecryptBYPASSPHRASE functions to encrypt/decrypt the names from spt_values. At this point I will also point out, that crypto stuff has never been anything I've been passionate about so do your due diligence before you go implementing something you read on the interblag.
See also: Cryptographic functions

   1:  DECLARE @PassphraseEnteredByUser  nvarchar(128);
   2:  SET @PassphraseEnteredByUser  = 'Bill is the greatest';
   3:  WITH SRC AS
   4:  (
   5:      SELECT
   6:          SV.name
   7:      ,   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS  FakeSK
   8:      FROM
   9:          master.dbo.spt_values AS SV
  10:  )
  11:  , ENCRYPTO AS
  12:  (
  13:      SELECT
  14:          S.name
  15:      ,   EncryptByPassPhrase
  16:          (
  17:              @PassphraseEnteredByUser
  18:          ,   S.name
  19:          ,   1
  20:          ,   CONVERT( varbinary,  S.FakeSK)
  21:          ) AS  encrypted_text
  22:      ,   S.FakeSK
  23:      FROM
  24:          SRC S
  25:  )
  26:  , DECRYPTO AS
  27:  (
  28:      SELECT
  29:          E.*
  30:      ,   CONVERT(nvarchar, DECRYPTBYPASSPHRASE
  31:          (
  32:              @PassphraseEnteredByUser
  33:          ,   E.encrypted_text
  34:          ,   1
  35:          ,   CONVERT( varbinary,  E.FakeSK)
  36:          )) AS  decrypted_text
  37:      FROM
  38:          ENCRYPTO E
  39:  )
  40:  SELECT
  41:      D.* 
  42:  FROM 
  43:      DECRYPTO AS D;

In the SSIS world, we could use this as our source query. The challenge around doing that we’d either need to write the encrypted value into a staging table and then decrypt in a post cleanup step (Execute SQL task) or decrypt in the data flow prior to writing to the target table. But at that point, we’d have unencrypted SSNs in memory. If we run out of memory in the DF and spill to disk, the temp files would have the unencrypted SSN in them too. The automatic, post execution step would handle the cleanup on those files but there’d be the ghost image of them if people were NSA crazy about getting the data back. Either way, that might violate the client's rules on PII security.

The other thing to note about encrypting the data is that you’re going to go from a simple 9 byte allocation per row to varbinary(8000) which will result in the data flow either writing the binary portion to disk and carrying a pointer along to that field in the in memory portion or the engine will have to allocate 8k bytes of memory per row in the buffer. Either way, you’re probably going to feel a performance impact for large datasets but we’d have to test before we can know the possible severity.

Thursday, March 14, 2013

Moving SQL Server databases

It's not uncommon to find SQL Server installations where the physical file layout is less than optimal. We have a client who is having some performance issues and in initial discovery, we observed that they had the system databases on the C drive and had put their user databases and log on the D. The challenge you can run into, and they are experiencing it, is that heavy workloads can saturate your I/O subsystem and that brings everything to a crawl. There's also a risk of running your C drive out of disk space that way which can cause the OS itself to stop responding. The client has already purchased a SAN and as part of our effort, we're going to relocate user and system databases onto the SAN.

There are lots of permutations on how to lay that out but the general rule is to isolate logs, data files, tempdb and if you can spare it, the backups from each other. Knowing that's the right design is one thing, but how do I go about fixing a broken one? MSDN to the rescue, move system databases.

I started by moving tempdb. Since tempdb doesn't require moving the data or log files, it was the easiest to deal with. The reason you don't need to move them, is that they are recreated whenever the system is restarted. I went through the example and was able to change the location of my tempdb as easy as can be.

The "other" system databases, those seemed a little more daunting because now I was going to be dealing with TSQL and OS commands. Yes, it's just "move" but still, I'm now going to be in two different environments and I'll need to repeat the process for all the databases. At this point, I start to get nervous about the amount of manual work to be done and more importantly the opportunity for errors to creep in. I don't know if I'm going to be pushing buttons at the client site or if they'll be at the keyboard. An automated solution sounds like it might be called for.

I wanted to go PowerShell but I held off on that approach as I wanted to deal with this situation a few times before I felt comfortable that I'd stepped on all the landmines. Instead, I wrote a TSQL script to generate my changes. The script generates 4 sets of commands for each row in sys.master_files. The first 2 columns change the file locations in the database and on disk. The last 2 columns undo those changes. Pro-tip: always have a backout strategy. For those looking at my SSMS Templates, in my Utility folder, I called this one DatabaseFileMover.sql As of right now, that code looks like the following

   1:  DECLARE
   2:      @NewLogLocation varchar(256) = 'L:\SQLLog'
   3:  ,   @NewBackupLocation varchar(256) = 'M:\SQLBackups'
   4:  ,   @NewDataLocation varchar(256) = 'S:\SQLData'
   5:  ,   @NewTempDBLocation varchar(256) = 'T:\TempDB'
   6:  ,   @AlterCommand varchar(512) = ' 
   7:  ALTER DATABASE <dbname/>  
   8:  MODIFY FILE (NAME = <name/>, FILENAME = ''<new_location/>'');'
   9:  ,   @MoveCommand varchar(512) = 'MOVE "<old_location/>" "<new_location/>"'
  10:  ,   @DeleteCommand varchar(512) = 'DEL "<old_location/>"'; 
  11:    
  12:  SELECT
  13:      CASE
  14:          WHEN D.type_desc = 'ROWS' AND D.DatabaseName <> 'tempdb' 
  15:              -- Undo version 
  16:              -- THEN REPLACE(REPLACE(REPLACE(@AlterCommand, '<dbname/>', D.databaseName), '<name/>', D.LogicalName), '<new_location/>', D.PhysicalName) 
  17:              THEN REPLACE(REPLACE(REPLACE(@AlterCommand, '<dbname/>', QUOTENAME(D.databaseName)), '<name/>', D.LogicalName), '<new_location/>', @NewDataLocation + '\' + D.BaseFileName) 
  18:          WHEN D.type_desc = 'LOG' AND D.DatabaseName <> 'tempdb'  
  19:              -- Undo version 
  20:              -- THEN REPLACE(REPLACE(REPLACE(@AlterCommand, '<dbname/>', D.databaseName), '<name/>', D.LogicalName), '<new_location/>', D.PhysicalName) 
  21:              THEN REPLACE(REPLACE(REPLACE(@AlterCommand, '<dbname/>', QUOTENAME(D.databaseName)), '<name/>', D.LogicalName), '<new_location/>', @NewLogLocation + '\' + D.BaseFileName) 
  22:          WHEN D.DatabaseName = 'tempdb'  
  23:              THEN REPLACE(REPLACE(REPLACE(@AlterCommand, '<dbname/>', QUOTENAME(D.databaseName)), '<name/>', D.LogicalName), '<new_location/>',  @NewTempDBLocation + '\' + D.BaseFileName) 
  24:          ELSE NULL 
  25:      END AS AlterCommand 
  26:  ,   CASE  
  27:          WHEN D.type_desc = 'ROWS' AND D.DatabaseName <> 'tempdb'  
  28:              THEN REPLACE(REPLACE(REPLACE(@MoveCommand, '<old_location/>', D.PhysicalName), '<name/>', D.LogicalName), '<new_location/>', @NewDataLocation + '\' + D.BaseFileName) 
  29:          WHEN D.type_desc = 'LOG' AND D.DatabaseName <> 'tempdb'  
  30:              THEN REPLACE(REPLACE(REPLACE(@MoveCommand, '<old_location/>', D.PhysicalName), '<name/>', D.LogicalName), '<new_location/>', @NewLogLocation + '\' + D.BaseFileName) 
  31:          WHEN D.DatabaseName = 'tempdb'  
  32:              THEN REPLACE(REPLACE(REPLACE(@DeleteCommand, '<old_location/>', D.PhysicalName), '<name/>', D.LogicalName), '<new_location/>', @NewLogLocation + '\' + D.BaseFileName) 
  33:      END AS MoveCommand 
  34:  ,   CASE 
  35:          WHEN D.type_desc = 'ROWS' AND D.DatabaseName <> 'tempdb'  
  36:              THEN REPLACE(REPLACE(REPLACE(@AlterCommand, '<dbname/>', QUOTENAME(D.databaseName)), '<name/>', D.LogicalName), '<new_location/>', D.PhysicalName) 
  37:          WHEN D.type_desc = 'LOG' AND D.DatabaseName <> 'tempdb'  
  38:              THEN REPLACE(REPLACE(REPLACE(@AlterCommand, '<dbname/>', QUOTENAME(D.databaseName)), '<name/>', D.LogicalName), '<new_location/>', D.PhysicalName) 
  39:          WHEN D.DatabaseName = 'tempdb'  
  40:              THEN REPLACE(REPLACE(REPLACE(@AlterCommand, '<dbname/>', QUOTENAME(D.databaseName)), '<name/>', D.LogicalName), '<new_location/>',  D.PhysicalName) 
  41:          ELSE NULL 
  42:      END AS UndoAlterCommand 
  43:  ,   CASE  
  44:          WHEN D.type_desc = 'ROWS' AND D.DatabaseName <> 'tempdb'  
  45:              THEN REPLACE(REPLACE(REPLACE(@MoveCommand, '<new_location/>', D.PhysicalName), '<name/>', D.LogicalName), '<old_location/>', @NewDataLocation + '\' + D.BaseFileName) 
  46:          WHEN D.type_desc = 'LOG' AND D.DatabaseName <> 'tempdb'  
  47:              THEN REPLACE(REPLACE(REPLACE(@MoveCommand, '<new_location/>', D.PhysicalName), '<name/>', D.LogicalName), '<old_location/>', @NewLogLocation + '\' + D.BaseFileName) 
  48:          WHEN D.DatabaseName = 'tempdb'  
  49:              THEN REPLACE(REPLACE(REPLACE(@DeleteCommand, '<new_location/>', D.PhysicalName), '<name/>', D.LogicalName), '<old_location/>', @NewLogLocation + '\' + D.BaseFileName) 
  50:      END AS UndoMoveCommand 
  51:  FROM 
  52:  ( 
  53:      SELECT D.* 
  54:      ,   LEFT(D.PhysicalName, D.LastSlash) AS CurrentFolder 
  55:      ,   RIGHT(D.PhysicalName, LEN(D.PhysicalName) - D.LastSlash - 1) AS BaseFileName 
  56:      FROM 
  57:      ( 
  58:          SELECT  
  59:              name AS LogicalName 
  60:          ,   physical_name AS PhysicalName 
  61:          ,   LEN(MF.physical_name) - CHARINDEX('\', REVERSE(MF.physical_name)) AS LastSlash 
  62:          ,   MF.type_desc 
  63:          ,   DB_NAME(MF.database_id) AS DatabaseName 
  64:          ,   MF.database_id 
  65:          FROM 
  66:              sys.master_files AS MF 
  67:          --WHERE 
  68:          --    MF.database_id < 5 
  69:      ) D 
  70:  ) D 
It generated output like
AlterCommandMoveCommandUndoAlterCommandUndoMoveCommand
ALTER DATABASE [master] MODIFY FILE (NAME = master, FILENAME = 'S:\SQLData\master.mdf'); MOVE "C:\Program Files\Microsoft SQL Server\MSSQL11.DEV2012\MSSQL\DATA\master.mdf" "S:\SQLData\master.mdf" ALTER DATABASE [master] MODIFY FILE (NAME = master, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.DEV2012\MSSQL\DATA\master.mdf'); MOVE "S:\SQLData\master.mdf" "C:\Program Files\Microsoft SQL Server\MSSQL11.DEV2012\MSSQL\DATA\master.mdf"
ALTER DATABASE [master] MODIFY FILE (NAME = mastlog, FILENAME = 'L:\SQLLog\mastlog.ldf'); MOVE "C:\Program Files\Microsoft SQL Server\MSSQL11.DEV2012\MSSQL\DATA\mastlog.ldf" "L:\SQLLog\mastlog.ldf" ALTER DATABASE [master] MODIFY FILE (NAME = mastlog, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.DEV2012\MSSQL\DATA\mastlog.ldf'); MOVE "L:\SQLLog\mastlog.ldf" "C:\Program Files\Microsoft SQL Server\MSSQL11.DEV2012\MSSQL\DATA\mastlog.ldf"
That looks about right for what I was intending. Column 0 was saved out to AlterCommand.sql. Column 1 to MoveCommand.bat. Column 2 became UndoAlterCommand.sql and Column 3 became UndoMoveCommand.bat.

I stared at the pending alter statements long and hard. Had a cup of tea and finally hit F5 but it failed. See, I didn't have any drive on my machine but C. Instead, I thought I could fake it out by using my friend subst. It works great for the SSIS scenarios I have where I need to pretend I have a particular path available for a hardcoded thing. But, SQL Server is smarter than that. It could see through my shenanigans and determine that my S, L and T drives were fakes and would not allow the ALTER DATABASE statements to pass.

After replace all those fake drive letters with actual paths on C:, I was ready. I fired off the alter commands and they all worked. SQL Server reported that those changes would take effect next time I restarted the service. Goodie, I double click my MoveFile.bat and it promptly throws up because those files are all in an administrator location (thanks windows 8). I couldn't see it failing though as the batch script ended too quickly. If you add the command PAUSE at the end of a .bat file, it will keep the window open until someone hits a key. I fix the file, right click and run as admin but those files are still in use. Oh yeah, *stop* the service, then move the files. I stop SQL Server and re-run the batch script and see my mdfs and ldfs move into the right folders. Great success.

I started the service and what was that flash? Oh crap, why is it still red? Eventviewer, eventvwr, got me started by reminding me that I have an error log available that I can read without SSMS. It kindly informed me that on start up it couldn't find my master database. The funny thing was, it was still looking in C:\Program Files... Sonofagun but fortunately I had my UndoMoveCommand.bat script there and after running as admin, it put back the files in their correct spots. Restart SQL Server and it's still broken. This time it found master data and log files but it was looking for model and msdb in C:\FakeDriveLocations. What the hell? The ALTER statement worked for everything *but* master. That seems unlikely. I looked back at the alter scripts and there didn't seem to be any diference between master and model commands. Inspiration finally struck after reading and rereading the errorlog file and I looked at the service in SQL Server Configuration Manager (SQLServerManager11.msc). There on the Advanced tab was startup parameters of "-dC:\Program Files...\master.mdf;-e...ErrorLog;-lC:\Program Files...\mastlog.ldf"

startup parameters

Database engine startup options specify that -d and -l specify where the master database data and log file should live. To heck with start up parameters, I already specified where the files should live before I broke the server, use that! So I copied out the existing parameters and cut out the -d and -l options. I figured the error log was fine where it was and restarted the service. Again SQL Server failed to start but this time the error message simply specified that it couldn't find master.mdf and mastlog.ldf-no path specified.

Intriguing, but at this point, I just want to get my database back up. Once again, I modify the start up parameters and provide explicit paths for the master database using the new paths. This time when I start the service, it starts and all my databases are available. Nothing outrageous in the error log either. I start SQL Agent and it too seems fine. Huzzah!

I undo the whole process by running UndoAlterStatement.sql, stopping the SQL Service, running UndoMoveCommand.bat as an admin, reverting the startup parameters for the service and finally restarting the service. Once again, green lights and my database seems operational. I think I have a winner.

How I plan to migrate database files

This, as with all information on this blog, is merely what I'm doing. Doing so in your environment may cause loss of data, irreparable harm and loss of life. Procede at your own risk.
  1. Take a full backup up of everything and ensure it was good
  2. Edit the above template script to ensure the files are going to valid locations
  3. Run the script and save out each column as described. Be sure to add the PAUSE command at the end of the .bat files
  4. Run the AlterCommand script against the server "sqlcmd -S yourserver -i AlterCommand.sql"
  5. Use SQL Server Configuration Manager to stop the SQL Server service
  6. Change the Startup Parameters to use the new locations for the master mdf and ldf
  7. Run the MoveCommand.bat as an admin
  8. Restart the SQL Service and check the ErrorLog
  9. Connect to the instance via SSMS and ensure all databases are online and accessible
  10. Check for database mail being configured and send an email

Wednesday, March 13, 2013

SSMS templates

SSMS templates, a gem you probably aren't using

How do you keep track of the queries you need to run regularly?

SQL Server Management Studio has a Template Explorer feature that you probably have never seen. Which is a pity, because it can be a real time saver.

Accessing the Template Explorer

The Template Explorer is available under the View menu or Ctrl-Alt-T for the keyboard fans.
View, Template Explorer

Now you have the "Template Browser" panel open, feel free to mock the inconsistent name from the menu to the actual panel title. Template explorer is labeled template browser

Using a template

How do you add extended properties to a table? Sure, you can punch that into AltaVista or whatever search engine you use and you'll get hits for MSDN and StackOverflow but what if you didn't have to go there? You don't, in fact. It was right there on your machine, a perfect example waiting for you in the Template Explorer.

Heaven forbid, what if you didn't have access to the internet? I know I don't install help files anymore and the few times I'm flying, I'm pretty far from having internet access. Double click on "Add Extended Properties to Table" and the following query gets pushed into a new SSMS query pane.

   1:  -- =========================================
   2:  -- Add Extended Properties to Table template
   3:  -- =========================================
   4:  USE <database, sysname, AdventureWorks>
   5:  GO
   6:   
   7:  IF OBJECT_ID('<schema_name, sysname, dbo>.<table_name, sysname, sample_table>', 'U') IS NOT NULL
   8:    DROP TABLE <schema_name, sysname, dbo>.<table_name, sysname, sample_table>
   9:  GO
  10:   
  11:  CREATE TABLE <schema_name, sysname, dbo>.<table_name, sysname, sample_table>
  12:  (
  13:      <columns_in_primary_key, , c1> <column1_datatype, , int> <column1_nullability,, NOT NULL>, 
  14:      CONSTRAINT <contraint_name, sysname, PK_sample_table> PRIMARY KEY (<columns_in_primary_key, , c1>)
  15:  )
  16:  -- Add description to table object
  17:  EXEC sys.sp_addextendedproperty 
  18:      @name=N'MS_Description', 
  19:      @value=N'<table_description_value,,Table description here>' ,
  20:      @level0type=N'SCHEMA', 
  21:      @level0name=N'<schema_name, sysname, dbo>', 
  22:      @level1type=N'TABLE', 
  23:      @level1name=N'<table_name, sysname, sample_table>'
  24:  GO
  25:   
  26:  -- Add description to a specific column
  27:  EXEC sys.sp_addextendedproperty 
  28:      @name=N'MS_Description', 
  29:      @value=N'<column_description,,Column description here>' ,
  30:      @level0type=N'SCHEMA', 
  31:      @level0name=N'<schema_name, sysname, dbo>', 
  32:      @level1type=N'TABLE', 
  33:      @level1name=N'<table_name, sysname, sample_table>', 
  34:      @level2type=N'COLUMN', 
  35:      @level2name=N'<columns_in_primary_key, , c1>'
  36:  GO

Ok, now what?
Finding Nemo, now what scene
All of those beautiful <parameter, [type], value> in the above mean something to SSMS. They are template parameters and to unlock their power, hit Ctrl-Shift-M.

The resulting pop-up window looks like
Specify Values for Template Parameters Compare the first row in that replace window to line 4 of the template query-all that SSMS did was parse out the 3 values between the < and the > and assume that what's in between is getting replaced. It's not fool proof but it's "good enough."

Really using templates

There's nothing stopping you from making your own templates. Think about it, how do you keep track of all your queries? For my money, I'm using Templates. Currently, I have 4 SSMSTemplate folders available on my dropbox site.
  • 45NewTSQLFeatures - this was my first attempt at a template library. I had a SQL Saturday presentation on 45 new language features in 2005 and 2008 in 45 minutes. Now it's my syntax reference for those features
  • SSIS - I built this out for my 2012 PASS Summit Lightning talk. It allows me to deploy .ispac files into the new SSISDB as well as tear it all down
  • Sweets - this is my SQL Saturday presentation on the new language features in SQL Server 2012.
  • Utility - this a work in progress but the basic idea is that with my new job, I'm going to be walking into lots of strange, new environments and I'll need to be able to hit the ground running. I took Brent Ozar's concept of sp_blitz and added in all the other scripts I might need to handle whatever is thrown at me.

Installation

After Template Explorer is first opened, SSMS will create the basic structure in your AppData folder. If you can't see it, take the training wheels off Windows Explorer and show hidden folders. For my account and SQL Server 2012, they were installed in C:\Users\bfellows\AppData\Roaming\Microsoft\SQL Server Management Studio\110\Templates. Download the folders to your Templates folder and restart SSMS (it only scans for files when it is first opened).

Making your own

In the Template Browser, right click in the background and you will have an option for New and there you can either create a new script or folder. Right-click and select Edit to change a template, otherwise you'll only be changing a copy and not the original.