tag:blogger.com,1999:blog-45833650039301854162022-01-12T15:52:55.393-06:00World of WhateverA blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.comBlogger280125tag:blogger.com,1999:blog-4583365003930185416.post-54124008958517794992022-01-06T08:00:00.001-06:002022-01-06T08:00:00.201-06:00SSIS Azure Feature Pack and the Flexible File components<h1>SSIS Azure Feature Pack and the Flexible File components</h1><p>The <a href="https://docs.microsoft.com/en-us/sql/integration-services/azure-feature-pack-for-integration-services-ssis?view=sql-server-ver15" target="_blank">Azure Feature Pack for SSIS</a> is something I had not worked with before today. I have a client that wants to use the Flexible File Task/Flexible File Source/Flexible File Destination but they were having issues. The Flexible File tools allow you to work with Azure Blob storage. We were dealing with ADLS Gen2 but the feature pack can work with classic blob storage as well. In my hubris, I said no problem, <a href="https://stackoverflow.com/tags/ssis/topusers" target="_blank">I <em>know</em> SSIS</a>. Dear reader, I did not know as much as I thought I did... <br><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEio9gmV6yg9yx_yOOV7QGIgS8ZyjjOjpkBEwhSOKXHU7ow4PjFkS1N48jYwWtzdywUbiQCSdcqEfQztQU4634aVJnPd0CTW4NDJ0RnTXa6e1ngGeCK_lZtMCbIlZUf3BTJy8_RimfW2rNIxjNQX6Qkitmm_Lqj4NJmK5ETO_RzHiem1tpyzz1RoCRac" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="250" data-original-width="499" src="https://blogger.googleusercontent.com/img/a/AVvXsEio9gmV6yg9yx_yOOV7QGIgS8ZyjjOjpkBEwhSOKXHU7ow4PjFkS1N48jYwWtzdywUbiQCSdcqEfQztQU4634aVJnPd0CTW4NDJ0RnTXa6e1ngGeCK_lZtMCbIlZUf3BTJy8_RimfW2rNIxjNQX6Qkitmm_Lqj4NJmK5ETO_RzHiem1tpyzz1RoCRac"/></a></div> <p>Our scenario was simple. We had a root folder datalake and subfolders of Raw. And into that we were needed to land and then consume files. Easy peasy. The Flexible File Destination allows us to write. The Flexible File Source allows us to read and we can configure a Foreach File enumerator to use the "Foreach Data Lake Storage Gen2 File Enumerator" to interact with the file system. Everything is the same as Windows except we use forward slashes instead of backslashes for path separators. <br> <div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEinTE37-Qtq2aGIhHkdrxF0OgBen9b7VLTceu4ClXUV2pNeP77WVWA2CDOlByxBE8cjbg_DBNr63ydnoR_9Lm6NbBogNpXqo0LWodVDDvoaPh5fHU3t4S9KBG8j8MuxEXVMRPPrjmxAl_WWVO3rRmRei6uRHeo1thbPMCw1MXn4_VyJ2JwXZYPU7mHQ" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="296" data-original-width="655" src="https://blogger.googleusercontent.com/img/a/AVvXsEinTE37-Qtq2aGIhHkdrxF0OgBen9b7VLTceu4ClXUV2pNeP77WVWA2CDOlByxBE8cjbg_DBNr63ydnoR_9Lm6NbBogNpXqo0LWodVDDvoaPh5fHU3t4S9KBG8j8MuxEXVMRPPrjmxAl_WWVO3rRmRei6uRHeo1thbPMCw1MXn4_VyJ2JwXZYPU7mHQ"/></a></div> <p>I started with the Flexible File Source after I manually created a CSV and uploaded it to data lake. One of the things I wasn't sure about was path syntax - do I need to specify the leading slash, do I need to specify the trailing slash, etc. I think I determined it didn't matter, it'd figure out whether folder path needs a trailing slash if it wasn't specified. <div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhwrSyljokxfcaUOls62e5Fb85af78lqNrKV7zUb608vfKpbum-elE_r9cxo6mkJ0iLduS-sjhVhrNf_3qIADmboifRP35Gx3qP7Ak43hKSjRxoVFy3szO6LMckciRCifXFubd3SfoumluPtwhOAlOtXzsP9s_ZXUN1gjE_ii3t1GIKM_q8SmNnMHfA" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="635" data-original-width="751" src="https://blogger.googleusercontent.com/img/a/AVvXsEhwrSyljokxfcaUOls62e5Fb85af78lqNrKV7zUb608vfKpbum-elE_r9cxo6mkJ0iLduS-sjhVhrNf_3qIADmboifRP35Gx3qP7Ak43hKSjRxoVFy3szO6LMckciRCifXFubd3SfoumluPtwhOAlOtXzsP9s_ZXUN1gjE_ii3t1GIKM_q8SmNnMHfA"/></a></div> <p>As I was testing things, changing that value and the value in the Flat File Destination each time was going to be annoying so I wanted to see what my options were for using Expresssions. Expressions are the secret sauce to making your SSIS packages graceful. The development teams took the same approach they have with the ADO.NET components in that there are no expressions on the components themselves. Instead, if you want to make the Flexible File Source/Flexible File Destination dynamic at all, you're going to have to look at the Data Flow Tasks Expressions and then configure there. <div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgKOHJZwRu6YxJ9mMMOdGPKB16S_nh2XM8QjDkcEEzpQCC6f-Rq96F5XEj9inTwxW2pc6wi9ueJh_Iw-JU3caVSJse-0hM-IINiZ-m_SZ_SCKtqY7Z5Ff2zei7WlRN9hbCVDsGhAuTXPR03bCAB-EcVo67wPRxp0KdssUQh2jtyYaC5YQ-aAgKbk7yY" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="112" data-original-width="348" src="https://blogger.googleusercontent.com/img/a/AVvXsEgKOHJZwRu6YxJ9mMMOdGPKB16S_nh2XM8QjDkcEEzpQCC6f-Rq96F5XEj9inTwxW2pc6wi9ueJh_Iw-JU3caVSJse-0hM-IINiZ-m_SZ_SCKtqY7Z5Ff2zei7WlRN9hbCVDsGhAuTXPR03bCAB-EcVo67wPRxp0KdssUQh2jtyYaC5YQ-aAgKbk7yY"/></a></div> <p>Here I used an SSIS package variable @[User::ADLSPath] so I could easily switch out /datalake, datalake, datalake/raw, datalake/Raw/, /datalake/raw/, and evaluate case sensitivity, path manipulation, etc. <h2>F5</h2><cite>Transfer data error : System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.Azure.Storage.Common, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified. File name: 'Microsoft.Azure.Storage.Common, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35'</cite><p>I tried it again, same issue. I flipped from 64 to 32 bit (I had installed both sets of the Feature Pack). I reread the install requirements document. I looked through github bugs. I contemplated asking a question on StackOverflow. I ended up trying to reproduce the error on my desktop instead of the client's VM. Same bloody issue! Finally, I said to heck with it, maybe it more strongly worded, and I'll get this assembly and install to the GAC. Maybe something went wonky with the installs on both machines. <h2>How do I install something into the global assembly cache? </h2> <p><a href="https://stackoverflow.com/a/65633015/181965" target="_blank">StackOverflow answer</a> On your file system, you might have a utility called "gacutil.exe" From a administrative command prompt, I'd type "CD \" and then "dir /s /b gacutil.exe" That should provide a list of all the instances of gacutil on your machine. Pick one, I don't care which. If there's a space in the path name, then you'll need to wrap it with double quotes. <code>"C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.8 Tools\x64\gacutil.exe" /?</code> That would bring up the help text for using the gacutil that lives at that path. If the double quotes were not there, you'd get an error message stating <br><cite>'C:\Program' is not recognized as an internal or external command, operable program or batch file.</cite> <h2>How do I get Microsoft.Azure.Storage.Common.dll?</h2><p>I hope you comment below and tell me an easier way because I am not a nuget master. Visual Studio has a nuget package manager in it. However, it only works in the contet of a solution or project <strong>and</strong> the project type must support packages. If you have a single project in your solution and that project is an SSIS project, attempting to use the nuget package manager will result in an error <br> <cite>Operation Failed. No projects supported by NuGet in the solution.</cite> Well fiddlesticks, I guess we have to give up. <p>Or, add a script task to the SSIS package and then click Edit Script. In the new instance of Visual Studio, guess what - it thinks it supports NuGet. It does not, when you close the editor, the packages go away and when the script runs, it does not have the brains to get the assemblies back from NuGet land. So, don't.close.the.editor. yet. In the NuGet Package manager, on the Browse tab, type in Microsoft.Azure.Storage.Common and look at that - Deprecated. Last stable version 11.2.3. But this error indicates the component expects so in the Version, scroll all the way back and find it. Click the check box and click Install button. Yes, you agree to the other packages as well as the MIT license. <p>At this point, in the volatile/temporary file storage on your computer, you have an on disk representation of your empty script Task with a NuGet package reference. We need to find that location, e.g. C:\Users\bfellows\AppData\Local\Temp\Vsta\SSIS_ST150\VstacIlBNvWB__0KemyB8zd1UMw\ Copy the desired DLLs out into a safe spot (because if I have to do it here, I'll likely have to do it on the server and the other three development VMs) and then use the gacutil to install them. <p>Right click on Solution VstaProjects and choose Open in Terminal. The assembly we're looking for will be located at .\packages\Azure.Storage.Common.12.9.0\lib\netstandard2.0\Azure.Storage.Common.dll. Assume I copied it to C:\temp <p><code>"C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.8 Tools\x64\gacutil.exe" -if C:\temp\Azure.Storage.Common.dll</code> will force install the dll to the GAC. Now when I run SSIS, we'll see whether that has resolved our error. <h2>Will the real error please stand up</h2><p>It did resolve our error, but not. The error that was reported, missing assembly was <a href="https://www.youtube.com/watch?v=zD8J7Y-8FTE" target="_blank">Mickey Mouse, mate. Spurious. Not genuine. </a> <p> <iframe width="560" height="315" src="https://www.youtube.com/embed/zD8J7Y-8FTE?start=17" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> <p>Look what error decided to show up now that it could error out "better" <br><cite>Transfer data error : Microsoft.DataTransfer.Common.Shared.HybridDeliveryException: ADLS Gen2 operation failed for: Operation returned an invalid status code 'BadRequest'. Account: 'datalakedev'. FileSystem: 'datalake'. ErrorCode: 'TlsVersionNotPermitted'. Message: 'The TLS version of the connection is not permitted on this storage account.'</cite> <p>If I go to my storage account, under Settings, Configuration, there I can change Minimum TLS version from 1.2 to 1.1. Oh, except that still isn't kosher - same error. 1.0 it is and lo and behold, I have data transfer. The root cause is not a missing assembly, it is a red herring error message that could only be resolved by adding the assembly to the global assembly cache. <h2>Rant</h2><p>How in the hell would a normal person make the connection between "Could not load file or assembly" and Oh, I need to change the TLS? What's really galling is the fact that when I used the Flexible File Source for my data flow, I specified a file on blob storage and SSIS was able to connect and read that file because it identified the associated metadata. I has two columns and here are the data types (defaulted to string but who cares, that's consistent with flat file source). BUT IT PICKED UP THE METADATA. IT COULD TALK TO AZURE BLOB STORAGE EVEN THOUGH IT ONLY ALLOWED 1.2! And yet, when it came time to run, it could not talk on the same channel. Can you see how I lost a large portion of my day trying to decipher this foolishness? <p>By the way, knowing that the root cause it a mismatch between the TLS SSIS/my computer is using and the default on the storage account, let's go back to the writeup for the <a href="https://docs.microsoft.com/en-us/sql/integration-services/azure-feature-pack-for-integration-services-ssis?view=sql-server-ver15#use-tls-12" target="_blank">Azure Feature Pack</a><br> <p><cite>Use TLS 1.2 The TLS version used by Azure Feature Pack follows system .NET Framework settings. To use TLS 1.2, add a REG_DWORD value named SchUseStrongCrypto with data 1 under the following two registry keys. HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319 </cite> <p>So, sometimes an error message isn't the real error message but you have to clear away all the garbage between you and the source of the error to figure out what it really is. <p/> Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-49486203643211659672021-12-22T12:00:00.189-06:002021-12-22T12:00:00.162-06:00Extracting queries from SSIS packages<h1>Extracting queries from SSIS packages</h1><p>We received an abomination of an <a href="https://twitter.com/billinkc/status/1471560571648630793" target="_blank">SSIS package from a third party</a>. It was a way to write a package that I don't think I would have suggested. <div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjB9-oqecnx8uBENdVsprdCS0f87FvZudWbWAoxa4bJ79kvujGFOhSytniXiK-w-PIvEN-T7ACkadur0GL3IdW3h6icTFl1LumReeFqf547ZIIZuemoI_ppFokmUejhqNpfyvh4-k6xjH7jtUcljX8OAda0JNfKFy4ViODkEhmFo39X3tfaDDlCdBG6" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="248" data-original-width="3251" src="https://blogger.googleusercontent.com/img/a/AVvXsEjB9-oqecnx8uBENdVsprdCS0f87FvZudWbWAoxa4bJ79kvujGFOhSytniXiK-w-PIvEN-T7ACkadur0GL3IdW3h6icTFl1LumReeFqf547ZIIZuemoI_ppFokmUejhqNpfyvh4-k6xjH7jtUcljX8OAda0JNfKFy4ViODkEhmFo39X3tfaDDlCdBG6"/></a></div> <p>Our job was to rewrite this into something more manageable and it appears Azure Data Factory will be the winner. Before we can do that, we need to document what the existing package is doing (the vendor has supplied the incremental load logic) so we can replicate it but in a more economical form. It appears to have the pattern (squint really hard at the picture) Execute SQL Task -> Execute SQL Task -> Sequence container => many data flows -> Data Flow -> Execute SQL Task. The Data Flow Task is named after the table being loaded. An ODBC source with a expression based query, named "ODBC Source 1" wired to an OLE DB Destination, named "OLE DB Destination". How would you do it, especially given that there are 236 Data Flow Tasks embedded in a single container? <h2>Biml it!</h2><p>As with so many things SSIS-related, <a href="https://bimlscript.com/" target="_blank">Biml</a> is the answer. Install <a href="https://www.varigence.com/BimlExpress" target="_blank">BimlExpress</a> and reverse engineer that dtsx package into Biml. I'll add a blank BimlScript file that I called Inspector.biml <p>Let's look at a sample DataFlow task <pre><br /> &lt;Dataflow Name="ATableName"&gt;<br /> &lt;Expressions&gt;<br /> &lt;Expression ExternalProperty="[ODBC Source 1].[SqlCommand]"&gt;"SELECT * FROM dbo.ATableName where modifiedutc &gt; '" +(DT_WSTR, 30)@[User::LastModified] + "' AND modifiedutc &lt;= '" + (DT_WSTR, 30)@[User::MostRecent] + "'"&lt;/Expression&gt;<br /> &lt;/Expressions&gt;<br /> &lt;Transformations&gt;<br /> &lt;OdbcSource Name="ODBC Source 1" Connection="Source2"&gt;<br /> &lt;DirectInput&gt;SELECT * FROM dbo.ATableName where modifiedutc &gt; '0' AND modifiedutc &lt;= '0'&lt;/DirectInput&gt;<br /> &lt;/OdbcSource&gt;<br /> &lt;DataConversion Name="Data Conversion"&gt;<br /> &lt;Columns&gt;<br /> &lt;Column SourceColumn="id" TargetColumn="Copy of id" DataType="AnsiString" Length="255" CodePage="1252" /&gt;<br /> &lt;/Columns&gt;<br /> &lt;/DataConversion&gt;<br /> &lt;OleDbCommand Name="Delete Old Rows from ATableName" ConnectionName="Destination2"&gt;<br /> &lt;Parameters&gt;<br /> &lt;Parameter SourceColumn="ye_id" TargetColumn="Param_0" DataType="AnsiStringFixedLength" Length="255" CodePage="1252" /&gt;<br /> &lt;Parameter SourceColumn="modifiedutc" TargetColumn="Param_1" DataType="Int64" /&gt;<br /> &lt;/Parameters&gt;<br /> &lt;DirectInput&gt;delete from dbo.ATableName where ye_id = ? and modifiedutc &lt; ?&lt;/DirectInput&gt;<br /> &lt;/OleDbCommand&gt;<br /> &lt;ConditionalSplit Name="Conditional Split"&gt;<br /> &lt;OutputPaths&gt;<br /> &lt;OutputPath Name="Case 1"&gt;<br /> &lt;Expression&gt;ISNULL(ye_id)&lt;/Expression&gt;<br /> &lt;/OutputPath&gt;<br /> &lt;/OutputPaths&gt;<br /> &lt;/ConditionalSplit&gt;<br /> &lt;OleDbDestination Name="OLE DB Destination" ConnectionName="Destination2"&gt;<br /> &lt;ExternalTableOutput Table="&quot;dbob&quot;.&quot;ATableName&quot;" /&gt;<br /> &lt;/OleDbDestination&gt;<br /> &lt;/Transformations&gt;<br /> &lt;/Dataflow&gt;<br /></pre> <p>All I want to do is find all the Data Flow Tasks in the sequence containers. I need to generate a key value pair of TableName and the source query. I could dive into the Transformations layer and find the ODBC source and extract the DirectInput node from the OdbcSource and then parse the table name from the OleDbDestination's ExternalTableOutput but look, I can "cheat" here. Everything I need is at the outer Data Flow Task level. The Name of the DataFlow is my table name and since it's ODBC and the source component doesn't support a direct Expression on it, it's defined at the Data Flow Level. That makes this Biml easy. <pre> <br />&lt;#<br /><br />// A dictionary of TableName and the Expression<br />Dictionary&lt;string, string&gt; incremental = new Dictionary&lt;string, string&gt;();<br /><br />foreach (AstPackageNode p in this.RootNode.Packages)<br />{<br /> // Loop through the Sequence Container<br /> foreach (var c in p.Tasks.OfType&lt;AstContainerTaskNode&gt;()/**/)<br /> {<br /> foreach (AstDataflowTaskNode t in c.Tasks.OfType&lt;AstDataflowTaskNode&gt;())<br /> {<br /> if (p.Name == "Postgres_to_MSSQL_Incremental")<br /> {<br /> incremental[t.Name] = t.Expressions[0].Expression;<br /> }<br /> }<br /> }<br />}<br /><br />WriteLine("&lt;!--");<br />foreach (KeyValuePair&lt;string, string&gt; k in incremental)<br />{<br /> // WriteLine("&lt;!-- {0}: {1} --&gt;", k.Key, k.Value);<br /> WriteLine("{0}|{1}", k.Key, k.Value.Replace("\n", " "));<br />}<br />WriteLine("--&gt;");<br /><br />#&gt;<br />&lt;Biml xmlns="http://schemas.varigence.com/biml.xsd"&gt;<br />&lt;/Biml&gt;<br /> </pre> <p>I define a dictionary that will hold the Table and the associated Expression. The first foreach loop specifies that I want to enumerate through all the packages that have been reverse engineered. If you're using BimlExpress, you'll need to shift click on all the source Biml files as well as the Inspector package. <p>The next foreach enumerator looks at all the elements in the Control Flow task and we're going to filter it to just things that are of type AstContainerTaskNode, aka a Container. That "OfType" filter syntax is very handy to focus on only the type of item you're looking for. Linq is so powerful, I love it. <p>The innermost foreach enumerator uses the OfType again to filter tasks to only those that are DataFlows, AstDataFlowTaskNode. The if statement ensures I'm only working on the Incremental package (they supplied an initial load as well). Finally, I add the Task's Name to the key of the dictionary and the value becomes the first Expression. Again, I can cheat here because the vendor package was very consistent, which is amazing for an SSIS package that's been hand crafted. That source package was 40.5 MB and had been saved 422 times according to the VersionBuild number. Kudos to them for quality control. <p>Once the looping is complete, all that is left is to emit the information so I can use it elsewhere. Thus the final foreach loop. I'm working in BimlStudio so comments are emitted and I'm going to take advantage of that by simply writing the key/value pair with a Pipe delimiter and then copy/paste the output into a CSV. If you're working in BimlExpress, I'd just write directly to a file with a System.IO.Text.WriteAllLines (name approximate) but this was just a "quick and dirty get it done" task and corresponding blog post to show that Biml and metadata programming are still relevant. <p>Eagle eyed viewers will note that I am missing the single DataFlow task after the Container. My partner also notice it and so if you need to also look for any data flow tasks at the Package level, I added this loop after the Seqence Container loop. <pre><br /> foreach (AstDataflowTaskNode t in p.Tasks.OfType&lt;AstDataflowTaskNode&gt;())<br /> {<br /> if (p.Name == "Postgres_to_MSSQL_Incremental")<br /> {<br /> incremental[t.Name] = t.Expressions[0].Expression;<br /> }<br /> }<br /></pre> <h1>Wrapup</h1>I'm a slow/infrequent blogger and this post took me 50 minutes. I think after we were done scratching our heads at the source packages, it took less time to write the script and generate the list of tables and associated queries than this blog post took. <p/>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-36714461503291331722021-11-17T08:00:00.003-06:002021-11-17T08:00:00.166-06:00ETL pattern for API source<h1>ETL pattern for API source</h1><p>The direction for software as a service providers is to provide APIs to access their data instead of structured file exports. Which is a pity, as every SaaS system requires a bespoke data extract solution. I inheireted a solution that had an adverse pattern I'd like to talk about. <br/><div class="separator" style="clear: both;"><a href="https://1.bp.blogspot.com/-jn0S5twtRg0/YZRH_KEZWUI/AAAAAAAAslo/cb2q62kXu8UQEkXJN33hgb3ceU3ukZ38gCLcBGAsYHQ/s0/API_ETL-Original.drawio.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="221" data-original-width="571" src="https://1.bp.blogspot.com/-jn0S5twtRg0/YZRH_KEZWUI/AAAAAAAAslo/cb2q62kXu8UQEkXJN33hgb3ceU3ukZ38gCLcBGAsYHQ/s0/API_ETL-Original.drawio.png"/></a></div> The solution pulls data from advertising and social media sites (Google Analytics, Twitter, Facebook, etc) and does processing to make it ready for reporting. The approach here works, but there are some challenges that you can run into. <ul><li>Metering - Providers generally restrict you to only consuming so much over time (where so much and time are highly dependent on the source). Google Analytics, depending on product, rejects your connections after so many calls. Twitter, also depending on their maddening, inconsistent set of APIs (v1 vs v2), endpoints, product (free standard, paid for premium or enterprise) will throttle you based on consumption <li>Data availability - you have no idea whether the data you pulled today will be available tomorrow. We had pulled 5 years of data out of Google Analytics that contained a variety of dimensions, two of which were ga:userAgeBracket and ga:userGender. In talking to our client, they wanted just one more data elemented added to the mix. We made the change and boom goes the dynamite: Some data in this report may have been removed when a threshold was applied. That error message means that you're requesting a level of granularity that could de-anonymize users. Ok, fine, we rolled back the change but No, that's no longer a valid combination, ever! And we ran into a situation were some of the data just wasn't availble pre-2020. Yes, a month earlier the same code had pulled 6 years worth of data but no more. <li>Oops - Something happened when we created the data for reporting (data merge introduced duplicates, client wanted a differen format, etc) and now we need to do it again, except instead of the month allocated, we have a week to fix all this up. Which bumps into the Metering and Data Availability points. Ouch town, population you. </ul> <h2>Preferred pattern</h2><p>What I inheireted wasn't bad, it just hadn't taken those possible pain points into consideration. In a classic data warehouse, you have a raw zone with immutable source sitting somewhere on cheap storage. The same lesson applies here. When you pull from an API, land that data to disk in some self defining format, json/xml/csv don't care. <div class="separator" style="clear: both;"><a href="https://1.bp.blogspot.com/-8DOBn7FyVA8/YZRH_OluO5I/AAAAAAAAslk/Uw8yUbMaKTQQ2Z8n6ti9yD9wtAZ5Z03tACLcBGAsYHQ/s0/API_ETL-Revised.drawio.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="221" data-original-width="571" src="https://1.bp.blogspot.com/-8DOBn7FyVA8/YZRH_OluO5I/AAAAAAAAslk/Uw8yUbMaKTQQ2Z8n6ti9yD9wtAZ5Z03tACLcBGAsYHQ/s0/API_ETL-Revised.drawio.png"/></a></div> <p>Write your process so that it is able to consume that source file and get the exact same results as the source data pull. <pre><br /> <br />def get_data(source_date):<br /> """Get a data for a given date.<br /> :param source_date: An ISO 8601 formatted date aka yyy-MM-dd<br /> :return: A dictionary of data<br /> <br /> """<br /> source_file = '/dbfs/mnt/datalake/raw/entity/data_provider_{0}.json'.format(source_date)<br /> raw_data = {}<br /> if os.path.exists(source_file):<br /> with open(source_file, 'r', encoding='utf-8') as f:<br /> raw_data = json.load(f)<br /> else:<br /> raw_data = analytics.reports().batchGet(body='json-goes-here').execute()<br /> with open(google_file, 'w', encoding='utf-8') as f:<br /> json.dump(raw_data, f, ensure_ascii=False)<br /> <br /> return raw_data<br /></pre> <p>This simple method is responsible for getting my data by date. If the file exists in my file system, then I will reuse the results of a previous run to satisfy the data request. Otherwise, we make a call to the API and before we finish, we write the results to disk so that we can be ready in case Ooops happens downstream of the call. <p>Using this approach, we were able to reprocess a years worth of cached data in about 10 minutes compared to about 4.5 hours of data trickling out of the source API. Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-21181867268546301002021-02-09T08:00:00.020-06:002021-02-09T08:00:08.703-06:00Including a local python module<h1>Including a local python module</h1><p>As we saw in <a href="http://billfellows.blogspot.com/2021/02/reusing-your-python-code.html" target="_blank">reusing your python code</a>, you can create a python file, a module, that contains our core business logic and then re-use that file. This post is going to talk about to make that happen. <p>What happens when you <cite>import antigravity</cite>? The python interpreter is going to check all the places it knows to <a href="https://docs.python.org/3/using/windows.html#finding-modules">find modules</a>. It's going to check the install location for a module, it's going to check if you defined pythonhome/pythonpath environment variables, and you can hint to your hearts desire where to find files. If I import a real module, <cite>import pprint</cite>, I can access the __file__ property which will tell you where it found the module. In my case it was C:\Python38\lib\pprint.py <p>Python is happy to tell you what the current search path is <code>import sys print(sys.path) </code> <p>On my machine, that displays an array of <cite>['', 'C:\\Python38\\python38.zip', 'C:\\Python38\\DLLs', 'C:\\Python38\\lib', 'C:\\Python38', 'C:\\Python38\\lib\\site-packages']</cite> If I want reusable_code.py to be callable by another module, then it needs to exist in one of those locations. That first entry of a blank path is the current directory so as long as the module I need is in the same folder, we're golden! I have added i_use_resuable.py to the same folder as the above <pre><br /># This module lives in the same folder as our reusable_code.py file<br />import reusable_code<br /><br /><br />def main():<br /> c = reusable_code.Configuration()<br /> print(c.get_modify_date())<br /><br /><br />if __name__ == '__main__':<br /> main()<br /></pre> <p>Executing that, I get the expected timestamp - the exact same experience as running our corporate file but now I can focus on using the business logic instead of writing it. We're going to need something more though if we're going to get this reuable code into our pyspark cluster. <p>In the next post, we'll learn how to package our business module up into something we can install instead of just assuming where the file is. <p>As always, the code is on my <a href="https://github.com/billinkc/blog_posts/tree/main/2021-02-09_IncludingALocalPythonModule" target="_blank">github repro</a>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-4289755856964969452021-02-08T08:00:00.080-06:002021-02-09T09:19:54.444-06:00Reusing your python code<h1>Reusing your ptyhon code</h1><p>I learned python in 2003 and used it for all the ETL work I was doing. It was beautiful and I would happilly wax to any programmer friends about the language and how they should be learning it. It turns out, my advocacy was just 15+ years too early. I recently had a client reach out to engage me to work on their Databricks project. No gentle reader, I don't much of anything about Databricks. But I do know about working with data, python programming (which I was already updating my mental model to 3.0) and pandas. Yes, pandas is not what we do in databricks but the concepts are similar. <p>One of the early observations is that they had dozens of notebooks with copy and paste code across them. Copy and paste code in a metadata driven solution isn't an evil but when you're hand crafting boiler plate code artifacts by hand, you're going to sneak a code mutation in there. So, let's look at how we can avoid this with code re-use. <p>Let's assume we use an important business process that needs to be consistent across our infrastructure. In this case, it's a modification date which is used as part of our partition strategy. This code nugget is spread across all those notebooks <code>datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")</code> When processing starts up, we set a timestamp so that all activities accrue under that same timestamp. It's a common pattern across data processing. How could we do this better? <p>In classic python programming, we would abstract that logic away in a reusable library. In this example, I have created a module (file) named <cite>reusable_code.py</cite> In it, I created a class named <cite>Configuration</cite> and it exposes a method <cite>get_modify_date</cite> <pre><br /># reusable_code.py is a python module that simulates our desire to <br />#consolidate our corporate business logic into a re-usable entity<br /><br />from datetime import datetime<br /><br />class Configuration():<br /> """A very important class that provides a standardized approach for our company"""<br /> def __init__(self):<br /> # The modify date drives very important business functionality<br /> # so let's be consitent in how it is defined (ISO 8601)<br /> # 2021-02-07T17:58:20Z<br /> self.__modify_date__ = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")<br /><br /> def get_modify_date(self):<br /> """Provide a standard interface for accessing the modify date"""<br /> return self.__modify_date__<br /><br /><br />def main():<br /> c = Configuration()<br /> print(c.get_modify_date())<br /><br />if __name__ == "__main__":<br /> main()<br /></pre> <p>Usage is simple, I create an instance of my class <cite>c</cite>, which causes the constructor/initalizer to fire and set the modify date for the life of that object. Calling the get_modify_date method results in an ISO 8601 date to be emitted <blockquote>2021-02-07T17:58:20Z</blockquote> <p>At this point, I hope you have an understanding of how we can make a reusable widget. Think about your business processes that you need to encapsulate in to reusable components and tomorrow we'll review <a href="http://billfellows.blogspot.com/2021/02/including-local-python-module.html">using existing python modules in new files</a>. After that, we'll cover converting this module into a wheel. And then we'll walk through installing it to a DataBricks cluster and using it from a notebook. Sound good? <p>All of this code is available on my github repository -> <a href="https://github.com/billinkc/blog_posts/tree/main/2021-02-08_PythonReusableCode" target="_blank">2021-02-08_PythonReusableCode</a>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-87418708256279603182020-02-20T12:47:00.001-06:002020-02-20T12:47:07.729-06:00Making a delimited list<h1>Making a delimited list</h1><p>There are various ways to concatenate values together. A common approach I see is that people will add a delimiter and then the value and loop until they finish. Then they take away the first delimiter. Generally, that's easier coding, prepending a delimiter, than to append the delimiter and not do it for the final element of a list. Or add it and then remove the final delimiter from the resulting string. <p>Gentle reader, there is a better way. And has been for quite some time but if you weren't looking for it, you might not know it exists. In .NET, it's <a href="https://docs.microsoft.com/en-us/dotnet/api/system.string.join?view=netframework-4.8" target="_blank">String.Join</a> <p>Look at the following code, what would you rather write? The Avoid this block or simply use the libraries? <pre><br />using System;<br />using System.Collections.Generic;<br />using System.Linq;<br />using System.Xml.Linq;<br />using System.Text;<br /> <br />public class Program<br />{<br /> public static void Main()<br /> {<br /> // generate a list of numbers<br /> List<int> data = (Enumerable.Range(0, 10)).ToList<int>();<br /><br /> string delimiter = ",";<br /> // Avoid this, unless you know you need to do it for a specific reason<br /> {<br /> StringBuilder sb = new StringBuilder();<br /> foreach(var i in data)<br /> {<br /> sb.Append(delimiter);<br /> sb.Append(i);<br /> }<br /> <br /> // Convert the string builder to a string and then strip the first<br /> // character out of it<br /> string final = sb.ToString().Substring(delimiter.Length);<br /> <br /> Console.WriteLine(final);<br /> }<br /> <br /> // Make a comma delimited list<br /> Console.WriteLine(String.Join(delimiter, data));<br /> <br /> // What if we want to do something, like put each element in an XML tag?<br /> Console.WriteLine(String.Join(string.Empty, data.Select(x => string.Format("<col>{0}</col>", x)).ToList()));<br /><br /> }<br />}<br /></pre> Output of running the above <pre><br />0,1,2,3,4,5,6,7,8,9<br />0,1,2,3,4,5,6,7,8,9<br /><col>0</col><col>1</col><col>2</col><col>3</col><col>4</col><col>5</col><col>6</col><col>7</col><col>8</col><col>9</col><br /></pre> <a href="https://gist.github.com/billinkc/3678a94244d8a7f86bc567aed33e0aba" target="_blank">Gist for .net</a> <p><cite>But Bill, I use Python.</cite> The syntax changes but the concept remains the same. <code>delimiter.join(data)</code>. Whatever language you use, there's probably an equivalent method. Look for it and use it. Don't write your own implementation. <p>Was there a better way to have done this? Let me know. Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-66369475537395649952019-11-19T14:00:00.001-06:002019-11-19T14:00:53.767-06:00Generating characters in TSQL<h1>Generating characters in TSQL</h1> <p>I had to do a thing<sup>*</sup> and it involved generating "codes" as numbers were too hard for people. So, if you have need to convert an arbitrary number into characters, this is your lucky day/post. <h3>Background</h3><p>As <strike>I get longer in the tooth</strike> programming becomes more accessible, I find that people might not have been exposed to underpinnings of how things used to work. Strings were just a bunch of characters put together and a character was a subset of the Latin alphabet shoved into 128 characters (0 to 127). The characters below 32 were referred to as the non-printable characters or control characters. Things above 32 are what you see on a US keyboard. There was a time, if you bought a programming book, it would have an ASCII table somewhere in the reference. Capital A is character 65, Capital Z is character 90 (65/A + 25 characters later). In TSQL, the CHAR function takes a number and gives you the ASCII character for the value so <code>SELECT CHAR(66) AS B;</code> will generate a capital B. <p>The mod or modulus function will return the remainder after division. Modding a value is a handy way to constrain a value between 0 and an upper threshold. In this case, if I modded any number by 26 (because there are 26 characters in the English alphabet), I'll get 0 to 25 as my result. <p>Knowing that the modulus function will give me 0 to 25 and knowing that my target character range starts at 65, I could use the previous expression to print any number's ascii value like <code>SELECT CHAR((2147483625 % 26) + 65) AS StillB;</code>. Break that apart, we do the modulus, <code>%</code>, which gives us the value of 1 which we then add to the starting offset (65). <p>Rolling all that together, here's <a href="https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=2708993b617862bc3505550bcffcb8d9" target="_blank">a quick little tester</a> to see what we can then do with it. <pre><br />SELECT<br /> D.rn<br />, ASCII_ORD.ord_value<br />, ASCII_ORD.replicate_count<br /> -- CHAR converts a number to a character<br />, CHAR(ASCII_ORD.ord_value) AS ord_value_as_character<br /> -- REPLICATE repeats a string N times<br />, REPLICATE(CHAR(ASCII_ORD.ord_value), ASCII_ORD.replicate_count) AS RepeatedCharacter<br /> -- CONCAT is a null and type approach for string building (requires 2012+)<br />, CONCAT(CHAR(ASCII_ORD.ord_value), ASCII_ORD.replicate_count) AS ConcatenatedCharacter<br />FROM<br />(<br /> -- Generate 0 to N-1 rows<br /> SELECT TOP (300)<br /> ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1<br /> FROM<br /> sys.all_columns AS AC<br />)D(rn)<br />CROSS APPLY<br />(<br /> -- There are 26 characters in the English language<br /> -- 65 is the ASCII ordinal position of a capital A<br /> SELECT<br /> D.rn % 26 + 65<br /> , D.rn / 26 + 1<br />) ASCII_ORD(ord_value, replicate_count)<br />ORDER BY<br /> D.rn<br />;<br /></pre> <p>Ultimately, it was decided that using a combination of character and digits (ConcatenatedCharacter) might be more user friendly than purely a repeated character approach. Neither of which will help you when you're in the 2 billion range like our sample input of 2147483625 <h3>Key takeaways</h3><p>Don't confuse the CHAR function with the char data type. Similar but different <p><em>That's</em> why books always had ASCII tables in them <p>Modulus function can generate a bounded set of numbers <p>Older developers might know some weird tricks/trivia <p>Even older developers will scoff at memorized ASCII tables in favor of EBCDIC tables <p>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-9519016460340851312019-09-05T10:41:00.000-05:002019-09-05T10:45:04.121-05:00Using Newtonsoft.Json with Biml<h1>Using Newtonsoft.Json with Biml</h1><p><a href="https://twitter.com/metx/status/1169603770562621441" target="_blank">Twitter</a> provided an opportunity for a quick blog post <p><cite>#sqlhelp #biml I would have the metadata in a Json structure. How would you parse the json in the C# BIML Script? I was thinking use Newtonsoft.Json but I don't know how to add the reference to it </cite> <p><a href="http://billfellows.blogspot.com/2015/05/adding-external-assemblies.html">Adding external assemblies</a> is a snap but here I'll show how to use the <a href="https://www.newtonsoft.com/json">NewtonSoft Json</a> library to parse a Json based metadata structure and then use that in our Biml. <p><pre><br />&lt;Biml xmlns=&quot;http://schemas.varigence.com/biml.xsd&quot;&gt;<br />&lt;#<br />// Given the following structure<br /><br />///{<br />/// "packages": [<br />/// "p1",<br />/// "p2",<br />/// "p3"<br />/// ]<br />///}<br /><br />// Assume the json file is located as specified<br />string sourceFile = @"C:\ssisdata\trivial.json";<br /><br />// Read the data into a string variable<br />string json = System.IO.File.ReadAllText(sourceFile);<br /> <br />// Deserialize the json into a dictionary of strings (packages) and a list of strings (p1, p2, p3)<br />Dictionary&lt;string, List&lt;string>> metadata = JsonConvert.DeserializeObject&lt;Dictionary&lt;string, List&lt;string>>>(json);<br />#><br />&lt;Packages><br />&lt;#<br />// Shred the dictionary for our values<br />foreach (string item in metadata["packages"])<br />{<br /> //WriteLine(String.Format("&lt;!-- {0} -->", item));<br />#><br /> &lt;Package Name="&lt;#=item #>" /><br />&lt;#<br />}<br />#> <br />&lt;/Packages><br />&lt;/Biml><br /><br />&lt;#@ import namespace="Newtonsoft.Json" #><br />&lt;#* Assuming we have GAC'ed the assembly *#><br />&lt;#@ assembly name= "Newtonsoft.Json.dll" #><br /></pre> <p>The <a href="https://gist.github.com/billinkc/39e0711036918c805457d54630583167">gist</a> is also posted in case I mangled the hand crafted html entities above. <p>Also, not covered is GAC'ing the assembly but you can use an explicit path to your DLL <cite>name="C:\where\did\I\put\this\Newtonsoft.Json.dll"</cite>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-15565712118611022502019-02-27T08:00:00.000-06:002019-02-27T08:03:48.000-06:00Biml in Azure aka CallBimlScriptContent <a href="https://varigence.com/Documentation/Mist/Article/BimlStudio+2017" target="_blank">CallBimlScriptContent</a> was introduced with the migration from Mist to BimlStudio. Why is this cool? You do not have to use files sitting on your computer as the source for your Biml. As long as you can reconstitute the Biml contents into a string, you can store your scripts where ever you'd like. If you want them in a database, that's great. Store them in the cloud? Knock yourself out. <p>As a consultant, the latter is rather compelling. Maybe I'm only licensing my clients to use accelerators during our engagement. If I leave files on the file system after I roll off, or they image my computer and accidentally collect them, I am David fighting Goliath. CallBimlScriptContent is a means to protect myself and my IP. Let's look at a trivial example. I set a C# string with an empty Package tag (hooray for doubling up my double quotes). Within my Packages collection, I invoke CallBimlScriptContent passing in my Biml content. <pre class="csharpcode"><br /><span class="kwrd">&lt;</span><span class="html">Biml</span> <span class="attr">xmlns</span><span class="kwrd">="http://schemas.varigence.com/biml.xsd"</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#<br />// Do something here to populate myBimlFile<br />string myBimlFile = @"<span class="kwrd">&lt;</span><span class="html">Package</span> <span class="attr">Name</span><span class="kwrd">=""</span><span class="attr">ABC</span><span class="kwrd">""</span> <span class="kwrd">/&gt;</span>";<br />#<span class="kwrd">&gt;</span> <br /> <span class="kwrd">&lt;</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span>#=CallBimlScriptContent(myBimlFile)#<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;/</span><span class="html">Biml</span><span class="kwrd">&gt;</span></pre> The rendered Biml for the above would look like <pre class="csharpcode"><br /><span class="kwrd">&lt;</span><span class="html">Biml</span> <span class="attr">xmlns</span><span class="kwrd">="http://schemas.varigence.com/biml.xsd"</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Package</span> <span class="attr">Name</span><span class="kwrd">="ABC"</span> <span class="kwrd">/&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;/</span><span class="html">Biml</span><span class="kwrd">&gt;</span></pre> <p>It's super that it works, but that's not convenient. Like, at all! Plus, good luck trying to embed any complexity in that string. <p>So, let's try something a little more complex. Conceptually, imagine we have two Biml Scripts we might choose to call inc_Package_00.biml and inc_Package_10.biml <#@ property name="parameterName" type="string" #> <h3>inc_Package_00.biml</h3><pre class="csharpcode"><br /><span class="kwrd">&lt;</span><span class="html">Package</span> <span class="attr">Name</span><span class="kwrd">="ABC"</span> <span class="kwrd">/&gt;</span></pre> <h3>inc_Package_10.biml</h3><pre class="csharpcode"><br /><span class="kwrd">&lt;</span>#@ property name="packageName" type="string" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span><span class="html">Package</span> <span class="attr">Name</span><span class="kwrd">="packageName"</span> <span class="kwrd">/&gt;</span></pre> Our original code could then look like <pre class="csharpcode"><br /><span class="kwrd">&lt;</span><span class="html">Biml</span> <span class="attr">xmlns</span><span class="kwrd">="http://schemas.varigence.com/biml.xsd"</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#<br />// Do something here to populate myBimlFile<br />string myBimlFile =System.IO.File.ReadAllText(@"C:\tmp\inc_Package_00.biml");<br />#<span class="kwrd">&gt;</span> <br /> <span class="kwrd">&lt;</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span>#=CallBimlScriptContent(myBimlFile, "Package_00)"#<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;/</span><span class="html">Biml</span><span class="kwrd">&gt;</span></pre> Do you need to pass parameters? It's no different than what you're used to doing <pre class="csharpcode"><br /><span class="kwrd">&lt;</span><span class="html">Biml</span> <span class="attr">xmlns</span><span class="kwrd">="http://schemas.varigence.com/biml.xsd"</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#<br />// Do something here to populate myBimlFile<br />string myBimlFile =System.IO.File.ReadAllText(@"C:\tmp\inc_Package_10.biml");<br />#<span class="kwrd">&gt;</span> <br /> <span class="kwrd">&lt;</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span>#=CallBimlScriptContent(myBimlFile, "Package_10)"#<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;/</span><span class="html">Biml</span><span class="kwrd">&gt;</span></pre> <p>In the next post, I'll show you how to use reference data stored in tables or Azure as your BimlScript Content. Stay tuned!Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-68418615630396817602018-12-20T13:06:00.001-06:002018-12-20T13:06:25.843-06:00My github repositoryIn preparation for my talk at the <a href="http://kansascity.sqlpass.org/" target="_blank">Kansas City SQL Server User Group</a> this afternoon, I am putting this post here so people can get the materials easily. <p><a href="https://github.com/billinkc/presentations/tree/master/LightningTalks" target="_blank">Lightning Talks</a>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-18322238196717365532018-10-26T08:00:00.000-05:002018-10-26T08:00:05.163-05:00SQL Server Agent Job Sort Order<h1>SQL Server Agent Job Sort Order</h1><p>Today's post could also be titled "I have no idea what is happening here." We have an agent job, "Job - Do Stuff". We then created a few hundred jobs (templates for the win) all named like "Job - Do XYZ" where XYZ is a mainframe module identifier. When I'm scrolling through the list of jobs, it takes a few passes for my eye to find Do Stuff between DASD and DURR. I didn't want to change the leading portion of my job but I wanted <em>my</em> job to be sorted first. I open an <a href="https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html" target="_blank">ASCII table</a> and find a useful character that sorts before the dash. Ah, asterisk, ASCII 42 comes before dash, ASCII 45. <p><img border="0" src="https://4.bp.blogspot.com/-KiTXLHfqhtA/W9I7Linj5dI/AAAAAAAAdxo/UgAciipfCBEIDWD9f_HwaoCjpFxUOX79gCLcBGAs/s1600/Jobs_2.png" data-original-width="157" data-original-height="66" /><p>Well, that was unexpected. In my reproduction here, the job names will take the form of the literal string "JOB " (trailing space there). I then use a single ASCII character as separator. A use another string literal "CHAR(" and then I display the ASCII ordinal value and for completeness, I close the parenthesis. Thus, JOB * CHAR(42) and JOB - CHAR(45). Assuming I sort ascending alphabetically, which under the sheets I would convert each character to its ASCII value, would lead to me JOB * CHAR(42) on top. <p>That ain't the way it's being sorted in SSMS though. Let's figure out "is this an application issue or a database issue?" Jobs are stored in the database msdb in a table called sysjobs in the dbo schema. Let's start there. <pre class="csharpcode"><br /><span class="kwrd">SELECT</span><br /> S.name <span class="kwrd">AS</span> JobName<br /><span class="kwrd">FROM</span><br /> msdb.dbo.sysjobs <span class="kwrd">AS</span> S<br /><span class="kwrd">WHERE</span><br /> S.name <span class="kwrd">LIKE</span> <span class="str">'JOB%'</span><br /><span class="kwrd">ORDER</span> <span class="kwrd">BY</span><br /> S.name;<br /></pre> <p>Huh.<p><img border="0" src="https://2.bp.blogspot.com/-WpBDMG2UJBg/W9JBG4sGlwI/AAAAAAAAdx4/_kTO_hLmkR0nEu6FQV6yARN1yzjt518IQCLcBGAs/s1600/sysjobs.png" data-original-width="128" data-original-height="58" /><p> <p> Ok, so what goes into sorting? <a href="https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-2017" target="_blank">Collations</a> <pre class="csharpcode"><br /><br /><span class="kwrd">SELECT</span><br /> S.name <span class="kwrd">AS</span> SchemaName<br />, T.name <span class="kwrd">AS</span> TableName<br />, C.name <span class="kwrd">AS</span> ColumnName<br />, T2.name <span class="kwrd">AS</span> DataTypeName<br />, C.collation_name <span class="kwrd">AS</span> ColumnCollationName<br />, T2.collation_name <span class="kwrd">AS</span> TypeCollationName<br /><span class="kwrd">FROM</span><br /> msdb.sys.schemas <span class="kwrd">AS</span> S<br /> <span class="kwrd">INNER</span> <span class="kwrd">JOIN</span><br /> msdb.sys.tables <span class="kwrd">AS</span> T<br /> <span class="kwrd">ON</span> T.schema_id = S.schema_id<br /> <span class="kwrd">INNER</span> <span class="kwrd">JOIN</span><br /> msdb.sys.columns <span class="kwrd">AS</span> C<br /> <span class="kwrd">ON</span> C.object_id = T.object_id<br /> <span class="kwrd">INNER</span> <span class="kwrd">JOIN</span><br /> msdb.sys.types <span class="kwrd">AS</span> T2<br /> <span class="kwrd">ON</span> T2.user_type_id = C.user_type_id<br /><span class="kwrd">WHERE</span><br /> S.name = <span class="str">'dbo'</span><br /> <span class="kwrd">AND</span> T.name = <span class="str">'sysjobs'</span><br /> <span class="kwrd">AND</span> C.name = <span class="str">'name'</span>;<br /></pre> <p>The name column for dbo.sysjobs is of data type sysname which uses the collation of "SQL_Latin1_General_CP1_CI_AS". If it's the collation causing the "weird" sort, then we should be able to reproduce it, right? <pre class="csharpcode"><br /><span class="kwrd">SELECT</span> *<br /><span class="kwrd">FROM</span><br />(<br /> <span class="kwrd">VALUES</span><br /> (<span class="str">'JOB - CHAR(45)'</span> <span class="kwrd">COLLATE</span> SQL_Latin1_General_CP1_CI_AS)<br /> , (<span class="str">'JOB * CHAR(42)'</span> <span class="kwrd">COLLATE</span> SQL_Latin1_General_CP1_CI_AS)<br />) D(jobName)<br /><span class="kwrd">ORDER</span> <span class="kwrd">BY</span><br /> D.jobName <span class="kwrd">COLLATE</span> SQL_Latin1_General_CP1_CI_AS;</pre> <p>Nope, not the collation since this returns in the expected sort order. <p><img border="0" src="https://4.bp.blogspot.com/--nwiJD-UfSc/W9JIbaR3V7I/AAAAAAAAdyI/OtvzMnnsKwc1mr7jI_OAP17p87qX5Q6IgCLcBGAs/s1600/varcharSort.png" data-original-width="126" data-original-height="56" /> <p>At this point, I waste a lot time going down rabbit holes that this isn't, because in my reproduction was not verbatim. I neglected to preface my strings with an <code>N</code> thus leaving them as ascii strings, not unicode strings. <pre class="csharpcode"><br /><span class="kwrd">SELECT</span> *<br /><span class="kwrd">FROM</span><br />(<br /> <span class="kwrd">VALUES</span><br /> (N<span class="str">'JOB - CHAR(45)'</span> <span class="kwrd">COLLATE</span> SQL_Latin1_General_CP1_CI_AS)<br /> , (N<span class="str">'JOB * CHAR(42)'</span> <span class="kwrd">COLLATE</span> SQL_Latin1_General_CP1_CI_AS)<br />) D(jobName)<br /><span class="kwrd">ORDER</span> <span class="kwrd">BY</span><br /> D.jobName <span class="kwrd">COLLATE</span> SQL_Latin1_General_CP1_CI_AS;</pre> <p>Running that, we get the same sort from sysjobs. At this point, I remember something about unicode sorting being different than old school dictionary sort like I was expecting. And after finding this <a href="https://dba.stackexchange.com/a/175245/2131" target="_blank">answer on collations</a> I'm happy simply setting my quest aside and stepping away from the keyboard. <p>Oh, but if you want to see what the glorious sort order is for characters in the printable range (32 to 127), my script is below. Technically, 127 is a cheat since it's the DELETE but I include it because of where it sorts. <h3>Make the jobs</h3><p>This script has two templates in it - @MischiefManaged deletes a job and @Template creates a job. I query against sys.all_columns to get a sequential set of numbers from 1 to (127 -32). I use that number and string concatenation (requires 2012+) plus the CHAR function to translate the number into the corresponding ASCII character. It will print out "JOB ' CHAR(39)" once complete because I'm lazy. <pre class="csharpcode"><br /><span class="kwrd">DECLARE</span><br /> @Template nvarchar(<span class="kwrd">max</span>) = N<span class="str">'<br />use msdb;<br />IF EXISTS (SELECT * FROM dbo.sysjobs AS S WHERE S.name = '</span><span class="str">'&lt;JobName/&gt;'</span><span class="str">')<br />BEGIN<br /> EXECUTE dbo.sp_delete_job @job_name = '</span><span class="str">'&lt;JobName/&gt;'</span><span class="str">';<br />END<br />EXECUTE dbo.sp_add_job<br /> @job_name = N'</span><span class="str">'&lt;JobName/&gt;'</span><span class="str">'<br />, @enabled = 1<br />, @notify_level_eventlog = 0<br />, @notify_level_email = 2<br />, @notify_level_page = 2<br />, @delete_level = 0<br />, @category_name = N'</span><span class="str">'[Uncategorized (Local)]'</span><span class="str">';<br /><br />EXECUTE dbo.sp_add_jobserver<br /> @job_name = N'</span><span class="str">'&lt;JobName/&gt;'</span><span class="str">'<br />, @server_name = @@SERVERNAME;<br /><br />EXEC dbo.sp_add_jobstep<br /> @job_name = N'</span><span class="str">'&lt;JobName/&gt;'</span><span class="str">'<br />, @step_name = N'</span><span class="str">'MinimumViableJob'</span><span class="str">'<br />, @step_id = 1<br />, @cmdexec_success_code = 0<br />, @on_success_action = 2<br />, @on_fail_action = 2<br />, @retry_attempts = 0<br />, @retry_interval = 0<br />, @os_run_priority = 0<br />, @subsystem = N'</span><span class="str">'TSQL'</span><span class="str">'<br />, @command = N'</span><span class="str">'SELECT 1'</span><span class="str">'<br />, @database_name = N'</span><span class="str">'msdb'</span><span class="str">'<br />, @flags = 0;<br /><br />EXEC dbo.sp_update_job<br /> @job_name = N'</span><span class="str">'&lt;JobName/&gt;'</span><span class="str">'<br />, @start_step_id = 1;<br />'</span><br />, @MischiefManaged nvarchar(4000) = N<span class="str">'<br />use msdb;<br />IF EXISTS (SELECT * FROM dbo.sysjobs AS S WHERE S.name = '</span><span class="str">'&lt;JobName/&gt;'</span><span class="str">')<br />BEGIN<br /> EXECUTE dbo.sp_delete_job @job_name = '</span><span class="str">'&lt;JobName/&gt;'</span><span class="str">';<br />END'</span><br />, @Token sysname = <span class="str">'&lt;JobName/&gt;'</span><br />, @JobName sysname<br />, @Query nvarchar(<span class="kwrd">max</span>);<br /><br /><span class="kwrd">DECLARE</span><br /> CSR <span class="kwrd">CURSOR</span><br />FAST_FORWARD<br /><span class="kwrd">FOR</span><br /><span class="kwrd">SELECT</span><br /> J.jobName<br /><span class="kwrd">FROM</span><br />(<br /> <span class="kwrd">SELECT</span> <span class="kwrd">TOP</span> (127-31)<br /> 31 + (ROW_NUMBER() <span class="kwrd">OVER</span> (<span class="kwrd">ORDER</span> <span class="kwrd">BY</span> (<span class="kwrd">SELECT</span> <span class="kwrd">NULL</span>))) <span class="kwrd">AS</span> rn<br /> <span class="kwrd">FROM</span> sys.all_columns <span class="kwrd">AS</span> AC<br />) D(rn)<br /> <span class="kwrd">CROSS</span> APPLY<br /> (<br /> <span class="kwrd">SELECT</span><br /> CONCAT(<span class="str">'JOB '</span>, <span class="kwrd">CHAR</span>(D.rn), <span class="str">' CHAR('</span>, D.rn, <span class="str">')'</span>)<br /> )J(jobName)<br /><br /><span class="kwrd">OPEN</span> CSR;<br /><span class="kwrd">FETCH</span> <span class="kwrd">NEXT</span> <span class="kwrd">FROM</span> CSR <span class="kwrd">INTO</span> @JobName;<br /><br /><span class="kwrd">WHILE</span> <span class="preproc">@@FETCH_STATUS</span> = 0<br /><span class="kwrd">BEGIN</span><br /> <span class="kwrd">BEGIN</span> TRY <br /> <span class="kwrd">SET</span> @Query = REPLACE(@Template, @Token, @JobName);<br /> --<span class="rem">-- Uncomment the following to clean up our jobs</span><br /> --<span class="rem">SET @Query = REPLACE(@MischiefManaged, @Token, @JobName);</span><br /> <span class="kwrd">EXECUTE</span> sys.sp_executesql @Query, N<span class="str">''</span>;<br /> <span class="kwrd">END</span> TRY<br /> <span class="kwrd">BEGIN</span> CATCH<br /> <span class="kwrd">PRINT</span> @JobName;<br /> <span class="kwrd">END</span> CATCH<br /> <span class="kwrd">FETCH</span> <span class="kwrd">NEXT</span> <span class="kwrd">FROM</span> CSR <span class="kwrd">INTO</span> @JobName;<br /><span class="kwrd">END</span><br /><span class="kwrd">CLOSE</span> CSR;<br /><span class="kwrd">DEALLOCATE</span> CSR;</pre> <p>At this point, you can refresh the Jobs list in SSMS and the result is this job sort. <p><img border="0" src="https://1.bp.blogspot.com/-xbXbBvp5olQ/W9JNqK2igoI/AAAAAAAAdyY/KFzSFkDdbC0CgH02dili-8SbTYymzbpZwCLcBGAs/s1600/SortThis.png" data-original-width="167" data-original-height="1600" /> <p>Once you're satisfied with how things look, uncomment this line <code>SET @Query = REPLACE(@MischiefManaged, @Token, @JobName);</code> and rerun the script. All will be cleaned up. <p>Let's just chalk sorting up there with timezones, ok? Sounds easy but isn't. If you know more than me, please explain away in the comments section and share your knowledge. <p>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-37476961403451264262018-10-18T08:00:00.000-05:002018-10-18T08:00:08.001-05:00Polling in SQL Agent<h1>Polling in SQL Agent</h1> <p>A fun question over on StackOverflow asked about using SQL Agent with SSIS to <a href="https://stackoverflow.com/q/52861551/181965" target="_blank">poll for a file's existence.</a> As the comments indicate, there's a non-zero startup time associated with SSIS (it must validate the metadata associated to the sources and destinations), but there is a faster, lighter weight alternative. Putting together a host of TSQL ingredients, including undocumented extended stored procedures, the following recipe could be used as a SQL Agent job step. <p>If you copy and paste the following query into your favorite instance of SQL Server, it will execute for one minute and it will complete by printing the words "Naughty, naughty". <pre class="csharpcode"><br />SET NOCOUNT ON;<br /><span class="rem">-- http://www.patrickkeisler.com/2012/11/how-to-use-xpdirtree-to-list-all-files.html</span><br />DECLARE<br /> <span class="rem">-- Don't do stupid things like adding spaces into folder names</span><br /> @sourceFolder varchar(260) = <span class="str">'C:\ssisdata\Input'</span><br /> -- Have to use SQL matching rules, not DOS/SSIS<br />, @fileMask sysname = <span class="str">'SourceData%.txt'</span><br /> <span class="rem">-- how long to wait between polling</span><br />, @SleepInSeconds <span class="kwrd">int</span> = 5<br /> <span class="rem">-- Don't exceed 24 hours aka 86400 seconds</span><br />, @MaxTimerDurationInSeconds <span class="kwrd">int</span> = (3600 * 0) + (60 * 1) + 0<br /> <span class="rem">-- parameter for xp_dirtree 0 =&gt; top folder only; 1 =&gt; subfolders</span><br />, @depth <span class="kwrd">int</span> = 1<br /> <span class="rem">-- parameter for xp_dirtree 0 =&gt; directory only; 1 =&gt; directory and files</span><br />, @collectFile <span class="kwrd">int</span> = 1<br />, @RC bigint = 0;<br /><br /><span class="rem">-- Create a table variable to capture the results of our directory command</span><br />DECLARE<br /> @DirectoryTree table<br />(<br /> id <span class="kwrd">int</span> IDENTITY(1, 1)<br />, subdirectory nvarchar(512)<br />, depth <span class="kwrd">int</span><br />, isFile bit<br />);<br /><br />-- Use our sleep <span class="kwrd">in</span> seconds time to generate a delay time <span class="kwrd">string</span><br />DECLARE<br /> @delayTime <span class="kwrd">char</span>(10) = CONVERT(<span class="kwrd">char</span>(10), TIMEFROMPARTS(@SleepInSeconds/60 /60, @SleepInSeconds/60, @SleepInSeconds%60, 0, 0), 108)<br />, @stopDateTime datetime2(0) = DATEADD(SECOND, @MaxTimerDurationInSeconds, CURRENT_TIMESTAMP);<br /><br /><span class="rem">-- Force creation of the folder</span><br />EXECUTE dbo.xp_create_subdir @sourceFolder;<br /><br />-- Load the results of our directory<br />INSERT INTO<br /> @DirectoryTree<br />(<br /> subdirectory<br />, depth<br />, isFile<br />)<br />EXECUTE dbo.xp_dirtree<br /> @sourceFolder<br />, @depth<br />, @collectFile;<br /><br /><span class="rem">-- Prime the pump</span><br />SELECT<br /> @RC = COUNT_BIG(1)<br />FROM<br /> @DirectoryTree AS DT<br />WHERE<br /> DT.isFile = 1<br /> AND DT.subdirectory LIKE @fileMask;<br /><br />WHILE @rc = 0 AND @stopDateTime &gt; CURRENT_TIMESTAMP<br />BEGIN<br /><br /> <span class="rem">-- Load the results of our directory</span><br /> INSERT INTO<br /> @DirectoryTree<br /> (<br /> subdirectory<br /> , depth<br /> , isFile<br /> )<br /> EXECUTE dbo.xp_dirtree<br /> @sourceFolder<br /> , @depth<br /> , @collectFile;<br /><br /> <span class="rem">-- Test <span class="kwrd">for</span> file existence</span><br /> SELECT<br /> @RC = COUNT_BIG(1)<br /> FROM<br /> @DirectoryTree AS DT<br /> WHERE<br /> DT.isFile = 1<br /> AND DT.subdirectory LIKE @fileMask;<br /><br /> IF @RC = 0<br /> BEGIN<br /> <span class="rem">-- Put our process to sleep <span class="kwrd">for</span> a period of time</span><br /> WAITFOR DELAY @delayTime;<br /> END<br />END<br /><br /><span class="rem">-- at this point, we have either exited due to file found or time expired</span><br />IF @RC &gt; 0<br />BEGIN<br /> <span class="rem">-- Take action when file was found</span><br /> PRINT <span class="str">'Go run SSIS or something'</span>;<br />END<br />ELSE<br />BEGIN<br /> <span class="rem">-- Take action for file not delivered in expected timeframe</span><br /> PRINT <span class="str">'Naughty, naughty'</span>;<br />END<br /></pre> <p>If you rerun the above query, in a separate window, assuming you have xp_cmdshell enabled, firing the following query will create a file with the expected pattern. Instead, it'll print out "Go run SSIS or something" <pre class="csharpcode"><br />DECLARE<br /> @sourceFolder <span class="kwrd">varchar(260)</span> = <span class="str">'C:\ssisdata\Input'</span><br />, @fileMask <span class="kwrd">sysname</span> = REPLACE(<span class="str">'SourceData%.txt</span>', <span class="str">'%'</span>, <span class="kwrd">CONVERT(char(10), CURRENT_TIMESTAMP, 120))</span><br />DECLARE<br /> @command <span class="kwrd">varchar(1000)</span> = <span class="str">'echo &gt; '</span> + @sourceFolder + <span class="str">'\'</span> + @fileMask;<br /><br /><span class="rem">-- If you get this error<br />--Msg 15281, Level 16, State 1, Procedure sys.xp_cmdshell, Line 1 [Batch Start Line 0]<br />--SQL Server blocked access to procedure 'sys.xp_cmdshell' of component 'xp_cmdshell' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'xp_cmdshell' by using sp_configure. For more information about enabling 'xp_cmdshell', search for 'xp_cmdshell' in SQL Server Books Online.<br />--<br />-- Run this<br />--EXECUTE sys.sp_configure'xp_cmdshell', 1;<br />--GO<br />--RECONFIGURE;<br />--GO</span><br />EXECUTE sys.xp_cmdshell @command;<br /><br /></pre> <p>Once you're satisfied with how that works, now what? I'd likely set up a step 2 which is the actual running of the SSIS package (instead of printing a message). What about the condition that a file wasn't found? I'd likely use throw/raiserrror or just old fashioned divide by zero to force the first job step to fail. And then specify a reasonable number of @retry_attempts and @retry_interval. <p>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-72461405527104094192018-10-09T08:00:00.000-05:002019-12-30T13:56:45.696-06:00Biml Excel Data Source without Excel<h1>Biml Excel Meta Data Source without Excel</h1><p>In the previous post, <a href="https://billfellows.blogspot.com/2018/10/reading-excel-files-without-excel.html" target="_blank">Reading Excel files without Excel</a>, I showed some simple code to consume Excel without having Excel installed on your machine. How/Why would I use this - well look at the sample spreadsheet. That could be used quite nicely by a business analyst to generate SSIS packages. In fact, it <em>is</em> being used by a very savvy business analyst at one of my clients' shadow IT groups to identify the source data they'd like brought into their data mart. They are translating their mainframe data extracts into SQL equivalents and specifying where the data should land. <p>This is exciting for me as this team gets their data and knows the business problems they need to solve &emdash; they just didn't have all the tools to do so. They are supplying the data domain expertise and we are generating consistent packages that adhere to corporate standards (as well as defining the scheduling, alerting, etc). It's a good match. <p>My resources are quite simple: Excel Spreadsheet containing meta data, a driver program and a package template. <p>The template is your standard truncate and reload pattern with the target table being specified by a parameter. The client validates data by running processes in parallel so the existing mainframe process delivers data to the Billing table while ours delivers to a Billing_NEW table. Once they accept the new process, the target table becomes Billing and the NEW table is dropped. I decided the most native SSIS route would be use specify the target table in as a parameter. We originally have a boolean parameter indicating whether we were loading the new table or the production one but that was more logic and overhead that just specifying which table to load. I force their queries to be dirty reads as some of these queries can be rather messy. <pre class="csharpcode"><br /><span class="kwrd">&lt;</span>#@ template designerbimlpath="/Biml/Packages" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#@ property name="schemaName" type="string" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#@ property name="tableName" type="string" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#@ property name="parameterName" type="string" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#@ property name="sourceQuery" type="string" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#@ property name="sourceConnectionName" type="string" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#@ property name="targetConnectionName" type="string" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#@ property name="businessFriendlyName" type="string" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#<br />string packageName = string.Format("{0}_Load_{1}{2}", targetConnectionName.ToUpper(), businessFriendlyName, "");<br />CustomOutput.PackageName = packageName;<br />#<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Package</span> <span class="attr">Name</span>="&<span class="attr">lt</span>;#= <span class="attr">packageName</span> #<span class="kwrd">&gt;</span>" ConstraintMode="Linear"<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Parameters</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Parameter</span> <span class="attr">Name</span><span class="kwrd">="TargetTableName"</span> <span class="attr">DataType</span><span class="kwrd">="String"</span><span class="kwrd">&gt;&lt;</span>#= tableName #<span class="kwrd">&gt;&lt;/</span><span class="html">Parameter</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Parameters</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Variables</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Variable</span> <span class="attr">Name</span><span class="kwrd">="SchemaName"</span> <span class="attr">DataType</span><span class="kwrd">="String"</span><span class="kwrd">&gt;&lt;</span>#= schemaName#<span class="kwrd">&gt;&lt;/</span><span class="html">Variable</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Variable</span> <span class="attr">Name</span><span class="kwrd">="TableName"</span> <span class="attr">DataType</span><span class="kwrd">="String"</span> <span class="attr">EvaluateAsExpression</span><span class="kwrd">="true"</span><span class="kwrd">&gt;&lt;</span>#= parameterName #<span class="kwrd">&gt;&lt;/</span><span class="html">Variable</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Variable</span> <span class="attr">Name</span><span class="kwrd">="QualifiedTableName"</span> <span class="attr">DataType</span><span class="kwrd">="String"</span> <span class="attr">EvaluateAsExpression</span><span class="kwrd">="true"</span><span class="kwrd">&gt;</span><span class="attr">&amp;quot;</span>[<span class="attr">&amp;quot;</span> + @[User::SchemaName] + <span class="attr">&amp;quot;</span>].[<span class="attr">&amp;quot;</span> + @[User::TableName]+ <span class="attr">&amp;quot;</span>]<span class="attr">&amp;quot;</span><span class="kwrd">&lt;/</span><span class="html">Variable</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Variable</span> <span class="attr">Name</span><span class="kwrd">="QueryTruncate"</span> <span class="attr">DataType</span><span class="kwrd">="String"</span> <span class="attr">EvaluateAsExpression</span><span class="kwrd">="true"</span><span class="kwrd">&gt;</span>"TRUNCATE TABLE " + @[User::QualifiedTableName] + ";"<span class="kwrd">&lt;/</span><span class="html">Variable</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Variables</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Tasks</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">ExecuteSQL</span> <span class="attr">Name</span><span class="kwrd">="SQL Truncate Target"</span> <span class="attr">ConnectionName</span>="&<span class="attr">lt</span>;#= <span class="attr">targetConnectionName</span> #<span class="kwrd">&gt;</span>"<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">VariableInput</span> <span class="attr">VariableName</span><span class="kwrd">="User.QueryTruncate"</span> <span class="kwrd">/&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">ExecuteSQL</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Dataflow</span> <span class="attr">Name</span>="<span class="attr">DFT</span> <span class="attr">Load</span> &<span class="attr">lt</span>;#= <span class="attr">businessFriendlyName</span> #<span class="kwrd">&gt;</span>"<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Transformations</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">OleDbSource</span> <span class="attr">ConnectionName</span>="&<span class="attr">lt</span>;#= <span class="attr">sourceConnectionName</span> #<span class="kwrd">&gt;</span>" Name="OLESRC Query "<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">DirectInput</span><span class="kwrd">&gt;&lt;!</span>[CDATA[SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;<br /><span class="kwrd">&lt;</span>#= sourceQuery#<span class="kwrd">&gt;</span><br />]]<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">DirectInput</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">OleDbSource</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">OleDbDestination</span> <span class="attr">Name</span>="<span class="attr">OLEDST</span> &<span class="attr">lt</span>;#= <span class="attr">schemaName</span> #<span class="kwrd">&gt;</span>_<span class="kwrd">&lt;</span>#= tableName#<span class="kwrd">&gt;</span>" ConnectionName="<span class="kwrd">&lt;</span>#= targetConnectionName #<span class="kwrd">&gt;</span>"<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">TableFromVariableOutput</span> <span class="attr">VariableName</span><span class="kwrd">="User.QualifiedTableName"</span> <span class="kwrd">/&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">OleDbDestination</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Transformations</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Dataflow</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Tasks</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Package</span><span class="kwrd">&gt;</span></pre> <p>My ProjectDriver.biml file is fairly straight forward. In line 1 I provide a relative path to my EPPlus.dll The <cite>..\</cite> indicates I would find the assembly one folder up - two folders actually since I have a single copy in my base Visual Studio folder. Line 2 specifies we need to bring in OfficeOpenXml library. In Line 5 I create a variable that will hold the metadata for my solution. Line 6 is kind of interesting. I let the template determine what the package name should be based on the supplied meta data. Rather than having to perform that logic twice, it'd be nice to keep track of what packages have been created. Not only nice, it'll be required since we're using the Project Deployment Model! Line 19 is where we actually stamp out a specific package and look at that second parameter <cite>out customOutput</cite> That is the mechanism for our template to send information back to the caller. In our case, we'll add the package name to our ever growing list of packages. In line 28, we then run back through our list of packages and build out the project's definition. And that's about it. We've already talked about the GetExcelDriverData method. The GetDriverData method provides a simple abstraction between where I actually get metadata and how the packages are built. You can see a commented out reference to a GetStaticDriverData method which I used during development to test boundary conditions. Who knows, maybe I will pull from Azure Tables next... <pre class="csharpcode"><br /><span class="kwrd">&lt;</span>#@ assembly name= "..\..\EPPlus.dll" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#@ import namespace="OfficeOpenXml" #<span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span><span class="html">Biml</span> <span class="attr">xmlns</span><span class="kwrd">="http://schemas.varigence.com/biml.xsd"</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#<br /> Dictionary<span class="kwrd">&lt;</span><span class="html">string</span>, <span class="attr">List</span>&<span class="attr">lt</span>;<span class="attr">string</span><span class="kwrd">&gt;&gt;</span> dasData = new Dictionary<span class="kwrd">&lt;</span><span class="html">string</span>, <span class="attr">List</span>&<span class="attr">lt</span>;<span class="attr">string</span><span class="kwrd">&gt;&gt;</span>();<br /> List<span class="kwrd">&lt;</span><span class="html">string</span><span class="kwrd">&gt;</span> packageList = new List<span class="kwrd">&lt;</span><span class="html">string</span><span class="kwrd">&gt;</span>();<br /> string templateName = "inc_TruncAndReloadPackageParameter.biml";<br /> string projectName = "SHADOW_IT_DataSnapshot";<br /><br /> // Get our meta data <br /> dasData = GetDriverData();<br />#<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#<br /><br /> dynamic customOutput;<br /> foreach(var key in dasData.Keys)<br /> {<br /> WriteLine(CallBimlScriptWithOutput(templateName, out customOutput, dasData[key][0], dasData[key][1], dasData[key][2], dasData[key][3], dasData[key][4], dasData[key][5], dasData[key][6]));<br /> packageList.Add(customOutput.PackageName);<br /> }<br />#<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Projects</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">PackageProject</span> <span class="attr">Name</span>="&<span class="attr">lt</span>;#= <span class="attr">projectName</span> #<span class="kwrd">&gt;</span>"<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;</span>#<br /> foreach(var key in packageList)<br /> {<br />#<span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Package</span> <span class="attr">PackageName</span>="&<span class="attr">lt</span>;#= <span class="attr">key</span> #<span class="kwrd">&gt;</span>" <span class="kwrd">/&gt;</span><br /><span class="kwrd">&lt;</span>#<br /> }<br />#<span class="kwrd">&gt;</span> <br /> <span class="kwrd">&lt;/</span><span class="html">Packages</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Connections</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Connection</span> <span class="attr">ConnectionName</span><span class="kwrd">="WWI_DB"</span> <span class="kwrd">/&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Connection</span> <span class="attr">ConnectionName</span><span class="kwrd">="WWI_DW"</span> <span class="kwrd">/&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Connections</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">PackageProject</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Projects</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">Connections</span><span class="kwrd">&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">OleDbConnection</span> <span class="attr">Name</span><span class="kwrd">="WWI_DB"</span> <span class="attr">ConnectionString</span><span class="kwrd">="Data Source=.\DEV2017;Initial Catalog=WWI_DW;Provider=SQLNCLI11.1;Integrated Security=SSPI;Application Intent=READONLY;ConnectionTimeout=0;"</span> <span class="attr">CreateInProject</span><span class="kwrd">="true"</span> <span class="kwrd">/&gt;</span><br /> <span class="kwrd">&lt;</span><span class="html">OleDbConnection</span> <span class="attr">Name</span><span class="kwrd">="WWI_DW"</span> <span class="attr">ConnectionString</span><span class="kwrd">="Data Source=.\DEV2017;Initial Catalog=WWI_DB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Application Intent=READONLY;"</span> <span class="attr">CreateInProject</span><span class="kwrd">="true"</span> <span class="kwrd">/&gt;</span><br /> <span class="kwrd">&lt;/</span><span class="html">Connections</span><span class="kwrd">&gt;</span><br /><span class="kwrd">&lt;/</span><span class="html">Biml</span><span class="kwrd">&gt;</span><br /><br /><#+<br /><br /> /// <span class="kwrd">Get</span> <span class="kwrd">data</span> <span class="kwrd">from</span> Excel worksheet<br /> <span class="kwrd">public</span> <span class="kwrd">Dictionary</span>&lt;string, List&lt;string&gt;&gt; GetExcelDriverData(string sourceFile)<br /> {<br /> <span class="kwrd">Dictionary</span>&lt;string, List&lt;string&gt;&gt; d = <span class="kwrd">new</span> <span class="kwrd">Dictionary</span>&lt;string, List&lt;string&gt;&gt;();<br /> System.IO.FileInfo fi = <span class="kwrd">new</span> System.IO.FileInfo(sourceFile);<br /> <span class="kwrd">using</span> (ExcelPackage ep = <span class="kwrd">new</span> ExcelPackage(fi))<br /> {<br /> ExcelWorkbook wb = ep.Workbook;<br /> ExcelWorksheet ws = wb.Worksheets.<span class="kwrd">First</span>();<br /> <span class="kwrd">if</span> (ws != <span class="kwrd">null</span>)<br /> {<br /> // 1 based <span class="kwrd">array</span> <span class="kwrd">to</span> 7, inclusive<br /> <span class="kwrd">for</span> (<span class="kwrd">int</span> i = ws.Dimension.<span class="kwrd">Start</span>.<span class="kwrd">Row</span>+1; i &lt; ws.Dimension.<span class="kwrd">End</span>.<span class="kwrd">Row</span>+1; i++)<br /> {<br /> List&lt;string&gt; <span class="kwrd">row</span> = <span class="kwrd">new</span> List&lt;string&gt;() { ws.Cells[i, 1].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 2].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 3].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 4].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 5].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 6].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 7].<span class="kwrd">Value</span>.ToString()<br /> };<br /> <br /> d[ws.Cells[i, 7].<span class="kwrd">Value</span>.ToString()] = <span class="kwrd">row</span>;<br /> }<br /> }<br /> }<br /> <br /> <span class="kwrd">return</span> d;<br /> }<br /><br /> <span class="kwrd">public</span> <span class="kwrd">Dictionary</span>&lt;string, List&lt;string&gt;&gt; GetDriverData()<br /> {<br /> string sourceFile= @"C:\Users\billinkc\Documents\ShadowIt_DataSnap.xlsx";<br /> <span class="kwrd">return</span> GetExcelDriverData(sourceFile);<br /> //<span class="kwrd">return</span> GetStaticDriverData();<br /> }<br />#><br /></pre> <p>And that's how we can use EPPlus to consume metadata stored in Excel to generate many packages with Biml. Let me know if this helps or if you have questions about how to get this running. It's good stuff, I can't get enough of it. <p>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-26576234872110856362018-10-08T08:00:00.000-05:002018-10-08T08:00:03.286-05:00Reading Excel files without Excel<h1>Reading Excel files without Excel</h1><p>A common problem working with Excel data is Excel itself. Working with it programatically requires an installation of Office, and the resulting license cost, and once everything is set, you're still working with COM objects which present its own set of challenges. If only there was a better way. <p>Enter, the better way - <a href="https://github.com/JanKallman/EPPlus" target="_blank">EPPlus</a>. This is an open source library that wraps the OpenXml library which allows you to simply reference a DLL. No more installation hassles, no more licensing (LGPL) expense, just a simple reference you can package with your solutions. <p>Let's look at an example. Here's a simple spreadsheet with a header row and a row's worth of data. <p><img border="0" src="https://4.bp.blogspot.com/-VUfkZgpX4HA/W7pjRLxt97I/AAAAAAAAdoM/4SHpLcr15kgU6-EKj380-gKWRdzsX1R-QCLcBGAs/s1600/BimlMetadataExcel.png" data-original-width="1093" data-original-height="46" /> <p>For each row, after the header, I'll read the 7 columns into a list and then, since I assume the last column, BusinessFriendlyName, is unique, I'll use that as the key for my return dictionary. <pre class="csharpcode"><br /><span class="kwrd">using</span> OfficeOpenXml;<br /></pre>... <pre class="csharpcode"><br /> /// <span class="kwrd">Get</span> <span class="kwrd">data</span> <span class="kwrd">from</span> Excel worksheet<br /> <span class="kwrd">public</span> <span class="kwrd">Dictionary</span>&lt;string, List&lt;string&gt;&gt; GetExcelDriverData(string sourceFile)<br /> {<br /> <span class="kwrd">Dictionary</span>&lt;string, List&lt;string&gt;&gt; d = <span class="kwrd">new</span> <span class="kwrd">Dictionary</span>&lt;string, List&lt;string&gt;&gt;();<br /> System.IO.FileInfo fi = <span class="kwrd">new</span> System.IO.FileInfo(sourceFile);<br /> <span class="kwrd">using</span> (ExcelPackage ep = <span class="kwrd">new</span> ExcelPackage(fi))<br /> {<br /> ExcelWorkbook wb = ep.Workbook;<br /> ExcelWorksheet ws = wb.Worksheets.<span class="kwrd">First</span>();<br /> <span class="kwrd">if</span> (ws != <span class="kwrd">null</span>)<br /> {<br /> // 1 based <span class="kwrd">array</span> <span class="kwrd">to</span> 7, inclusive<br /> <span class="kwrd">for</span> (<span class="kwrd">int</span> i = ws.Dimension.<span class="kwrd">Start</span>.<span class="kwrd">Row</span>+1; i &lt; ws.Dimension.<span class="kwrd">End</span>.<span class="kwrd">Row</span>+1; i++)<br /> {<br /> List&lt;string&gt; <span class="kwrd">row</span> = <span class="kwrd">new</span> List&lt;string&gt;() { ws.Cells[i, 1].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 2].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 3].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 4].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 5].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 6].<span class="kwrd">Value</span>.ToString()<br /> , ws.Cells[i, 7].<span class="kwrd">Value</span>.ToString()<br /> };<br /> <br /> d[ws.Cells[i, 7].<span class="kwrd">Value</span>.ToString()] = <span class="kwrd">row</span>;<br /> }<br /> }<br /> }<br /> <br /> <span class="kwrd">return</span> d;<br /> }<br /></pre> <p>It's as easy as that. There are plenty of more clever implementations out there but I wanted to demonstrate a quick and easy method to read Excel from your .NET code. <br/>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-52734520182151548672018-08-14T08:00:00.000-05:002018-08-14T10:23:12.671-05:00A date dimension for SQL Server<h1>A date dimension for SQL Server</h1><p>The most common table you will find in a data warehouse will be the date dimension. There is no "right" implementation beyond what the customer needs to solve their business problem. I'm posting a date dimension for SQL Server that I generally find useful as a starting point in the hopes that I quit losing it. Perhaps you'll find it useful or can use the approach to build one more tailored to your environment. <p>As the comments indicate, this will create: a DW schema, a table named DimDate and then populate the date dimension from 1900-01-01 to 2079-06-06 endpoints inclusive. I also patch in 9999-12-31 as a well known "unknown" date value. Sure, it's odd to have an incomplete year - this is your opportunity to tune the supplied code ;) <pre class="csharpcode"><br /><span class="rem">-- At the conclusion of this script, there will be</span><br /><span class="rem">-- A schema named DW</span><br /><span class="rem">-- A table named DW.DimDate</span><br /><span class="rem">-- DW.DimDate will be populated with all the days between 1900-01-01 and 2079-06-06 (inclusive)</span><br /><span class="rem">-- and the sentinel date of 9999-12-31</span><br /><br /><span class="kwrd">IF</span> <span class="kwrd">NOT</span> <span class="kwrd">EXISTS</span><br />(<br /> <span class="kwrd">SELECT</span> * <span class="kwrd">FROM</span> sys.schemas <span class="kwrd">AS</span> S <span class="kwrd">WHERE</span> S.name = <span class="str">'DW'</span><br />)<br /><span class="kwrd">BEGIN</span><br /> <span class="kwrd">EXECUTE</span>(<span class="str">'CREATE SCHEMA DW AUTHORIZATION dbo;'</span>);<br /><span class="kwrd">END</span><br /><span class="kwrd">GO</span><br /><span class="kwrd">IF</span> <span class="kwrd">NOT</span> <span class="kwrd">EXISTS</span><br />(<br /> <span class="kwrd">SELECT</span> * <span class="kwrd">FROM</span> sys.schemas <span class="kwrd">AS</span> S <span class="kwrd">INNER</span> <span class="kwrd">JOIN</span> sys.tables <span class="kwrd">AS</span> T <span class="kwrd">ON</span> T.schema_id = S.schema_id<br /> <span class="kwrd">WHERE</span> S.name = <span class="str">'DW'</span> <span class="kwrd">AND</span> T.name = <span class="str">'DimDate'</span><br />)<br /><span class="kwrd">BEGIN</span><br /> <span class="kwrd">CREATE</span> <span class="kwrd">TABLE</span> DW.DimDate<br /> (<br /> DateSK <span class="kwrd">int</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> , FullDate <span class="kwrd">date</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> , CalendarYear <span class="kwrd">int</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> , CalendarYearText <span class="kwrd">char</span>(4) <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> , CalendarMonth <span class="kwrd">int</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> , CalendarMonthText <span class="kwrd">varchar</span>(12) <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> , CalendarDay <span class="kwrd">int</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> , CalendarDayText <span class="kwrd">char</span>(2) <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> , <span class="kwrd">CONSTRAINT</span> PK_DW_DimDate<br /> <span class="kwrd">PRIMARY</span> <span class="kwrd">KEY</span> <span class="kwrd">CLUSTERED</span><br /> (<br /> DateSK <span class="kwrd">ASC</span><br /> )<br /> <span class="kwrd">WITH</span> (ALLOW_ROW_LOCKS = <span class="kwrd">ON</span>, ALLOW_PAGE_LOCKS = <span class="kwrd">ON</span>, DATA_COMPRESSION = PAGE)<br /> , <span class="kwrd">CONSTRAINT</span> UQ_DW_DimDate <span class="kwrd">UNIQUE</span> (FullDate)<br /> );<br /><span class="kwrd">END</span><br /><span class="kwrd">GO</span><br /><span class="kwrd">WITH</span> <br /> <span class="rem">-- Define the start and the terminal value</span><br /> BOOKENDS(FirstDate, LastDate) <span class="kwrd">AS</span> (<span class="kwrd">SELECT</span> DATEFROMPARTS(1900,1,1), DATEFROMPARTS(9999,12,31))<br /> <span class="rem">-- itzik ben gan rapid number generator</span><br /> <span class="rem">-- Builds 65537 rows. Need more - follow the pattern</span><br /> <span class="rem">-- Need fewer rows, add a top below</span><br />, T0 <span class="kwrd">AS</span> <br />(<br /> <span class="rem">-- 2</span><br /> <span class="kwrd">SELECT</span> 1 <span class="kwrd">AS</span> n<br /> <span class="kwrd">UNION</span> <span class="kwrd">ALL</span> <span class="kwrd">SELECT</span> 1<br />)<br />, T1 <span class="kwrd">AS</span><br />(<br /> <span class="rem">-- 2^2 =&gt; 4 </span><br /> <span class="kwrd">SELECT</span> 1 <span class="kwrd">AS</span> n<br /> <span class="kwrd">FROM</span><br /> T0<br /> <span class="kwrd">CROSS</span> APPLY T0 <span class="kwrd">AS</span> TX<br />)<br />, T2 <span class="kwrd">AS</span> <br />(<br /> <span class="rem">-- 4^4 =&gt; 16</span><br /> <span class="kwrd">SELECT</span> 1 <span class="kwrd">AS</span> n<br /> <span class="kwrd">FROM</span><br /> T1<br /> <span class="kwrd">CROSS</span> APPLY T1 <span class="kwrd">AS</span> TX<br />)<br />, T3 <span class="kwrd">AS</span> <br />(<br /> <span class="rem">-- 16^16 =&gt; 256</span><br /> <span class="kwrd">SELECT</span> 1 <span class="kwrd">AS</span> n<br /> <span class="kwrd">FROM</span><br /> T2<br /> <span class="kwrd">CROSS</span> APPLY T2 <span class="kwrd">AS</span> TX<br />)<br />, T4 <span class="kwrd">AS</span><br />(<br /> <span class="rem">-- 256^256 =&gt; 65536</span><br /> <span class="rem">-- or approx 179 years</span><br /> <span class="kwrd">SELECT</span> 1 <span class="kwrd">AS</span> n<br /> <span class="kwrd">FROM</span><br /> T3<br /> <span class="kwrd">CROSS</span> APPLY T3 <span class="kwrd">AS</span> TX<br />)<br />, T5 <span class="kwrd">AS</span><br />(<br /> <span class="rem">-- 65536^65536 =&gt; basically infinity</span><br /> <span class="kwrd">SELECT</span> 1 <span class="kwrd">AS</span> n<br /> <span class="kwrd">FROM</span><br /> T4<br /> <span class="kwrd">CROSS</span> APPLY T4 <span class="kwrd">AS</span> TX<br />)<br /> <span class="rem">-- Assume we now have enough numbers for our purpose</span><br />, NUMBERS <span class="kwrd">AS</span><br />(<br /> <span class="rem">-- Add a SELECT TOP (N) here if you need fewer rows</span><br /> <span class="kwrd">SELECT</span><br /> <span class="kwrd">CAST</span>(ROW_NUMBER() <span class="kwrd">OVER</span> (<span class="kwrd">ORDER</span> <span class="kwrd">BY</span> (<span class="kwrd">SELECT</span> <span class="kwrd">NULL</span>)) <span class="kwrd">AS</span> <span class="kwrd">int</span>) -1 <span class="kwrd">AS</span> number<br /> <span class="kwrd">FROM</span><br /> T4<br /> <span class="kwrd">UNION</span> <br /> <span class="rem">-- Build End of time date</span><br /> <span class="rem">-- Get an N value of 2958463 for</span><br /> <span class="rem">-- 9999-12-31 assuming start date of 1900-01-01</span><br /> <span class="kwrd">SELECT</span><br /> ABS(DATEDIFF(<span class="kwrd">DAY</span>, BE.LastDate, BE.FirstDate))<br /> <span class="kwrd">FROM</span><br /> BOOKENDS <span class="kwrd">AS</span> BE<br />)<br />, DATES <span class="kwrd">AS</span><br />(<br /><span class="kwrd">SELECT</span><br /> PARTS.DateSk<br />, FD.FullDate<br />, PARTS.CalendarYear<br />, PARTS.CalendarYearText<br />, PARTS.CalendarMonth<br />, PARTS.CalendarMonthText<br />, PARTS.CalendarDay<br />, PARTS.CalendarDayText<br /><span class="kwrd">FROM</span><br /> NUMBERS <span class="kwrd">AS</span> N<br /> <span class="kwrd">CROSS</span> APPLY<br /> (<br /> <span class="kwrd">SELECT</span><br /> DATEADD(<span class="kwrd">DAY</span>, N.number, BE.FirstDate) <span class="kwrd">AS</span> FullDate<br /> <span class="kwrd">FROM</span><br /> BOOKENDS <span class="kwrd">AS</span> BE<br /> )FD<br /> <span class="kwrd">CROSS</span> APPLY<br /> (<br /> <span class="kwrd">SELECT</span><br /> <span class="kwrd">CAST</span>(<span class="kwrd">CONVERT</span>(<span class="kwrd">char</span>(8), FD.FullDate, 112) <span class="kwrd">AS</span> <span class="kwrd">int</span>) <span class="kwrd">AS</span> DateSk<br /> , DATEPART(<span class="kwrd">YEAR</span>, FD.FullDate) <span class="kwrd">AS</span> [CalendarYear] <br /> , DATENAME(<span class="kwrd">YEAR</span>, FD.FullDate) <span class="kwrd">AS</span> [CalendarYearText]<br /> , DATEPART(<span class="kwrd">MONTH</span>, FD.FullDate) <span class="kwrd">AS</span> [CalendarMonth]<br /> , DATENAME(<span class="kwrd">MONTH</span>, FD.FullDate) <span class="kwrd">AS</span> [CalendarMonthText]<br /> , DATEPART(<span class="kwrd">DAY</span>, FD.FullDate) <span class="kwrd">AS</span> [CalendarDay]<br /> , DATENAME(<span class="kwrd">DAY</span>, FD.FullDate) <span class="kwrd">AS</span> [CalendarDayText]<br /><br /> )PARTS<br />)<br /><span class="kwrd">INSERT</span> <span class="kwrd">INTO</span><br /> DW.DimDate<br />(<br /> DateSK<br />, FullDate<br />, CalendarYear<br />, CalendarYearText<br />, CalendarMonth<br />, CalendarMonthText<br />, CalendarDay<br />, CalendarDayText<br />)<br /><span class="kwrd">SELECT</span><br /> D.DateSk<br />, D.FullDate<br />, D.CalendarYear<br />, D.CalendarYearText<br />, D.CalendarMonth<br />, D.CalendarMonthText<br />, D.CalendarDay<br />, D.CalendarDayText<br /><span class="kwrd">FROM</span><br /> DATES <span class="kwrd">AS</span> D<br /><span class="kwrd">WHERE</span> <span class="kwrd">NOT</span> <span class="kwrd">EXISTS</span><br />(<br /> <span class="kwrd">SELECT</span> * <span class="kwrd">FROM</span> DW.DimDate <span class="kwrd">AS</span> DD<br /> <span class="kwrd">WHERE</span> DD.DateSK = D.DateSk<br />);<br /></pre>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-53530635901124495752018-04-05T08:00:00.000-05:002018-04-09T08:23:50.744-05:00Sort SQL Server tables into similarly sized buckets<h1>Sort SQL Server Tables into similarly sized buckets</h1><p>You need to do <em>something</em> to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance---it really doesn't matter. What does matter is that you'd like to get it done sooner rather than later. If time is no consideration, then you'd likely just do one table at a time until you've done them all. Sometimes, a maximum degree of parallelization of one is less than ideal. You're paying for more than one processor core, you might as well use it. The devil in splitting a workload out can be ensuring the tasks are well balanced. When I'm staging data in SSIS, I often use a row count as an approximation for a time cost. It's not perfect - a million row table 430 columns wide might actually take longer than the 250 million row key-value table. <p>A sincere tip of the hat to Daniel Hutmacher (<a href="https://sqlsunday.com" target="_blank">b</a>|<a href="https://twitter.com/dhmacher/" target="_blank">t</a>)for his answer on this <a href="https://dba.stackexchange.com/q/127920/2131" target="_blank">StackExchange post</a>. He has some great logic for sorting tables into approximately equally sized bins and it performs reasonably well. <!-- code formatted by http://manoli.net/csharpformat/ --><pre class="csharpcode"><br /><span class="kwrd">SET</span> NOCOUNT <span class="kwrd">ON</span>;<br /><span class="kwrd">DECLARE</span><br /> @bucketCount tinyint = 6;<br /><br /><span class="kwrd">IF</span> OBJECT_ID(<span class="str">'tempdb..#work'</span>) <span class="kwrd">IS</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /><span class="kwrd">BEGIN</span><br /> <span class="kwrd">DROP</span> <span class="kwrd">TABLE</span> #<span class="kwrd">work</span>;<br /><span class="kwrd">END</span><br /><br /><span class="kwrd">CREATE</span> <span class="kwrd">TABLE</span> #<span class="kwrd">work</span> (<br /> _row <span class="kwrd">int</span> <span class="kwrd">IDENTITY</span>(1, 1) <span class="kwrd">NOT</span> <span class="kwrd">NULL</span>,<br /> [SchemaName] sysname,<br /> [TableName] sysname,<br /> [RowsCounted] <span class="kwrd">bigint</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span>,<br /> GroupNumber <span class="kwrd">int</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span>,<br /> moved tinyint <span class="kwrd">NOT</span> <span class="kwrd">NULL</span>,<br /> <span class="kwrd">PRIMARY</span> <span class="kwrd">KEY</span> <span class="kwrd">CLUSTERED</span> ([RowsCounted], _row)<br />);<br /><br /><span class="kwrd">WITH</span> cte <span class="kwrd">AS</span> (<br /><span class="kwrd">SELECT</span> B.RowsCounted<br />, B.SchemaName<br />, B.TableName<br /> <span class="kwrd">FROM</span><br /> (<br /> <span class="kwrd">SELECT</span><br /> s.[Name] <span class="kwrd">as</span> [SchemaName]<br /> , t.[name] <span class="kwrd">as</span> [TableName]<br /> , <span class="kwrd">SUM</span>(p.<span class="kwrd">rows</span>) <span class="kwrd">as</span> [RowsCounted]<br /> <span class="kwrd">FROM</span><br /> sys.schemas s<br /> <span class="kwrd">LEFT</span> <span class="kwrd">OUTER</span> <span class="kwrd">JOIN</span> <br /> sys.tables t<br /> <span class="kwrd">ON</span> s.schema_id = t.schema_id<br /> <span class="kwrd">LEFT</span> <span class="kwrd">OUTER</span> <span class="kwrd">JOIN</span> <br /> sys.partitions p<br /> <span class="kwrd">ON</span> t.object_id = p.object_id<br /> <span class="kwrd">LEFT</span> <span class="kwrd">OUTER</span> <span class="kwrd">JOIN</span> <br /> sys.allocation_units a<br /> <span class="kwrd">ON</span> p.partition_id = a.container_id<br /> <span class="kwrd">WHERE</span><br /> p.index_id <span class="kwrd">IN</span> (0,1)<br /> <span class="kwrd">AND</span> p.<span class="kwrd">rows</span> <span class="kwrd">IS</span> <span class="kwrd">NOT</span> <span class="kwrd">NULL</span><br /> <span class="kwrd">AND</span> a.type = 1<br /> <span class="kwrd">GROUP</span> <span class="kwrd">BY</span> <br /> s.[Name]<br /> , t.[name]<br /> ) B<br />)<br /><br /><span class="kwrd">INSERT</span> <span class="kwrd">INTO</span> #<span class="kwrd">work</span> ([RowsCounted], SchemaName, TableName, GroupNumber, moved)<br /><span class="kwrd">SELECT</span> [RowsCounted], SchemaName, TableName, ROW_NUMBER() <span class="kwrd">OVER</span> (<span class="kwrd">ORDER</span> <span class="kwrd">BY</span> [RowsCounted]) % @bucketCount <span class="kwrd">AS</span> GroupNumber, 0<br /><span class="kwrd">FROM</span> cte;<br /><br /><br /><span class="kwrd">WHILE</span> (@@<span class="kwrd">ROWCOUNT</span>!=0)<br /><span class="kwrd">WITH</span> cte <span class="kwrd">AS</span><br />(<br /> <span class="kwrd">SELECT</span><br /> *<br /> , <span class="kwrd">SUM</span>(RowsCounted) <span class="kwrd">OVER</span> (PARTITION <span class="kwrd">BY</span> GroupNumber) - <span class="kwrd">SUM</span>(RowsCounted) <span class="kwrd">OVER</span> (PARTITION <span class="kwrd">BY</span> (<span class="kwrd">SELECT</span> <span class="kwrd">NULL</span>)) / @bucketCount <span class="kwrd">AS</span> _GroupNumberoffset<br /> <span class="kwrd">FROM</span><br /> #<span class="kwrd">work</span><br />)<br /><span class="kwrd">UPDATE</span><br /> w<br /><span class="kwrd">SET</span><br /> w.GroupNumber = (<span class="kwrd">CASE</span> w._row<br /> <span class="kwrd">WHEN</span> x._pos_row <span class="kwrd">THEN</span> x._neg_GroupNumber<br /> <span class="kwrd">ELSE</span> x._pos_GroupNumber<br /> <span class="kwrd">END</span><br /> )<br />, w.moved = w.moved + 1<br /><span class="kwrd">FROM</span><br /> #<span class="kwrd">work</span> <span class="kwrd">AS</span> w<br /> <span class="kwrd">INNER</span> <span class="kwrd">JOIN</span><br /> (<br /> <span class="kwrd">SELECT</span> <span class="kwrd">TOP</span> 1<br /> pos._row <span class="kwrd">AS</span> _pos_row<br /> , pos.GroupNumber <span class="kwrd">AS</span> _pos_GroupNumber<br /> , neg._row <span class="kwrd">AS</span> _neg_row<br /> , neg.GroupNumber <span class="kwrd">AS</span> _neg_GroupNumber<br /> <span class="kwrd">FROM</span><br /> cte <span class="kwrd">AS</span> pos<br /> <span class="kwrd">INNER</span> <span class="kwrd">JOIN</span><br /> cte <span class="kwrd">AS</span> neg<br /> <span class="kwrd">ON</span> pos._GroupNumberoffset &gt; 0<br /> <span class="kwrd">AND</span> neg._GroupNumberoffset &lt; 0<br /> <span class="kwrd">AND</span><br /> <span class="rem">--- To prevent infinite recursion:</span><br /> pos.moved &lt; @bucketCount<br /> <span class="kwrd">AND</span> neg.moved &lt; @bucketCount<br /> <span class="kwrd">WHERE</span> <span class="rem">--- must improve positive side's offset:</span><br /> ABS(pos._GroupNumberoffset - pos.RowsCounted + neg.RowsCounted) &lt;= pos._GroupNumberoffset<br /> <span class="kwrd">AND</span><br /> <span class="rem">--- must improve negative side's offset:</span><br /> ABS(neg._GroupNumberoffset - neg.RowsCounted + pos.RowsCounted) &lt;= ABS(neg._GroupNumberoffset)<br /> <span class="rem">--- Largest changes first:</span><br /> <span class="kwrd">ORDER</span> <span class="kwrd">BY</span><br /> ABS(pos.RowsCounted - neg.RowsCounted) <span class="kwrd">DESC</span><br /> ) <span class="kwrd">AS</span> x<br /> <span class="kwrd">ON</span> w._row <span class="kwrd">IN</span><br /> (<br /> x._pos_row<br /> , x._neg_row<br /> );<br /><br /></pre> <p>Now what? Let's look at the results. Run this against AdventureWorks and AdventureWorksDW <pre><br /><span class="kwrd">SELECT</span><br /> W.GroupNumber<br />, COUNT_BIG(1) AS TotalTables<br />, SUM(W.RowsCounted) AS GroupTotalRows<br /><span class="kwrd">FROM</span><br /> #<span class="kwrd">work</span> <span class="kwrd">AS</span> W<br /><span class="kwrd">GROUP BY</span><br /> W.GroupNumber<br /><span class="kwrd">ORDER BY</span><br /> W.GroupNumber;<br /><br /><br /><span class="kwrd">SELECT</span><br /> W.GroupNumber<br />, W.SchemaName<br />, W.TableName<br />, W.RowsCounted<br />, COUNT_BIG(1) OVER (PARTITION BY W.GroupNumber ORDER BY (SELECT NULL)) AS TotalTables<br />, SUM(W.RowsCounted) OVER (PARTITION BY W.GroupNumber ORDER BY (SELECT NULL)) AS GroupTotalRows<br /><span class="kwrd">FROM</span><br /> #<span class="kwrd">work</span> <span class="kwrd">AS</span> W<br /><span class="kwrd">ORDER BY</span><br /> W.GroupNumber;<br /></pre><p>For AdventureWorks (2014), I get a nice distribution across my 6 groups. 12 to 13 tables in each bucket and a total row count between 125777 and 128003. That's less than 2% variance between the high and low - I'll take it. <p>If you rerun for AdventureWorksDW, it's a little more interesting. Our 6 groups are again filled with 5 to 6 tables but this time, group 1 is heavily skewed by the fact that FactProductInventory accounts for 73% of all the rows in the entire database. The other 5 tables in the group are the five smallest tables in the database. <p>I then ran this against our data warehouse-like environment. We had a 1206 tables in there for 3283983766 rows (3.2 <strike>million</strike> billion). The query went from instantaneous to about 15 minutes but now I've got a starting point for bucketing my tables into similarly sized groups. <p>What do you think? How do you plan to use this? Do you have a different approach for figuring this out? I looked at R but without knowing what this activity is called, I couldn't find a function to perform the calculations. <p>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-31010115991343213462018-03-12T08:00:00.000-05:002018-03-12T08:00:18.409-05:002018 MVP Summit retrospective<h1>2018 MVP Summit retrospective</h1> <p>Another year of the MVP Summit is in the bag and as always, I have months worth of learning I'm excited to do. <p><a href="https://3.bp.blogspot.com/-JjE-4AgXoNM/WqQi7a5OVUI/AAAAAAAAZyw/_7Qx1j13Df8Yjp7dF9G5Hw-8c2ECBC_HACLcBGAs/s1600/IMG_20180310_115119%2B%25282%2529.jpg" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-JjE-4AgXoNM/WqQi7a5OVUI/AAAAAAAAZyw/_7Qx1j13Df8Yjp7dF9G5Hw-8c2ECBC_HACLcBGAs/s320/IMG_20180310_115119%2B%25282%2529.jpg" width="320" height="277" data-original-width="1600" data-original-height="1385" /></a> <h2>Thank you</h2><p>I'd like to extend a hearty thank you to Microsoft and the various teams for hosting us. I can't imagine the sheer amount of hours spent in preparation, actual time not-spent-working-on-technology-X, much less the expense of caffeinating, feeding, lodging, and transporting us. <h2>What I'm excited about</h2> <h3>Stream Analytics</h3> <p>We have a high performance (60M messages per day averaging 130ms throughput) messaging system that allows us to expose mainframe data as a SQL Server database for analytics. The devil with Service Broker is that there's no built in monitoring. We have a <a href="https://www.linkedin.com/feed/update/urn:li:activity:6343249888771395584" target="_blank">clever dashboard</a> built on the PowerBI reporting streaming dataset source that provides an at-a-glance health check for data processing. What we need though, is something that can drive action based on changes. The September changes in <a href="https://azure.microsoft.com/en-us/blog/new-in-stream-analytics-output-to-azure-functions-built-in-anomaly-detection-etc/" target="_blank">Stream Analytics</a> look like the perfect fit. It allows us to detect not just hard limits (we've violated our 3 second SLA) but the squishier metrics like a background process just woke up and swamped us with a million rows in the past three minutes or our processing time is trending upwards and someone needs to figure out why. <h3>SQL Graph improvements</h3> <p>While we are not yet using graph features, I can see opportunities for it with our client that I want to build some proof of concept models. <h3>Cosmos DB</h3> <p>Alongside the Stream Analytics improvements, perhaps we need to feed the change data into Cosmos and then leverage the <a href="https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed" target="_blank">Change Feed</a> support to push to analytics processing. And just generally, I need to invest some time in Apache Spark. I also learned that I don't need to discover all the patterns for <a href="http://lambda-architecture.net/" target="_blank">lambda architecture</a> as it's already out there with a handy URL to boot. <h3>Cognitive Services</h3><p>Ok, while picking up information about this was just to scratch a very silly itch, I was impressed how easy it was from the web interface. I have bird feeders and even though most seed will state that squirrels are not interested in it, that's a downright lie. <p> <img border="0" src="https://3.bp.blogspot.com/-AWTyg4yOTxU/WqQ6CI2ST_I/AAAAAAAAZz4/84IcF8guh9sgHqiG4BK77v2mY3QlnCbhACLcBGAs/s320/Squirrel.png" width="320" height="151" data-original-width="866" data-original-height="408" alt="Don't mind me, I'm just a fuzzy bird"/> <p>I want a camera pointed at my bird feeder and if a squirrel shows, I want to know about it. I used about a dozen pictures of my bird feeders with and without my nemesis to train the model and then fed back assorted photos to see how smart it was. Except for an image of a squirrel hiding in shadow, it was able to give me high confidence readings on what was featured in the photo. Here we can see that my dog is neither a bird nor a squirrel. <br/> <img border="0" src="https://3.bp.blogspot.com/-HmCWjd2iLoo/WqQ6itdJxRI/AAAAAAAAZ0A/Rl-aN6agyyAd3tkMT9Yk-NxahVuWAocmwCLcBGAs/s400/NotASquirrel.png" width="400" height="222" data-original-width="865" data-original-height="480" alt="Not a squirrel, just a lazy dog" /> <p>I'm so excited to get these bots built out. One for the Raspberry Pi to detect presence at the feeder and then an Azure based recognizer for friend versus foe. Once that's done, the next phase will be to identify specific bird species. And then tie it to feed type and feeder style (tray/platform versus house versus tube) and time of day and ... yes, lot of fun permutations that are easily available without having to learn all the computer vision and statistics. Feel free to give it a whirl at <a href="https://customvision.ai" target="_blank">https://customvision.ai</a> <h3><a href="https://github.com/Microsoft/sqlopsstudio" target="_blank">SQLOps studio</a></h3><p>This is the new cross platform SQL Server Management Studio replacement - sort of. It's not designed to do everything SSMS does but instead the vision is to solve the most needed problems and with the open source model, the community can patch in their own solutions. I'm excited to put together a better reporting interface for the SSISDB. Something that you can actually copy text out of - how crazy is that? <h3>Azure Data Lake Analytics</h3> <p>It had been a year since I had worked through some of the ADLA/USQL so it was good to get back into the language and environment. I need to get on a project that is actually using the technology though to really cement my knowledge. <h2>What I learned</h2><p>In October of 2016, I launched <a href="sterlingdataconsulting.com">Sterling Data Consulting</a> as my company. I sub under a good friend and it's been an adventure running a business but I don't feel like I'm really running a business since I have no other business. One of my TODOs at the conference was to talk to other small shop owners to see if I could discover their "secret sauce." While I got assorted feedback, the two I want to send a special thank you to are John Sterrett of <a href="https://www.procuresql.com/" alt="_blank">Procure SQL</a> and <a href="http://timradney.com/" target="_blank">Tim Radney</a>. Their advice ranged from straight forward "I don't know what you do", "are you for hire" to thoughts on lead acquisition and my lack of vision for sales. <p>Tim was also my roommate and it was great just getting to know him. We traded Boy Scout leader stories and he had excellent ideas for High Adventure fundraisers since that's something our troop is looking to do next year. For being a year younger than me, he sure had a lot more wisdom on the things I don't do or don't do well. You should check him at at the <a href="http://www.sqlsaturday.com/733/eventhome.aspx" target="_blank">Atlanta SQL Saturday</a> and attend his precon on <a href="https://www.eventbrite.com/e/tim-radney-common-sql-server-mistakes-and-how-to-correct-them-registration-42725424896">Common SQL Server mistakes and how to avoid them</a>. <h2>Photos</h2>Bellevue is less scenic than Seattle but the sunshine and warmth on Tuesday made for some nice <a href="https://photos.app.goo.gl/5NVfdMSN8DJf7TWz2" target="_blank">photos of the treehouses</a>. Yes, the Microsoft Campus has adult sized treehouses in it. How cool is that? <p>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-56424617439239878992018-03-02T08:00:00.000-06:002018-03-02T08:00:19.176-06:00Python pandas repeating character tester<h1>Python pandas repeating character tester</h1><p>At one of our clients, we are data profiling. They have a mainframe, it's been running for so long, they no longer have SMEs for their data. We've been able to leverage Service Broker to provide a real-time, under 3 seconds, remote file store for their data. It's pretty cool but now they are trying to <em>do</em> something with the data so we need to understand what the data looks like. We're using a mix of TSQL and python to understand nullability, value variances, etc. One of the "interesting" things we've discovered is that they loved placeholder values. <strong>Everyone</strong> knows a date of 88888888 is a placeholder for the actual date which they'll get two steps later in the workflow. Except sometimes we use 99999999 because the eights are the placeholder for the time. <p>Initially, we were just searching for one sentinel value, then two values until we saw the bigger pattern of "repeated values probably mean something." For us, this matters because we then need to discard those rows for data type suitability. 88888888 isn't a valid date so our logic might determine that column is best served by a numeric data type. Unless we exclude the eights value in which we get a 100% match rate on the column's ability to be converted to a date. <p>How can we determine if a string is nothing but repeated values in python? There's a very clever test from StackOverflow <p><code>source == source[0] * len(source)</code>I would read that as "is the source variable exactly equal to the the first character of source repeated for the length of source?" <p>And that was good, until we hit a NULL (None in python-speak). We then took advantage of the ternary equivalent in python to make it <p><code>(source == source[0] * len(source)) if source else False</code> <h3>Enter Pandas (series)</h3><p>Truth is a funny thing in an Pandas Series. <em>Really, it is</em>. <cite>The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().</cite>. We were trying to apply the above function as we were doing everything else <pre>df.MyColumn.str.len()<br /># this will fail magnificantly<br />(df.MyColumn == df.MyColumn[0] * len(df.MyColumn)) if df.MyColumn else False</code><br /></pre><p>It took me a while since I hadn't really used the pandas library beyond running what my coworker had done. What I needed to do, was get a row context to apply the calculations for true/false. As it stands, the Series stuff wants to try and aggregate the booleans or something like that. And it makes sense from a SQL perspective, you can't really apply aggregates to bit fields (beyond COUNT). <p>So, what's the solution? As always, you're likely to say the exact thing you're looking for. In this case, <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html#pandas.Series.apply" target="_blank">apply</a> was the keyword. <p><code>df.MyColumn.apply(lambda source: (source == source[0] * len(source)) if source else False)</code> <p>Full code you can play with would be <pre><br />import pandas<br />import pprint<br /><br />def isRepeated(src):<br /> return (src == src[0] * len(src)) if src else False<br /> <br />df = pandas.DataFrame({"MyCol":pandas.Series(['AB', 'BC', 'BB', None])})<br /><br />pprint.pprint(df)<br /><br />print()<br /># What rows have the same character in all of them?<br /><br />pprint.pprint(df.MyCol.apply(lambda source:(source == source[0] * len(source)) if source else False))<br />#If you'd like to avoid the anonymous function...<br />pprint.pprint(df.MyCol.apply(isRepeated))<br /></pre> <p>In short, python is glorious and I'm happy to writing in it again ;) <hr/><p/>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-83739095108503368242018-02-23T08:00:00.000-06:002018-02-23T08:00:20.247-06:00Pop Quiz - REPLACE in SQL Server <p>It's amazing the things I've run into with SQL Server this week that I never noticed. In today's pop quiz, let's look at <a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/replace-transact-sql" target="_blank">REPLACE</a> <pre><br />DECLARE<br /> @Repro table<br />(<br /> SourceColumn varchar(30)<br />);<br /><br />INSERT INTO <br /> @Repro<br />(<br /> SourceColumn<br />)<br />SELECT<br /> D.SourceColumn<br />FROM<br />(<br /> VALUES <br /> ('None')<br /> , ('ABC')<br /> , ('BCD')<br /> , ('DEF')<br />)D(SourceColumn);<br /><br />SELECT<br /> R.SourceColumn<br />, REPLACE(R.SourceColumn, 'None', NULL) AS wat<br />FROM<br /> @Repro AS R;<br /></pre> <p>In the preceding example, I load 4 rows into a table and call the REPLACE function on it. Why? Because some numbskull front end developer entered None instead of a NULL for a non-existent value. No problem, I will simply replace all None with NULL. So, what's the value of the wat column? <p>Well, if you're one of those people who reads instruction manuals before attempting anything, you'd have seen <cite>Returns NULL if any one of the arguments is NULL.</cite> Otherwise, you're like me thinking "maybe I put the arguments in the wrong order". Nope, <code>, REPLACE(R.SourceColumn, 'None', '') AS EmptyString</code> that works. So what the heck? Guess I'll actually read the manual... No, this work, I can just use <a href="https://docs.microsoft.com/en-us/sql/t-sql/language-elements/nullif-transact-sql" target="_blank">NULLIF</a> to make the empty strings into a NULL <code>, NULLIF(REPLACE(R.SourceColumn, 'None', ''), '') AS EmptyStringToNull</code> <p>Much better, replace all my instances of None with an empty string and then convert anything that is empty string to null. Wait, what? You know what would be better? Skipping the replace call altogether. <pre><br />SELECT<br /> R.SourceColumn<br />, NULLIF(R.SourceColumn, 'None') AS MuchBetter<br />FROM<br /> @Repro AS R;<br /></pre> <p>Moral of the story and/or quiz: once you have a working solution, rubber duck out your approach to see if there's an opportunity for improvement (only after having committed the working version to source control). <br/>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-77232075757660864842018-02-22T08:00:00.000-06:002018-02-22T08:00:40.457-06:00Altering table types, part 2<h1>Altering table types - a compatibility guide</h1> <p>In <a href="http://billfellows.blogspot.com/2018/02/pop-quiz-altering-column-types.html">yesterday's post</a>, I altered a table type. Pray I don't alter them further. What else is incompatible with an integer column? It's just a morbid curiosity at this point as I don't recall having ever seen this after working with SQL Server for 18 years. <tiny>Side note, dang I'm old</tiny> <p>How best to answer the question, by interrogating the sys.types table and throwing operations against the wall to see what does/doesn't stick. <pre><br />DECLARE<br /> @Results table<br />(<br /> TypeName sysname, Failed bit, ErrorMessage nvarchar(4000)<br />);<br /><br />DECLARE<br /> @DoOver nvarchar(4000) = N'DROP TABLE IF EXISTS dbo.IntToTime;<br />CREATE TABLE dbo.IntToTime (CREATE_TIME int);'<br />, @alter nvarchar(4000) = N'ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME @type'<br />, @query nvarchar(4000) = NULL<br />, @typeName sysname = 'datetime';<br /><br />DECLARE<br /> CSR CURSOR<br />FORWARD_ONLY<br />FOR<br />SELECT <br /> T.name<br />FROM<br /> sys.types AS T<br />WHERE<br /> T.is_user_defined = 0<br /><br />OPEN CSR;<br />FETCH NEXT FROM CSR INTO @typeName<br />WHILE @@FETCH_STATUS = 0<br />BEGIN<br /> BEGIN TRY <br /> EXECUTE sys.sp_executesql @DoOver, N'';<br /> SELECT @query = REPLACE(@alter, N'@type', @typeName);<br /> EXECUTE sys.sp_executesql @query, N'';<br /> <br /> INSERT INTO<br /> @Results<br /> (<br /> TypeName<br /> , Failed<br /> , ErrorMessage<br /> )<br /> SELECT @typeName, CAST(0 AS bit), ERROR_MESSAGE();<br /> END TRY<br /> BEGIN CATCH<br /> INSERT INTO<br /> @Results<br /> (<br /> TypeName<br /> , Failed<br /> , ErrorMessage<br /> )<br /> SELECT @typeName, CAST(1 AS bit), ERROR_MESSAGE()<br /> END CATCH<br /> FETCH NEXT FROM CSR INTO @typeName<br />END<br />CLOSE CSR;<br />DEALLOCATE CSR;<br /><br />SELECT<br />*<br />FROM<br /> @Results AS R<br />ORDER BY<br /> 2,1;<br /></pre> <table><tr><th>TypeName</th><th>Failed</th><th>ErrorMessage</th></tr><tr><td>bigint</td><td>0</td><td></td></tr><tr><td>binary</td><td>0</td><td></td></tr><tr><td>bit</td><td>0</td><td></td></tr><tr><td>char</td><td>0</td><td></td></tr><tr><td>datetime</td><td>0</td><td></td></tr><tr><td>decimal</td><td>0</td><td></td></tr><tr><td>float</td><td>0</td><td></td></tr><tr><td>int</td><td>0</td><td></td></tr><tr><td>money</td><td>0</td><td></td></tr><tr><td>nchar</td><td>0</td><td></td></tr><tr><td>numeric</td><td>0</td><td></td></tr><tr><td>nvarchar</td><td>0</td><td></td></tr><tr><td>real</td><td>0</td><td></td></tr><tr><td>smalldatetime</td><td>0</td><td></td></tr><tr><td>smallint</td><td>0</td><td></td></tr><tr><td>smallmoney</td><td>0</td><td></td></tr><tr><td>sql_variant</td><td>0</td><td></td></tr><tr><td>sysname</td><td>0</td><td></td></tr><tr><td>tinyint</td><td>0</td><td></td></tr><tr><td>varbinary</td><td>0</td><td></td></tr><tr><td>varchar</td><td>0</td><td></td></tr><tr><td>date</td><td>1</td><td>Operand type clash: int is incompatible with date</td></tr><tr><td>datetime2</td><td>1</td><td>Operand type clash: int is incompatible with datetime2</td></tr><tr><td>datetimeoffset</td><td>1</td><td>Operand type clash: int is incompatible with datetimeoffset</td></tr><tr><td>geography</td><td>1</td><td>Operand type clash: int is incompatible with geography</td></tr><tr><td>geometry</td><td>1</td><td>Operand type clash: int is incompatible with geometry</td></tr><tr><td>hierarchyid</td><td>1</td><td>Operand type clash: int is incompatible with hierarchyid</td></tr><tr><td>image</td><td>1</td><td>Operand type clash: int is incompatible with image</td></tr><tr><td>ntext</td><td>1</td><td>Operand type clash: int is incompatible with ntext</td></tr><tr><td>text</td><td>1</td><td>Operand type clash: int is incompatible with text</td></tr><tr><td>time</td><td>1</td><td>Operand type clash: int is incompatible with time</td></tr><tr><td>timestamp</td><td>1</td><td>Cannot alter column 'CREATE_TIME' to be data type timestamp.</td></tr><tr><td>uniqueidentifier</td><td>1</td><td>Operand type clash: int is incompatible with uniqueidentifier</td></tr><tr><td>xml</td><td>1</td><td>Operand type clash: int is incompatible with xml</td></tr></table>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-90537285018282467622018-02-21T11:23:00.001-06:002018-02-22T11:52:40.289-06:00Pop quiz - altering column types<h3>Pop quiz</h3><p>Given the following DDL <pre><br />CREATE TABLE dbo.IntToTime<br />(<br /> CREATE_TIME int<br />);<br /></pre> <p>What will be the result of issuing the following command? <pre>ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME time NULL;</pre> <p>Clearly, if I'm asking, it's not what you might expect. How can an empty table not allow you to change data types? Well it seems Time and datetime2 are special cases as they'll raise errors of the form <p><cite>Msg 206, Level 16, State 2, Line 47 Operand type clash: int is incompatible with time</cite> <p> <p>If you're in this situation and need to get the type converted, you'll need to make two hops, one to varchar and then to time. <pre><br />ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME varchar(10) NULL;<br />ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME time NULL;<br /></pre>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-6299162453245703022018-01-25T08:00:00.000-06:002018-01-25T16:48:38.602-06:00What are all the functions and their parameters?<h1>What are all the functions and their parameters?</h1><p>File this one under: I wrote it once, may I never need it again <p>In my ever expanding quest for getting all the metadata, I how could I determine the metadata for all my table valued functions? No problem, that's what sys.dm_exec_describe_first_result_set is for. <code>SELECT * FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM dbo.foo(@xmlMessage)', N'@xmlMessage nvarchar(max)', 1) AS DEDFRS</code> <p>Except, I need to know parameters. And I need to know parameter types. And order. Fortunately, sys.parameters and sys.types makes this easy. The only ugliness comes from the double invocation of row rollups <pre><br /><br />SELECT <br /> CONCAT<br /> (<br /> ''<br /> , 'SELECT * FROM '<br /> , QUOTENAME(S.name)<br /> , '.'<br /> , QUOTENAME(O.name)<br /> , '('<br /> -- Parameters here without type<br /> , STUFF<br /> (<br /> (<br /> SELECT <br /> CONCAT<br /> (<br /> ''<br /> , ','<br /> , P.name<br /> , ' '<br /> )<br /> FROM<br /> sys.parameters AS P<br /> WHERE<br /> P.is_output = CAST(0 AS bit)<br /> AND P.object_id = O.object_id<br /> ORDER BY<br /> P.parameter_id<br /> FOR XML PATH('')<br /> )<br /> , 1<br /> , 1<br /> , ''<br /> )<br /><br /> , ') AS F;'<br /> ) AS SourceQuery<br />, (<br /> STUFF<br /> (<br /> (<br /> SELECT <br /> CONCAT<br /> (<br /> ''<br /> , ','<br /> , P.name<br /> , ' '<br /> , CASE <br /> WHEN T2.name LIKE '%char' THEN CONCAT(T2.name, '(', CASE P.max_length WHEN -1 THEN 'max' ELSE CAST(P.max_length AS varchar(4)) END, ')')<br /> WHEN T2.name = 'time' OR T2.name ='datetime2' THEN CONCAT(T2.name, '(', P.scale, ')')<br /> WHEN T2.name = 'numeric' THEN CONCAT(T2.name, '(', P.precision, ',', P.scale, ')')<br /> ELSE T2.name<br /> END<br /> )<br /> FROM<br /> sys.parameters AS P<br /> INNER JOIN<br /> sys.types AS T2<br /> ON T2.user_type_id = P.user_type_id<br /> WHERE<br /> P.is_output = CAST(0 AS bit)<br /> AND P.object_id = O.object_id<br /> ORDER BY<br /> P.parameter_id<br /> FOR XML PATH('')<br /> )<br /> , 1<br /> , 1<br /> , ''<br /> )<br /> ) AS ParamterList<br />FROM<br /> sys.schemas AS S<br /> INNER JOIN<br /> sys.objects AS O<br /> ON O.schema_id = S.schema_id<br />WHERE<br /> O.type IN ('FT','IF', 'TF');<br /></pre> <p>How you use this is up to you. I plan on hooking it into the <a href="http://billfellows.blogspot.com/2017/09/biml-query-table-builder.html">Biml Query Table Builder</a> to simulate tables for all my TVFs. <p> Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-74813255024549253982018-01-22T08:00:00.000-06:002018-01-22T21:45:21.218-06:00Staging Metadata Framework for the Unknown<h1>Staging metadata framework for the unknown</h1><p>That's a terrible title but it's the best I got. A client would like to report out of ServiceNow some metrics not readily available in the PowerBI App. The first time I connected, I got a quick look at the Incidents and some of the data we'd be interested in but I have no idea how that data changes over time. When you first open a ticket, maybe it doesn't have a resolved date or a caused by field populated. And since this is all web service stuff and you can customize it, I knew I was looking at lots of iterations to try and keep up with all the data coming back from the service. How can I handle this and keep sane? Those were my two goals. I thought it'd be fun to share how I solved the problem using features in SQL Server 2016. <p>To begin, I created a database called RTMA to perform my real time metrics analysis. <code>CREATE DATABASE RTMA;</code> With that done, I created a schema within my database like <code>USE RTMA; GO CREATE SCHEMA ServiceNow AUTHORIZATION dbo;</code> To begin, we need a table to hold our discovery metadata. <pre><br />CREATE TABLE <br /> ServiceNow.ColumnSizing<br />(<br /> EntityName varchar(30) NOT NULL<br />, CollectionName varchar(30) NOT NULL<br />, ColumnName varchar(30) NOT NULL<br />, ColumnLength int NOT NULL<br />, InsertDate datetime NOT NULL<br /> CONSTRAINT DF_ServiceNow_ColumnSizing_InsertDate DEFAULT (GETDATE())<br />);<br /><br />CREATE CLUSTERED COLUMNSTORE INDEX<br /> CCI_ServiceNow_ColumnSizing<br /> ON ServiceNow.ColumnSizing;<br /></pre> The idea for this metadata table is that we'll just keep adding more information in for the entities we survey. All that matters is the largest length for a given combination of Entity, Collection, and Column. <p>In the following demo, we'll add 2 rows into our table. The first batch will be our initial sizing and then "something" happens and we discover the size has increased. <pre><br />INSERT INTO<br /> ServiceNow.ColumnSizing<br />(<br /> EntityName<br />, CollectionName<br />, ColumnName<br />, ColumnLength<br />, InsertDate<br />)<br />VALUES<br /> ('DoesNotExist', 'records', 'ABC', 10, current_timestamp)<br />, ('DoesNotExist', 'records', 'BCD', 30, current_timestamp);<br /></pre> <p>Create a base table for our DoesNotExist. What columns will be available? I know I'll want my InsertDate and that's the only thing I'll guarantee to begin. And that's ok because we're going to get clever. <pre><br />DECLARE @entity nvarchar(30) = N'DoesNotExist'<br />, @Template nvarchar(max) = N'DROP TABLE IF EXISTS ServiceNow.Stage<Entity/>;<br /> CREATE TABLE<br /> ServiceNow.Stage<Entity/><br /> (<br /> <Columns/><br /> InsertDate datetime CONSTRAINT DF_ServiceNow_Stage<Entity/>_InsertDate DEFAULT (GETDATE())<br /> );<br /> CREATE CLUSTERED COLUMNSTORE INDEX<br /> CCI_ServiceNow_Stage<Entity/><br /> ON<br /> ServiceNow.Stage<Entity/>;'<br />, @Columns nvarchar(max) = N'';<br /><br />DECLARE @Query nvarchar(max) = REPLACE(REPLACE(@Template, '<Entity/>', @Entity), '<Columns/>', @Columns);<br />EXECUTE sys.sp_executesql @Query, N'';<br /></pre> <p>We now have a table with one column so let's look at using our synthetic metadata (ColumnSizing) to augment it. The important thing to understand in the next block of code is that we'll use FOR XML PATH('') to concatenate rows together and the CONCAT function to concatenate values together. <p>See more <a href="https://stackoverflow.com/a/31212160/181965" target="_blank">here for the XML PATH "trick"</a> <p>If we're going to define columns for a table, it follows that we need to know what table needs what columns and what size those columns should be. So, let the following block be that definition. <pre><br />DECLARE @Entity varchar(30) = 'DoesNotExist';<br /><br />SELECT<br /> CS.EntityName<br />, CS.CollectionName<br />, CS.ColumnName<br />, MAX(CS.ColumnLength) AS ColumnLength<br />FROM<br /> ServiceNow.ColumnSizing AS CS<br />WHERE<br /> CS.ColumnLength > 0<br /> AND CS.ColumnLength = <br /> (<br /> SELECT<br /> MAX(CSI.ColumnLength) AS ColumnLength<br /> FROM<br /> ServiceNow.ColumnSizing AS CSI<br /> WHERE<br /> CSI.EntityName = CS.EntityName<br /> AND CSI.ColumnName = CS.ColumnName<br /> )<br /> AND CS.EntityName = @Entity<br />GROUP BY<br /> CS.EntityName<br />, CS.CollectionName<br />, CS.ColumnName;<br /></pre> <p>We run the above query and that looks like what we want so into the FOR XML machine it goes. <pre>DECLARE @Entity varchar(30) = 'DoesNotExist'<br />, @ColumnSizeDeclaration varchar(max);<br /><br />;WITH BASE_DATA AS<br />(<br /> -- Define the base data we'll use to drive creation<br /> SELECT<br /> CS.EntityName<br /> , CS.CollectionName<br /> , CS.ColumnName<br /> , MAX(CS.ColumnLength) AS ColumnLength<br /> FROM<br /> ServiceNow.ColumnSizing AS CS<br /> WHERE<br /> CS.ColumnLength > 0<br /> AND CS.ColumnLength = <br /> (<br /> SELECT<br /> MAX(CSI.ColumnLength) AS ColumnLength<br /> FROM<br /> ServiceNow.ColumnSizing AS CSI<br /> WHERE<br /> CSI.EntityName = CS.EntityName<br /> AND CSI.ColumnName = CS.ColumnName<br /> )<br /> AND CS.EntityName = @Entity<br /> GROUP BY<br /> CS.EntityName<br /> , CS.CollectionName<br /> , CS.ColumnName<br />)<br />SELECT DISTINCT<br /> BD.EntityName<br />, (<br /> SELECT<br /> CONCAT<br /> (<br /> ''<br /> , BDI.ColumnName<br /> , ' varchar('<br /> , BDI.ColumnLength<br /> , '),'<br /> ) <br /> FROM<br /> BASE_DATA AS BDI<br /> WHERE<br /> BDI.EntityName = BD.EntityName<br /> AND BDI.CollectionName = BD.CollectionName<br /> FOR XML PATH('')<br />) AS ColumnSizeDeclaration<br />FROM<br /> BASE_DATA AS BD;<br /></pre> <p>That looks like a lot, but it's not. Run it and you'll see we get one row with two elements: "DoesNotExist" and "ABC varchar(10),BCD varchar(30)," That trailing comma is going to be a problem, that's generally why you see people either a leading delimiter and use STUFF to remove it or in the case of a trailing delimiter LEFT with LEN -1 does the trick. <p>But we're clever and don't need such tricks. If you look at the declaration for @Template, we assume there will *always* be at final column of InsertDate which didn't have a comma preceding it. Always define the rules to favor yourself. ;) <p>Instead of the static table declaration we used, let's marry our common table expression, CTE, with the table template. <pre><br />DECLARE @entity nvarchar(30) = N'DoesNotExist'<br />, @Template nvarchar(max) = N'DROP TABLE IF EXISTS ServiceNow.Stage<Entity/>;<br /> CREATE TABLE<br /> ServiceNow.Stage<Entity/><br /> (<br /> <Columns/><br /> InsertDate datetime CONSTRAINT DF_ServiceNow_Stage<Entity/>_InsertDate DEFAULT (GETDATE())<br /> );<br /> CREATE CLUSTERED COLUMNSTORE INDEX<br /> CCI_ServiceNow_Stage<Entity/><br /> ON<br /> ServiceNow.Stage<Entity/>;'<br />, @Columns nvarchar(max) = N'';<br /><br />-- CTE logic patched in here<br /><br />;WITH BASE_DATA AS<br />(<br /> -- Define the base data we'll use to drive creation<br /> SELECT<br /> CS.EntityName<br /> , CS.CollectionName<br /> , CS.ColumnName<br /> , MAX(CS.ColumnLength) AS ColumnLength<br /> FROM<br /> ServiceNow.ColumnSizing AS CS<br /> WHERE<br /> CS.ColumnLength > 0<br /> AND CS.ColumnLength = <br /> (<br /> SELECT<br /> MAX(CSI.ColumnLength) AS ColumnLength<br /> FROM<br /> ServiceNow.ColumnSizing AS CSI<br /> WHERE<br /> CSI.EntityName = CS.EntityName<br /> AND CSI.ColumnName = CS.ColumnName<br /> )<br /> AND CS.EntityName = @Entity<br /> GROUP BY<br /> CS.EntityName<br /> , CS.CollectionName<br /> , CS.ColumnName<br />)<br />SELECT DISTINCT<br /> @Columns = (<br /> SELECT<br /> CONCAT<br /> (<br /> ''<br /> , BDI.ColumnName<br /> , ' varchar('<br /> , BDI.ColumnLength<br /> , '),'<br /> ) <br /> FROM<br /> BASE_DATA AS BDI<br /> WHERE<br /> BDI.EntityName = BD.EntityName<br /> AND BDI.CollectionName = BD.CollectionName<br /> FOR XML PATH('')<br />) <br />FROM<br /> BASE_DATA AS BD;<br /><br />DECLARE @Query nvarchar(max) = REPLACE(REPLACE(@Template, '<Entity/>', @Entity), '<Columns/>', @Columns);<br />EXECUTE sys.sp_executesql @Query, N'';<br /></pre> <p>Bam, look at it now. We took advantage of the new DROP IF EXISTS (DIE) syntax to drop our table and we've redeclared it, nice as can be. Don't take my word for it though, ask the system tables what they see. <pre><br />SELECT<br /> S.name AS SchemaName<br />, T.name AS TableName<br />, C.name AS ColumnName<br />, T2.name AS DataTypeName<br />, C.max_length<br />FROM<br /> sys.schemas AS S<br /> INNER JOIN<br /> sys.tables AS T<br /> ON T.schema_id = S.schema_id<br /> INNER JOIN<br /> sys.columns AS C<br /> ON C.object_id = T.object_id<br /> INNER JOIN<br /> sys.types AS T2<br /> ON T2.user_type_id = C.user_type_id<br />WHERE<br /> S.name = 'ServiceNow'<br /> AND T.name = 'StageDoesNotExist'<br />ORDER BY<br /> S.name<br />, T.name<br />, C.column_id;<br /></pre> Excellent, we now turn on the actual data storage process and voila, we get a value stored into our table. Simulate it with the following. <pre><br />INSERT INTO ServiceNow.StageDoesNotExist<br />(ABC, BCD) VALUES ('Important', 'Very, very important');<br /></pre> Truly, all is well and good. <p>*<em>time passes</em>* <p>Then, <strong>this</strong> happens <pre><br />WAITFOR DELAY ('00:00:03');<br /><br />INSERT INTO<br /> ServiceNow.ColumnSizing<br />(<br /> EntityName<br />, CollectionName<br />, ColumnName<br />, ColumnLength<br />, InsertDate<br />)<br />VALUES<br /> ('DoesNotExist', 'records', 'BCD', 34, current_timestamp);<br /></pre>Followed by <pre><br />INSERT INTO ServiceNow.StageDoesNotExist<br />(ABC, BCD) VALUES ('Important','Very important, yet ephemeral data');<br /></pre>To quote Dr. Beckett: Oh boy <p> Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-53214722952866002992017-12-29T13:00:00.001-06:002017-12-29T13:00:29.493-06:00Python Azure Function requestor's IP address<h1>Python Azure Function requestor's IP address</h1> <p>I'm working on an anonymous level Azure Function in python and couldn't find where they stored the IP address of the caller, if applicable. It's in the request <em>headers</em>, which makes sense but not until I spent far too much time looking in all the wrong places. A minimal reproduction would look something like <pre><br />import os<br />iptag = "REQ_HEADERS_X-FORWARDED-FOR"<br />ip = "Tag name:{} Tag value:{}".format(iptag, os.environ[iptag])<br />print(ip)<br /></pre> <p>Now, something to note is that it will return not only the IP address but the port the call came in through. Thus, I see a value of <cite></cite> instead of just the ipv4 value. <p>Knowing where to look, I can see that the heavy lifting had already been done by the most excellent <a href="https://github.com/anthonyeden/Azure-Functions-Python-HTTP-Example" target="_blank">HTTPHelper</a> but as a wise man once said: knowing is half the battle. <pre><br />import os<br />from AzureHTTPHelper import HTTPHelper<br />http = HTTPHelper()<br />#Notice the lower casing of properties here and the trimming of the type (REQ_HEADERS)<br />iptag = "x-forwarded-for"<br />ip = "Tag name:{} Tag value:{}".format(iptag, http.headers[iptag])<br />print(ip)<br /></pre> <p>Yo Joe!</p><br/>Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0tag:blogger.com,1999:blog-4583365003930185416.post-57285271851872568142017-11-09T16:06:00.000-06:002017-11-09T16:06:15.205-06:00What's my transaction isolation level<h1>What's my transaction isolation level</h1><p>That's an easy question to answer - <a href="https://stackoverflow.com/questions/1038113/how-to-find-current-transaction-level" target="_blank">StackOverflow</a> has a fine answer. <p>But, what if I use <code>sp_executesql</code> to run some dynamic sql - does it default the connection isolation level? If I change isolation level within the query, does it propagate back to the invoker? That's a great question, William. Let's find out. <pre><br />SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;<br /><br />SELECT CASE transaction_isolation_level <br />WHEN 0 THEN 'Unspecified' <br />WHEN 1 THEN 'ReadUncommitted' <br />WHEN 2 THEN 'ReadCommitted' <br />WHEN 3 THEN 'Repeatable' <br />WHEN 4 THEN 'Serializable' <br />WHEN 5 THEN 'Snapshot' END AS TRANSACTION_ISOLATION_LEVEL <br />FROM sys.dm_exec_sessions <br />where session_id = @@SPID;<br /><br />DECLARE<br /> @query nvarchar(max) = N'-- Identify iso level<br />SELECT CASE transaction_isolation_level <br />WHEN 0 THEN ''Unspecified'' <br />WHEN 1 THEN ''ReadUncommitted'' <br />WHEN 2 THEN ''ReadCommitted'' <br />WHEN 3 THEN ''Repeatable'' <br />WHEN 4 THEN ''Serializable'' <br />WHEN 5 THEN ''Snapshot'' END AS TRANSACTION_ISOLATION_LEVEL <br />FROM sys.dm_exec_sessions <br />where session_id = @@SPID;<br /><br />SET TRANSACTION ISOLATION LEVEL READ COMMITTED;<br /><br />-- Test iso level<br />SELECT CASE transaction_isolation_level <br />WHEN 0 THEN ''Unspecified'' <br />WHEN 1 THEN ''ReadUncommitted'' <br />WHEN 2 THEN ''ReadCommitted'' <br />WHEN 3 THEN ''Repeatable'' <br />WHEN 4 THEN ''Serializable'' <br />WHEN 5 THEN ''Snapshot'' END AS TRANSACTION_ISOLATION_LEVEL <br />FROM sys.dm_exec_sessions <br />where session_id = @@SPID'<br /><br />EXECUTE sys.sp_executesql @query, N'';<br /><br />SELECT CASE transaction_isolation_level <br />WHEN 0 THEN 'Unspecified' <br />WHEN 1 THEN 'ReadUncommitted' <br />WHEN 2 THEN 'ReadCommitted' <br />WHEN 3 THEN 'Repeatable' <br />WHEN 4 THEN 'Serializable' <br />WHEN 5 THEN 'Snapshot' END AS TRANSACTION_ISOLATION_LEVEL <br />FROM sys.dm_exec_sessions <br />where session_id = @@SPID;<br /></pre> <p>I begin my session in read uncommitted aka "nolock". I then run dynamic sql which identifies my isolation level, still read uncommitted, change it to a different level, confirmed at read committed, and then exit and check my final state - back to read uncommitted. <p>Finally, thanks to Andrew Kelly (<a href="http://sqlblog.com/blogs/andrew_kelly/default.aspx">b</a>|<a href="https://twitter.com/GunneyK">t</a>) for answering the <a href="https://twitter.com/GunneyK/status/928741785932791810">#sqlhelp</a> call.Bill Fellowshttp://www.blogger.com/profile/09363163555016521189noreply@blogger.com0