World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Tuesday, November 22, 2022

GENERATE_SERIES is a persnickety little function

GENERATE_SERIES is a persnickety little function

New in SQL Server 2022 is the GENERATE_TIMESERIES table valued function. It's a handy function to generate a series of numbers. Once you understand why a number table is handy, you'll find lots of uses for it but this it not the blog post to cover it.

I wanted to generate a time series using the new hotness and step 1 is to get a list of the timeoffsets I'll add to my starting value. The following query will generate 16565 rows, which I'd then use as the second argument to a DATEADD call.

SELECT *, DATEADD(SECOND, Works.Value, DATETIME2FROMPARTS(2022,11,22,0,0,0,0,1)) FROM GENERATE_SERIES(0, 19565, 1) AS Works;

Perfect, now I have a row for each second from midnight to approximately 5.5 hours later. What if my duration need to vary because I'm going to compute these ranges for a number of different scenarios? I should make that 19565 into a variable and let's overengineer this by making it a bigint.

-- Assume the duration might be "big"
-- TODO: @duration = DATEDIFF(SECOND, Start, Stop)
DECLARE @duration bigint = 19565;
SELECT *, DATEADD(SECOND, LolWorks.Value, DATETIME2FROMPARTS(2022,11,22,0,0,0,0,1)) FROM GENERATE_SERIES(0, @duration, 1) AS LolWorks;

In the immortal words of Press Your Luck, WHAMMIE WHAMMIE WHAMMIE!

Msg 5373, Level 16, State 1, Line 4
All the input parameters should be of the same type. Supported types are tinyint, smallint, int, bigint, decimal and numeric.
Msg 206, Level 16, State 2, Line 4
Operand type clash: int is incompatible with void type
Msg 206, Level 16, State 2, Line 4
Operand type clash: bigint is incompatible with void type
Msg 206, Level 16, State 2, Line 4
Operand type clash: int is incompatible with void type

The fix is easy - just as the error message says, all THREE operands to GENERATE_SERIES must be the same data type. In this case, it required upsizing to bigint. In the technical writers defense (defence), they call it out in the documentation but I grew up only reading documentation after I failed to natively grasp how to make something work. The data type for stop must match the data type for start. SELECT * FROM GENERATE_SERIES(CAST(1 AS bigint), @duration, CAST(1 AS bigint)) AS Fails;

Key take away

Unlike any other situation I can think of off the top of my head, GENERATE_SERIES won't automatically promote/upsize/expand data types.

Thursday, January 27, 2022

ADF and MySql.Data.MySqlClient.MySqlException,Message=Got a packet bigger than 'max_allowed_packet'

Azure Data Factory, ADF, and exception MySql.Data.MySqlClient.MySqlException,Message=Got a packet bigger than 'max_allowed_packet'

My StackOverflow developer profile specifies "I'd prefer to not work with" and honestly, the only thing I don't want to deal with is MySQL. I don't like Visual Basic or Access or plenty of other things but good grief, I find working MySQL to be an absolute cesspit after every other RDBMS I've worked with. Which brings me to an overdue client project, consolidating various MySQL instances to a single reporting server. They have a standard schema on all the boxes (no really, that was my biggest fear but they're good at ensuring the nearly 200 sites have the exact same point release of code) and I needed to bring it to a single server so it can be fed into reports.

It seemed like a great fit for Azure Data Factory but I kept getting an error dealing with some packet size issue. What do I know about packet sizes? Error details Error code 2200 Failure type User configuration issue Details 'Type=MySql.Data.MySqlClient.MySqlException,Message=Got a packet bigger than 'max_allowed_packet' bytes,Source=MySqlConnector,''Type=MySql.Data.MySqlClient.MySqlException,Message=Got a packet bigger than 'max_allowed_packet' bytes,Source=MySqlConnector,'

What's the internet got to say about all this? I checked the setting on a server that worked and one that didn't SHOW VARIABLES LIKE 'max_allowed_packet'; but they both listed 4194304 (bytes).

Beyond changing configuration settings, and no guarantee that solves the issue, the idea of inconsistent table definition sounded promising. But no dice. I tried making all the fields nullable, but to no avail. I ran the mysqldump utility from the commandline to see if I could reproduce the packet issue. Nothing.

After a lot of frustration, I looked hard at the custom integration logs. Before I move data, I copy over the source information_schema.tables for the database and store the TABLE_ROWS for each table. That number is a approximately the number of rows in the table. In the copy activity itself, I log the actual rows transferred and that's when I noticed something. The largest set of data from a single source was 6k rows. ALL OF THE HOSTS THAT GENERATED THE MAX PACKET EXCEPTION HAD MORE ROWS THAN 6K.

Well, what if the issue is one of volume? That's easy enough to test. I put a LIMIT 1000; on the query and pointed ADF at the server that never transferred data for that table. It worked. Sonofa. Ok, LIMIT 5000; Worked. Removed the limit - Failed, got a packet bigger than 'max_allowed_packet'

The error is not being generated from the source as I assumed. Normally, ADF says whether it's the source or the sink that caused the error. The exception makes sense if the error is on the sink. "You're sending too much data in one shot" would be a more useful error.

How do we fix it

The default Write Batch Size for a Copy Activity is 10,000. I dropped the size to 5000 and ran through all the troublesome hosts. Of the 51 hosts that would never transfer the suspect table,

Thursday, January 6, 2022

SSIS Azure Feature Pack and the Flexible File components

SSIS Azure Feature Pack and the Flexible File components

The Azure Feature Pack for SSIS is something I had not worked with before today. I have a client that wants to use the Flexible File Task/Flexible File Source/Flexible File Destination but they were having issues. The Flexible File tools allow you to work with Azure Blob storage. We were dealing with ADLS Gen2 but the feature pack can work with classic blob storage as well. In my hubris, I said no problem, I know SSIS. Dear reader, I did not know as much as I thought I did...

Our scenario was simple. We had a root folder datalake and subfolders of Raw. And into that we were needed to land and then consume files. Easy peasy. The Flexible File Destination allows us to write. The Flexible File Source allows us to read and we can configure a Foreach File enumerator to use the "Foreach Data Lake Storage Gen2 File Enumerator" to interact with the file system. Everything is the same as Windows except we use forward slashes instead of backslashes for path separators.

I started with the Flexible File Source after I manually created a CSV and uploaded it to data lake. One of the things I wasn't sure about was path syntax - do I need to specify the leading slash, do I need to specify the trailing slash, etc. I think I determined it didn't matter, it'd figure out whether folder path needs a trailing slash if it wasn't specified.

As I was testing things, changing that value and the value in the Flat File Destination each time was going to be annoying so I wanted to see what my options were for using Expresssions. Expressions are the secret sauce to making your SSIS packages graceful. The development teams took the same approach they have with the ADO.NET components in that there are no expressions on the components themselves. Instead, if you want to make the Flexible File Source/Flexible File Destination dynamic at all, you're going to have to look at the Data Flow Tasks Expressions and then configure there.

Here I used an SSIS package variable @[User::ADLSPath] so I could easily switch out /datalake, datalake, datalake/raw, datalake/Raw/, /datalake/raw/, and evaluate case sensitivity, path manipulation, etc.


Transfer data error : System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.Azure.Storage.Common, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified. File name: 'Microsoft.Azure.Storage.Common, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35'

I tried it again, same issue. I flipped from 64 to 32 bit (I had installed both sets of the Feature Pack). I reread the install requirements document. I looked through github bugs. I contemplated asking a question on StackOverflow. I ended up trying to reproduce the error on my desktop instead of the client's VM. Same bloody issue! Finally, I said to heck with it, maybe it more strongly worded, and I'll get this assembly and install to the GAC. Maybe something went wonky with the installs on both machines.

How do I install something into the global assembly cache?

StackOverflow answer On your file system, you might have a utility called "gacutil.exe" From a administrative command prompt, I'd type "CD \" and then "dir /s /b gacutil.exe" That should provide a list of all the instances of gacutil on your machine. Pick one, I don't care which. If there's a space in the path name, then you'll need to wrap it with double quotes. "C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.8 Tools\x64\gacutil.exe" /? That would bring up the help text for using the gacutil that lives at that path. If the double quotes were not there, you'd get an error message stating
'C:\Program' is not recognized as an internal or external command, operable program or batch file.

How do I get Microsoft.Azure.Storage.Common.dll?

I hope you comment below and tell me an easier way because I am not a nuget master. Visual Studio has a nuget package manager in it. However, it only works in the contet of a solution or project and the project type must support packages. If you have a single project in your solution and that project is an SSIS project, attempting to use the nuget package manager will result in an error
Operation Failed. No projects supported by NuGet in the solution. Well fiddlesticks, I guess we have to give up.

Or, add a script task to the SSIS package and then click Edit Script. In the new instance of Visual Studio, guess what - it thinks it supports NuGet. It does not, when you close the editor, the packages go away and when the script runs, it does not have the brains to get the assemblies back from NuGet land. So, don't.close.the.editor. yet. In the NuGet Package manager, on the Browse tab, type in Microsoft.Azure.Storage.Common and look at that - Deprecated. Last stable version 11.2.3. But this error indicates the component expects so in the Version, scroll all the way back and find it. Click the check box and click Install button. Yes, you agree to the other packages as well as the MIT license.

At this point, in the volatile/temporary file storage on your computer, you have an on disk representation of your empty script Task with a NuGet package reference. We need to find that location, e.g. C:\Users\bfellows\AppData\Local\Temp\Vsta\SSIS_ST150\VstacIlBNvWB__0KemyB8zd1UMw\ Copy the desired DLLs out into a safe spot (because if I have to do it here, I'll likely have to do it on the server and the other three development VMs) and then use the gacutil to install them.

Right click on Solution VstaProjects and choose Open in Terminal. The assembly we're looking for will be located at .\packages\Azure.Storage.Common.12.9.0\lib\netstandard2.0\Azure.Storage.Common.dll. Assume I copied it to C:\temp

"C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.8 Tools\x64\gacutil.exe" -if C:\temp\Azure.Storage.Common.dll will force install the dll to the GAC. Now when I run SSIS, we'll see whether that has resolved our error.

Will the real error please stand up

It did resolve our error, but not. The error that was reported, missing assembly was Mickey Mouse, mate. Spurious. Not genuine.

Look what error decided to show up now that it could error out "better"
Transfer data error : Microsoft.DataTransfer.Common.Shared.HybridDeliveryException: ADLS Gen2 operation failed for: Operation returned an invalid status code 'BadRequest'. Account: 'datalakedev'. FileSystem: 'datalake'. ErrorCode: 'TlsVersionNotPermitted'. Message: 'The TLS version of the connection is not permitted on this storage account.'

If I go to my storage account, under Settings, Configuration, there I can change Minimum TLS version from 1.2 to 1.1. Oh, except that still isn't kosher - same error. 1.0 it is and lo and behold, I have data transfer. The root cause is not a missing assembly, it is a red herring error message that could only be resolved by adding the assembly to the global assembly cache.


How in the hell would a normal person make the connection between "Could not load file or assembly" and Oh, I need to change the TLS? What's really galling is the fact that when I used the Flexible File Source for my data flow, I specified a file on blob storage and SSIS was able to connect and read that file because it identified the associated metadata. I has two columns and here are the data types (defaulted to string but who cares, that's consistent with flat file source). BUT IT PICKED UP THE METADATA. IT COULD TALK TO AZURE BLOB STORAGE EVEN THOUGH IT ONLY ALLOWED 1.2! And yet, when it came time to run, it could not talk on the same channel. Can you see how I lost a large portion of my day trying to decipher this foolishness?

By the way, knowing that the root cause it a mismatch between the TLS SSIS/my computer is using and the default on the storage account, let's go back to the writeup for the Azure Feature Pack

Use TLS 1.2 The TLS version used by Azure Feature Pack follows system .NET Framework settings. To use TLS 1.2, add a REG_DWORD value named SchUseStrongCrypto with data 1 under the following two registry keys. HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319

So, sometimes an error message isn't the real error message but you have to clear away all the garbage between you and the source of the error to figure out what it really is.

Wednesday, December 22, 2021

Extracting queries from SSIS packages

Extracting queries from SSIS packages

We received an abomination of an SSIS package from a third party. It was a way to write a package that I don't think I would have suggested.

Our job was to rewrite this into something more manageable and it appears Azure Data Factory will be the winner. Before we can do that, we need to document what the existing package is doing (the vendor has supplied the incremental load logic) so we can replicate it but in a more economical form. It appears to have the pattern (squint really hard at the picture) Execute SQL Task -> Execute SQL Task -> Sequence container => many data flows -> Data Flow -> Execute SQL Task. The Data Flow Task is named after the table being loaded. An ODBC source with a expression based query, named "ODBC Source 1" wired to an OLE DB Destination, named "OLE DB Destination". How would you do it, especially given that there are 236 Data Flow Tasks embedded in a single container?

Biml it!

As with so many things SSIS-related, Biml is the answer. Install BimlExpress and reverse engineer that dtsx package into Biml. I'll add a blank BimlScript file that I called Inspector.biml

Let's look at a sample DataFlow task

            <Dataflow Name="ATableName">
                <Expression ExternalProperty="[ODBC Source 1].[SqlCommand]">"SELECT * FROM dbo.ATableName where modifiedutc > '" +(DT_WSTR, 30)@[User::LastModified] + "'  AND modifiedutc <= '" + (DT_WSTR, 30)@[User::MostRecent] + "'"</Expression>
                <OdbcSource Name="ODBC Source 1" Connection="Source2">
                  <DirectInput>SELECT * FROM dbo.ATableName where modifiedutc > '0'  AND modifiedutc <= '0'</DirectInput>
                <DataConversion Name="Data Conversion">
                    <Column SourceColumn="id" TargetColumn="Copy of id" DataType="AnsiString" Length="255" CodePage="1252" />
                <OleDbCommand Name="Delete Old Rows from ATableName" ConnectionName="Destination2">
                    <Parameter SourceColumn="ye_id" TargetColumn="Param_0" DataType="AnsiStringFixedLength" Length="255" CodePage="1252" />
                    <Parameter SourceColumn="modifiedutc" TargetColumn="Param_1" DataType="Int64" />
                  <DirectInput>delete from dbo.ATableName where ye_id = ? and modifiedutc  < ?</DirectInput>
                <ConditionalSplit Name="Conditional Split">
                    <OutputPath Name="Case 1">
                <OleDbDestination Name="OLE DB Destination" ConnectionName="Destination2">
                  <ExternalTableOutput Table=""dbob"."ATableName"" />

All I want to do is find all the Data Flow Tasks in the sequence containers. I need to generate a key value pair of TableName and the source query. I could dive into the Transformations layer and find the ODBC source and extract the DirectInput node from the OdbcSource and then parse the table name from the OleDbDestination's ExternalTableOutput but look, I can "cheat" here. Everything I need is at the outer Data Flow Task level. The Name of the DataFlow is my table name and since it's ODBC and the source component doesn't support a direct Expression on it, it's defined at the Data Flow Level. That makes this Biml easy.


// A dictionary of TableName and the Expression
Dictionary<string, string> incremental = new Dictionary<string, string>();

foreach (AstPackageNode p in this.RootNode.Packages)
    // Loop through the Sequence Container
    foreach (var c in p.Tasks.OfType<AstContainerTaskNode>()/**/)
        foreach (AstDataflowTaskNode t in c.Tasks.OfType<AstDataflowTaskNode>())
            if (p.Name == "Postgres_to_MSSQL_Incremental")
                incremental[t.Name] = t.Expressions[0].Expression;

foreach (KeyValuePair<string, string> k in incremental)
    // WriteLine("<!-- {0}: {1} -->", k.Key, k.Value);
    WriteLine("{0}|{1}", k.Key, k.Value.Replace("\n", " "));

<Biml xmlns="">

I define a dictionary that will hold the Table and the associated Expression. The first foreach loop specifies that I want to enumerate through all the packages that have been reverse engineered. If you're using BimlExpress, you'll need to shift click on all the source Biml files as well as the Inspector package.

The next foreach enumerator looks at all the elements in the Control Flow task and we're going to filter it to just things that are of type AstContainerTaskNode, aka a Container. That "OfType" filter syntax is very handy to focus on only the type of item you're looking for. Linq is so powerful, I love it.

The innermost foreach enumerator uses the OfType again to filter tasks to only those that are DataFlows, AstDataFlowTaskNode. The if statement ensures I'm only working on the Incremental package (they supplied an initial load as well). Finally, I add the Task's Name to the key of the dictionary and the value becomes the first Expression. Again, I can cheat here because the vendor package was very consistent, which is amazing for an SSIS package that's been hand crafted. That source package was 40.5 MB and had been saved 422 times according to the VersionBuild number. Kudos to them for quality control.

Once the looping is complete, all that is left is to emit the information so I can use it elsewhere. Thus the final foreach loop. I'm working in BimlStudio so comments are emitted and I'm going to take advantage of that by simply writing the key/value pair with a Pipe delimiter and then copy/paste the output into a CSV. If you're working in BimlExpress, I'd just write directly to a file with a System.IO.Text.WriteAllLines (name approximate) but this was just a "quick and dirty get it done" task and corresponding blog post to show that Biml and metadata programming are still relevant.

Eagle eyed viewers will note that I am missing the single DataFlow task after the Container. My partner also notice it and so if you need to also look for any data flow tasks at the Package level, I added this loop after the Seqence Container loop.

    foreach (AstDataflowTaskNode t in p.Tasks.OfType<AstDataflowTaskNode>())
        if (p.Name == "Postgres_to_MSSQL_Incremental")
            incremental[t.Name] = t.Expressions[0].Expression;


I'm a slow/infrequent blogger and this post took me 50 minutes. I think after we were done scratching our heads at the source packages, it took less time to write the script and generate the list of tables and associated queries than this blog post took.

Wednesday, November 17, 2021

ETL pattern for API source

ETL pattern for API source

The direction for software as a service providers is to provide APIs to access their data instead of structured file exports. Which is a pity, as every SaaS system requires a bespoke data extract solution. I inheireted a solution that had an adverse pattern I'd like to talk about.

The solution pulls data from advertising and social media sites (Google Analytics, Twitter, Facebook, etc) and does processing to make it ready for reporting. The approach here works, but there are some challenges that you can run into.
  • Metering - Providers generally restrict you to only consuming so much over time (where so much and time are highly dependent on the source). Google Analytics, depending on product, rejects your connections after so many calls. Twitter, also depending on their maddening, inconsistent set of APIs (v1 vs v2), endpoints, product (free standard, paid for premium or enterprise) will throttle you based on consumption
  • Data availability - you have no idea whether the data you pulled today will be available tomorrow. We had pulled 5 years of data out of Google Analytics that contained a variety of dimensions, two of which were ga:userAgeBracket and ga:userGender. In talking to our client, they wanted just one more data elemented added to the mix. We made the change and boom goes the dynamite: Some data in this report may have been removed when a threshold was applied. That error message means that you're requesting a level of granularity that could de-anonymize users. Ok, fine, we rolled back the change but No, that's no longer a valid combination, ever! And we ran into a situation were some of the data just wasn't availble pre-2020. Yes, a month earlier the same code had pulled 6 years worth of data but no more.
  • Oops - Something happened when we created the data for reporting (data merge introduced duplicates, client wanted a differen format, etc) and now we need to do it again, except instead of the month allocated, we have a week to fix all this up. Which bumps into the Metering and Data Availability points. Ouch town, population you.

Preferred pattern

What I inheireted wasn't bad, it just hadn't taken those possible pain points into consideration. In a classic data warehouse, you have a raw zone with immutable source sitting somewhere on cheap storage. The same lesson applies here. When you pull from an API, land that data to disk in some self defining format, json/xml/csv don't care.

Write your process so that it is able to consume that source file and get the exact same results as the source data pull.

def get_data(source_date):
  """Get a data for a given date.
  :param source_date: An ISO 8601 formatted date aka yyy-MM-dd
  :return: A dictionary of data
  source_file = '/dbfs/mnt/datalake/raw/entity/data_provider_{0}.json'.format(source_date)
  raw_data = {}
  if os.path.exists(source_file):
    with open(source_file, 'r', encoding='utf-8') as f:
      raw_data = json.load(f)
      raw_data = analytics.reports().batchGet(body='json-goes-here').execute()
      with open(google_file, 'w', encoding='utf-8') as f:
        json.dump(raw_data, f, ensure_ascii=False)
  return raw_data

This simple method is responsible for getting my data by date. If the file exists in my file system, then I will reuse the results of a previous run to satisfy the data request. Otherwise, we make a call to the API and before we finish, we write the results to disk so that we can be ready in case Ooops happens downstream of the call.

Using this approach, we were able to reprocess a years worth of cached data in about 10 minutes compared to about 4.5 hours of data trickling out of the source API.

Tuesday, February 9, 2021

Including a local python module

Including a local python module

As we saw in reusing your python code, you can create a python file, a module, that contains our core business logic and then re-use that file. This post is going to talk about to make that happen.

What happens when you import antigravity? The python interpreter is going to check all the places it knows to find modules. It's going to check the install location for a module, it's going to check if you defined pythonhome/pythonpath environment variables, and you can hint to your hearts desire where to find files. If I import a real module, import pprint, I can access the __file__ property which will tell you where it found the module. In my case it was C:\Python38\lib\

Python is happy to tell you what the current search path is import sys print(sys.path)

On my machine, that displays an array of ['', 'C:\\Python38\\', 'C:\\Python38\\DLLs', 'C:\\Python38\\lib', 'C:\\Python38', 'C:\\Python38\\lib\\site-packages'] If I want to be callable by another module, then it needs to exist in one of those locations. That first entry of a blank path is the current directory so as long as the module I need is in the same folder, we're golden! I have added to the same folder as the above

# This module lives in the same folder as our file
import reusable_code

def main():
    c = reusable_code.Configuration()

if __name__ == '__main__':

Executing that, I get the expected timestamp - the exact same experience as running our corporate file but now I can focus on using the business logic instead of writing it. We're going to need something more though if we're going to get this reuable code into our pyspark cluster.

In the next post, we'll learn how to package our business module up into something we can install instead of just assuming where the file is.

As always, the code is on my github repro

Monday, February 8, 2021

Reusing your python code

Reusing your ptyhon code

I learned python in 2003 and used it for all the ETL work I was doing. It was beautiful and I would happilly wax to any programmer friends about the language and how they should be learning it. It turns out, my advocacy was just 15+ years too early. I recently had a client reach out to engage me to work on their Databricks project. No gentle reader, I don't much of anything about Databricks. But I do know about working with data, python programming (which I was already updating my mental model to 3.0) and pandas. Yes, pandas is not what we do in databricks but the concepts are similar.

One of the early observations is that they had dozens of notebooks with copy and paste code across them. Copy and paste code in a metadata driven solution isn't an evil but when you're hand crafting boiler plate code artifacts by hand, you're going to sneak a code mutation in there. So, let's look at how we can avoid this with code re-use.

Let's assume we use an important business process that needs to be consistent across our infrastructure. In this case, it's a modification date which is used as part of our partition strategy. This code nugget is spread across all those notebooks"%Y-%m-%dT%H:%M:%SZ") When processing starts up, we set a timestamp so that all activities accrue under that same timestamp. It's a common pattern across data processing. How could we do this better?

In classic python programming, we would abstract that logic away in a reusable library. In this example, I have created a module (file) named In it, I created a class named Configuration and it exposes a method get_modify_date

# is a python module that simulates our desire to 
#consolidate our corporate business logic into a re-usable entity

from datetime import datetime

class Configuration():
    """A very important class that provides a standardized approach for our company"""
    def __init__(self):
        # The modify date drives very important business functionality
        # so let's be consitent in how it is defined (ISO 8601)
        # 2021-02-07T17:58:20Z
        self.__modify_date__ ="%Y-%m-%dT%H:%M:%SZ")

    def get_modify_date(self):
        """Provide a standard interface for accessing the modify date"""
        return self.__modify_date__

def main():
    c = Configuration()

if __name__ == "__main__":

Usage is simple, I create an instance of my class c, which causes the constructor/initalizer to fire and set the modify date for the life of that object. Calling the get_modify_date method results in an ISO 8601 date to be emitted


At this point, I hope you have an understanding of how we can make a reusable widget. Think about your business processes that you need to encapsulate in to reusable components and tomorrow we'll review using existing python modules in new files. After that, we'll cover converting this module into a wheel. And then we'll walk through installing it to a DataBricks cluster and using it from a notebook. Sound good?

All of this code is available on my github repository -> 2021-02-08_PythonReusableCode