A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.


Monday, February 8, 2021

Reusing your python code


I learned Python in 2003 and used it for all the ETL work I was doing. It was beautiful, and I would happily wax poetic to any programmer friend about the language and why they should be learning it. It turns out my advocacy was just 15+ years too early. A client recently reached out to engage me on their Databricks project. No, gentle reader, I don't know much of anything about Databricks. But I do know about working with data, Python programming (and I was already updating my mental model to Python 3) and pandas. Yes, pandas is not what we use in Databricks, but the concepts are similar.

One of my early observations was that they had dozens of notebooks with code copied and pasted across them. Copy-and-paste code in a metadata-driven solution isn't evil, but when you're crafting boilerplate code artifacts by hand, a code mutation is going to sneak in. So, let's look at how we can avoid this with code reuse.
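To make that risk concrete, here is a small sketch (not taken from the client's notebooks) of how a copied format string can drift. The second format string is a hypothetical mutation: someone pasting the code replaced the "T" separator with a space and dropped the trailing "Z".

```python
from datetime import datetime

# One fixed moment so the comparison is deterministic.
moment = datetime(2021, 2, 7, 17, 58, 20)

# The format string as it appears in most notebooks.
original_format = "%Y-%m-%dT%H:%M:%SZ"
# A hypothetical hand-edited copy living in another notebook.
mutated_format = "%Y-%m-%d %H:%M:%S"

notebook_a = moment.strftime(original_format)
notebook_b = moment.strftime(mutated_format)

print(notebook_a)  # 2021-02-07T17:58:20Z
print(notebook_b)  # 2021-02-07 17:58:20
# Same instant, two different values: two different partition keys.
print(notebook_a == notebook_b)  # False
```

Nothing errors out here, which is exactly the problem: both notebooks run, and the inconsistency only shows up later in the data.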

Let's assume we have an important business process that needs to be consistent across our infrastructure. In this case, it's a modification date that is used as part of our partition strategy. This code nugget is spread across all of those notebooks:

datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")

When processing starts, we set a timestamp so that all activities accrue under that same timestamp. It's a common pattern in data processing. How could we do this better?
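Here is a minimal sketch of the "set it once" pattern outside of any notebook. The functions build_partition_path and build_audit_record, and the /data/... path, are hypothetical stand-ins for real activities; the point is that each one receives the same timestamp value rather than calling datetime.now() on its own. This sketch also assumes UTC via datetime.now(timezone.utc), because the trailing "Z" in the format string denotes UTC while a bare datetime.now() returns local time.

```python
from datetime import datetime, timezone

# Hypothetical activities: each one receives the run timestamp instead of computing its own.
def build_partition_path(base_path, modify_date):
    """Return the folder a batch would be written to for this run."""
    return f"{base_path}/modify_date={modify_date}"

def build_audit_record(table_name, modify_date):
    """Return a dict describing what was processed and under which timestamp."""
    return {"table": table_name, "modify_date": modify_date}

def run_batch(tables):
    # Capture the timestamp exactly once, when processing starts.
    modify_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    paths = [build_partition_path(f"/data/{t}", modify_date) for t in tables]
    audits = [build_audit_record(t, modify_date) for t in tables]
    return modify_date, paths, audits

modify_date, paths, audits = run_batch(["orders", "customers"])
print(paths)
```

However long the batch runs, every path and audit record carries the identical value, because it was computed a single time.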

In classic Python programming, we would abstract that logic away into a reusable library. In this example, I have created a module (file) named reusable_code.py. In it, I created a class named Configuration that exposes a method, get_modify_date.

# reusable_code.py is a Python module that simulates our desire to
# consolidate our corporate business logic into a reusable entity

from datetime import datetime, timezone

class Configuration:
    """A very important class that provides a standardized approach for our company"""
    def __init__(self):
        # The modify date drives very important business functionality
        # so let's be consistent in how it is defined (ISO 8601)
        # 2021-02-07T17:58:20Z
        # Use UTC so the trailing "Z" (the UTC designator) is accurate;
        # datetime.now() without a timezone returns local time.
        self._modify_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    def get_modify_date(self):
        """Provide a standard interface for accessing the modify date"""
        return self._modify_date


def main():
    c = Configuration()
    print(c.get_modify_date())

if __name__ == "__main__":
    main()

Usage is simple: I create an instance of my class, c, which causes the constructor/initializer to fire and set the modify date for the life of that object. Calling the get_modify_date method emits an ISO 8601 date:

2021-02-07T17:58:20Z
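Once this logic is shared, it is worth being able to check it without depending on the wall clock. The sketch below is a hypothetical variant, not the code in the repository: it adds an optional clock parameter (a name made up for illustration) so a caller can pin the time, and then confirms the two behaviors described above, that the value is fixed for the life of the object and that it parses back with the same format string.

```python
from datetime import datetime, timezone

class Configuration:
    """Hypothetical variant of Configuration with an injectable clock for testing."""

    def __init__(self, clock=None):
        # Default to the current UTC time; a test can pass a function returning a fixed datetime.
        clock = clock or (lambda: datetime.now(timezone.utc))
        self._modify_date = clock().strftime("%Y-%m-%dT%H:%M:%SZ")

    def get_modify_date(self):
        """Return the modify date captured when the object was created."""
        return self._modify_date

def fixed_clock():
    # Pinned moment used only for this demonstration.
    return datetime(2021, 2, 7, 17, 58, 20, tzinfo=timezone.utc)

c = Configuration(clock=fixed_clock)
first = c.get_modify_date()
second = c.get_modify_date()
print(first)  # 2021-02-07T17:58:20Z

# The stored string round-trips through strptime using the same format.
parsed = datetime.strptime(first, "%Y-%m-%dT%H:%M:%SZ")
```

Passing the clock in keeps production callers unchanged (they simply omit it) while making the output predictable wherever a fixed value is needed.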

At this point, I hope you understand how we can make a reusable widget. Think about the business processes you need to encapsulate into reusable components. Tomorrow we'll review using existing Python modules in new files. After that, we'll cover converting this module into a wheel, and then we'll walk through installing it to a Databricks cluster and using it from a notebook. Sound good?

All of this code is available in my GitHub repository: 2021-02-08_PythonReusableCode
