A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Tuesday, November 19, 2019

Generating characters in TSQL

Generating characters in TSQL

I had to do a thing* and it involved generating "codes" as numbers were too hard for people. So, if you have need to convert an arbitrary number into characters, this is your lucky day/post.

Background

As I get longer in the tooth programming becomes more accessible, I find that people might not have been exposed to underpinnings of how things used to work. Strings were just a bunch of characters put together and a character was a subset of the Latin alphabet shoved into 128 characters (0 to 127). The characters below 32 were referred to as the non-printable characters or control characters. Things above 32 are what you see on a US keyboard. There was a time, if you bought a programming book, it would have an ASCII table somewhere in the reference. Capital A is character 65, Capital Z is character 90 (65/A + 25 characters later). In TSQL, the CHAR function takes a number and gives you the ASCII character for the value so SELECT CHAR(66) AS B; will generate a capital B.

The mod or modulus function will return the remainder after division. Modding a value is a handy way to constrain a value between 0 and an upper threshold. In this case, if I modded any number by 26 (because there are 26 characters in the English alphabet), I'll get 0 to 25 as my result.

Knowing that the modulus function will give me 0 to 25 and knowing that my target character range starts at 65, I could use the previous expression to print any number's ascii value like SELECT CHAR((2147483625 % 26) + 65) AS StillB;. Break that apart, we do the modulus, %, which gives us the value of 1 which we then add to the starting offset (65).

Rolling all that together, here's a quick little tester to see what we can then do with it.

SELECT
    D.rn
,   ASCII_ORD.ord_value
,   ASCII_ORD.replicate_count
    -- CHAR converts a number to a character
,   CHAR(ASCII_ORD.ord_value) AS ord_value_as_character
    -- REPLICATE repeats a string N times
,   REPLICATE(CHAR(ASCII_ORD.ord_value), ASCII_ORD.replicate_count) AS RepeatedCharacter
    -- CONCAT is a null and type approach for string building (requires 2012+)
,   CONCAT(CHAR(ASCII_ORD.ord_value), ASCII_ORD.replicate_count) AS ConcatenatedCharacter
FROM
(
    -- Generate 0 to N-1 rows
    SELECT TOP (300)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1
    FROM
        sys.all_columns AS AC
)D(rn)
CROSS APPLY
(
    -- There are 26 characters in the English language
    -- 65 is the ASCII ordinal position of a capital A
    SELECT
        D.rn % 26 + 65
    ,   D.rn / 26 + 1
) ASCII_ORD(ord_value, replicate_count)
ORDER BY
    D.rn
;

Ultimately, it was decided that using a combination of character and digits (ConcatenatedCharacter) might be more user friendly than purely a repeated character approach. Neither of which will help you when you're in the 2 billion range like our sample input of 2147483625

Key takeaways

Don't confuse the CHAR function with the char data type. Similar but different

That's why books always had ASCII tables in them

Modulus function can generate a bounded set of numbers

Older developers might know some weird tricks/trivia

Even older developers will scoff at memorized ASCII tables in favor of EBCDIC tables

No comments: