A coworker asked if I had any experience with encrypting data while in transit with SSIS. Their client specifies that PII data must be protected at rest and in flight. They enforce the data at rest through something, it's not my client so I don't know the particulars but we'll be pulling data like a social security number, SSN, out of the financial system and into the data warehouse and while that data is between places, our ETL needs to safeguard that data.
I had never had to deal with encryption but in SSIS, or any ADO.NET connection, we can encrypt the connection and everything flowing down the wire should be encrypted. Encrypting Connections to SQL Server It looks like there’s some setup involved but once it’s done, it should be a matter of simply updating the connection string from a usage perspective. Update 2013-05-09 After attending Steve Jones's talk on The Encryption Primer, he suggested using IPSec tunneling as it will be far easier to implement than setting up SSL certs.
The second option was to use the native, but new-to-me, TSQL function for encryption and decryption. The following example shows me using Encrypt/DecryptBYPASSPHRASE functions to encrypt/decrypt the names from spt_values. At this point I will also point out, that crypto stuff has never been anything I've been passionate about so do your due diligence before you go implementing something you read on the interblag.
See also: Cryptographic functions
1: DECLARE @PassphraseEnteredByUser nvarchar(128);
2: SET @PassphraseEnteredByUser = 'Bill is the greatest';
3: WITH SRC AS
4: (
5: SELECT
6: SV.name
7: , ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS FakeSK
8: FROM
9: master.dbo.spt_values AS SV
10: )
11: , ENCRYPTO AS
12: (
13: SELECT
14: S.name
15: , EncryptByPassPhrase
16: (
17: @PassphraseEnteredByUser
18: , S.name
19: , 1
20: , CONVERT( varbinary, S.FakeSK)
21: ) AS encrypted_text
22: , S.FakeSK
23: FROM
24: SRC S
25: )
26: , DECRYPTO AS
27: (
28: SELECT
29: E.*
30: , CONVERT(nvarchar, DECRYPTBYPASSPHRASE
31: (
32: @PassphraseEnteredByUser
33: , E.encrypted_text
34: , 1
35: , CONVERT( varbinary, E.FakeSK)
36: )) AS decrypted_text
37: FROM
38: ENCRYPTO E
39: )
40: SELECT
41: D.*
42: FROM
43: DECRYPTO AS D;
In the SSIS world, we could use this as our source query. The challenge around doing that we’d either need to write the encrypted value into a staging table and then decrypt in a post cleanup step (Execute SQL task) or decrypt in the data flow prior to writing to the target table. But at that point, we’d have unencrypted SSNs in memory. If we run out of memory in the DF and spill to disk, the temp files would have the unencrypted SSN in them too. The automatic, post execution step would handle the cleanup on those files but there’d be the ghost image of them if people were NSA crazy about getting the data back. Either way, that might violate the client's rules on PII security.
The other thing to note about encrypting the data is that you’re going to go from a simple 9 byte allocation per row to varbinary(8000) which will result in the data flow either writing the binary portion to disk and carrying a pointer along to that field in the in memory portion or the engine will have to allocate 8k bytes of memory per row in the buffer. Either way, you’re probably going to feel a performance impact for large datasets but we’d have to test before we can know the possible severity.
No comments:
Post a Comment