Using the fast number generator from SQL PASS 2009, I quickly filled four new tables with 20 million rows each and compared their sizes on SQL Server 2008 R2. My results are below.
```sql
CREATE DATABASE sizeCompare
GO
USE sizeCompare
GO
CREATE FUNCTION dbo.GenerateNumbers ( @n AS bigint )
RETURNS TABLE
AS RETURN
    WITH L0   AS ( SELECT 0 AS c UNION ALL SELECT 0 )
       , L1   AS ( SELECT 0 AS c FROM L0 AS A CROSS JOIN L0 AS B )
       , L2   AS ( SELECT 0 AS c FROM L1 AS A CROSS JOIN L1 AS B )
       , L3   AS ( SELECT 0 AS c FROM L2 AS A CROSS JOIN L2 AS B )
       , L4   AS ( SELECT 0 AS c FROM L3 AS A CROSS JOIN L3 AS B )
       , L5   AS ( SELECT 0 AS c FROM L4 AS A CROSS JOIN L4 AS B )
       , NUMS AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number FROM L5 )
    SELECT TOP (@n) number FROM NUMS ORDER BY number
GO
CREATE TABLE dbo.Empties
    ( row_id      int          NOT NULL PRIMARY KEY
    , dumb_column varchar(100) NULL )

CREATE TABLE dbo.Nulls
    ( row_id      int          NOT NULL PRIMARY KEY
    , dumb_column varchar(100) NULL )

CREATE TABLE dbo.SparseEmpties
    ( row_id      int          NOT NULL PRIMARY KEY
    , dumb_column varchar(100) SPARSE NULL )

CREATE TABLE dbo.SparseNulls
    ( row_id      int          NOT NULL PRIMARY KEY
    , dumb_column varchar(100) SPARSE NULL )

-----------------------------------------------
-- 7:28 (mm:ss) for the next 2 queries
-----------------------------------------------
INSERT INTO dbo.Empties ( row_id, dumb_column )
SELECT T.number, ''
FROM dbo.GenerateNumbers(20000000) T

INSERT INTO dbo.Nulls ( row_id, dumb_column )
SELECT T.number, NULL
FROM dbo.GenerateNumbers(20000000) T
GO
-----------------------------------------------
-- 3:30 (mm:ss) for the next 2 queries
-----------------------------------------------
INSERT INTO dbo.SparseEmpties SELECT * FROM dbo.Empties
INSERT INTO dbo.SparseNulls   SELECT * FROM dbo.Nulls

-----------------------------------------------
-- How big is it?
-- http://www.sqlservercentral.com/blogs/jeffrey_yao/archive/tags/DMV/default.aspx
-----------------------------------------------
SELECT name        = OBJECT_SCHEMA_NAME(object_id) + '.' + OBJECT_NAME(object_id)
     , rows        = SUM(CASE WHEN index_id < 2 THEN row_count ELSE 0 END)
     , reserved_kb = 8 * SUM(reserved_page_count)
     , data_kb     = 8 * SUM(CASE WHEN index_id < 2
                                  THEN in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count
                                  ELSE lob_used_page_count + row_overflow_used_page_count
                             END)
     , index_kb    = 8 * ( SUM(used_page_count)
                         - SUM(CASE WHEN index_id < 2
                                    THEN in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count
                                    ELSE lob_used_page_count + row_overflow_used_page_count
                               END) )
     , unused_kb   = 8 * SUM(reserved_page_count - used_page_count)
FROM sys.dm_db_partition_stats
WHERE object_id > 1024
GROUP BY object_id
ORDER BY rows DESC -- or: reserved_kb DESC, data_kb DESC, index_kb DESC, unused_kb DESC
```
name | rows | reserved_kb | data_kb | index_kb | unused_kb |
---|---|---|---|---|---|
dbo.Empties | 20000000 | 258248 | 257240 | 968 | 40 |
dbo.Nulls | 20000000 | 258248 | 257240 | 968 | 40 |
dbo.SparseEmpties | 20000000 | 497288 | 495360 | 1848 | 80 |
dbo.SparseNulls | 20000000 | 258248 | 257240 | 968 | 40 |
Note that the sparse table storing empty strings nearly doubled in size: a sparse column is free for NULLs, but every non-NULL value it holds carries extra per-value overhead, so sparse makes the empty-string problem worse, not better. That doesn't mean developers get off without having to clean up what they're storing. The increased complexity in logic, as well as the performance impacts (index, what index?), of code riddled with NULLIF(T.dumb_column, '') IS NULL or COALESCE(T.dumb_column, '') <> '' is reason enough to not store empty strings in your database. That'll be a post for another day, however.
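As a quick sketch of why those predicates hurt (reusing the demo tables above, and assuming an index exists on dumb_column): wrapping the column in a function makes the predicate non-sargable, so the optimizer scans instead of seeking. If empty strings were never stored, a plain IS NULL / IS NOT NULL test does the same job and can use the index:

```sql
-- Non-sargable: the function call on the column blocks an index seek
SELECT row_id
FROM dbo.Empties
WHERE COALESCE(dumb_column, '') <> ''

-- Sargable equivalent, possible only when '' is never stored
SELECT row_id
FROM dbo.Nulls
WHERE dumb_column IS NOT NULL
```

This is only an illustration of the predicate shapes, not a benchmark; actual plans depend on the indexes and statistics in play.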