views:

500

answers:

4

We have a web app we'd like to demo to prospects, but our best way of doing so is with existing data, for a full experience. Certainly, we don't want to do this with actual customer names or addresses, etc visible in the app. Is there an easy way in SQL Server to randomize or scramble a varchar or text field?

None of these columns are keys in anyway, either primary or foreign.

+2  A: 

Redgate has tool for it: http://www.red-gate.com/products/SQL%5FData%5FGenerator/index.htm

Didn't use it, but redgate tools are very good.

EDIT

It generates data, not scrambles, but still can be useful.

LukLed
This would be my call. The red gate tool creates random but sensible-looking data which is pretty ideal for the demo scenario you describe
Joel Mansford
A: 

I scrambled data once by changing letters within the field. So, if you have a name "Mike Smith" and you change all the i's to o's, the m's to l's, the e's to a's, the s's to t's, the t's to rr's, you end up with

Moke Smoth
Loke Sloth
Loka Sloth
Loka Tloth
Loka Rrlorrh

which was enough to make the name unreadable, and also you can't go back and determine what it was (I changed some letters which had already had letters changed to them.) But, it's still kind of readable.

thursdaysgeek
Huh. Now I know how Lovecraft got all these weird names."Cthulhu" probably started out as his hated neighbour's first name...
Christian Severin
A: 

You can create a list of the columns that need updating and then simply iterate over said list and execute some dynamic sql that will update the row in some fashion. I made a fairly basic scramble function that will just sha1 the data (with a random salt) so that it should be secure enough for most purposes.

if exists (select 1 where object_id('tempdb..#columnsToUpdate') is not null)
begin
    drop table #columnsToUpdate
end
create table #columnsToUpdate(tableName varchar(max), columnName varchar(max), max_length int)

if exists (select 1 where object_id('fnGetSanitizedName') is not null)
begin 
    drop function fnGetSanitizedName
end

if exists (select 1 where object_id('random') is not null)
begin 
    drop view random
end

if exists (select 1 where object_id('randUniform') is not null)
begin 
    drop function randUniform
end

GO

create view random(value) as select rand();
go

create function dbo.randUniform() returns real
begin
    declare @v real
    set @v = (select value from random)
    return @v
end

go

CREATE FUNCTION dbo.fnGetSanitizedName 
(
    @functionName nvarchar(max),
    @length int
)
RETURNS varchar(max)
AS
BEGIN
    return left(SUBSTRING(master.dbo.fn_varbintohexstr(HashBytes('SHA1', cast(cast(cast(dbo.randUniform() * 10000 as int) as varchar(8)) as varchar(40)) + @functionName)), 3, 32), @length)

END
GO

begin transaction
set nocount on

insert into #columnsToUpdate
select tables.name, columns.name, 
    case
     when types.name = 'nvarchar' then columns.max_length / 2
     else columns.max_length
    end as max_length
from sys.tables tables
inner join sys.columns columns on tables.object_id=columns.object_id
inner join sys.types types on columns.system_type_id = types.system_type_id
where types.name in ('nvarchar', 'varchar')


declare @tableName varchar(max)
declare @columnName varchar(max)
declare @length int
declare @executingSql varchar(max)

declare tableUpdateCursor cursor 
    for select tableName, columnName, max_length from #columnsToUpdate
open tableUpdateCursor

    fetch next from tableUpdateCursor into @tableName, @columnName, @length
    while @@fetch_status = 0
    begin
     set @executingSql = 'update ' + @tableName + ' set ' + @columnName + ' = dbo.fnGetSanitizedName(' + @columnName + ',' + cast(@length as varchar(max)) + ')'
     print @executingSql
     exec(@executingSql)

     fetch next from tableUpdateCursor into @tableName, @columnName, @length
    end


close tableUpdateCursor
deallocate tableUpdateCursor

set nocount off

rollback -- Can remove the rollback when you are sure about what your are doing.

drop table #columnsToUpdate
drop function dbo.fnGetSanitizedName
drop view random
drop function randUniform
JDonaldson
A: 

It's not possible to just leave your data in the tables and somehow only display it in a scrambled form.

Your options are to either replace the data by scrambling it in some way, generate new data that's in the same general form, write a function (CLR or T-SQL) that scrambles it as part of the queries you use, or encrypt the data, in which case it can only be displayed if the user also has the appropriate decryption key.

If you decide to replace the data, in addition to the Red Gate tool previously mentioned, you might also consider using the data generator that comes with Visual Studio Team Database, or perhaps Integration Services. The latter can be particularly useful if you would benefit from a more complex transformation.

RickNZ