ansaurus

Question

Obfuscate / Mask / Scramble personal information

Answer 1

+1 A:

A very simple solution would be to ROT13 the text.
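For illustration, a minimal sketch of ROT13 masking in Python, assuming the text is pulled out of the database and handled in a script. The helper name `rot13` is made up for this example; the `rot_13` codec ships with the standard library. Only ASCII letters are rotated, so digits, punctuation, and accented characters pass through unchanged.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; all other characters are left as-is."""
    return codecs.encode(text, "rot_13")


masked = rot13("Smith")
print(masked)         # the masked surname
print(rot13(masked))  # applying ROT13 a second time restores the original
```

Because ROT13 is its own inverse, this only hides names from a casual glance. It is not encryption.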

A better question may be: why do you feel the need to scramble the data? If you have an encryption key, you could also consider running the text through DES or AES or similar. Those would have potential performance issues, however.

warren 2008-10-03 21:04:10

As I said, I need real names with similar/same weight as production so searches perform similarly.

Computer Chip 2008-10-03 21:18:08

Additionally, ROT13 doesn't actually scramble the data, since it is an easily reversible algorithm...

Guvante 2008-10-08 10:51:56

Yes, it's easily reversible, but it does meet the criteria of "mask" or "obfuscate": you at least need to recognize it's been ROT13'd, and un-ROT it :)

warren 2009-11-04 03:05:03

Answer 2

+3 A:

I use generatedata. It is an open source PHP script that can generate all sorts of dummy data.

Peter Hoffmann 2008-10-03 21:04:41

excellent tip - thank you. [It's one of those things that I've been meaning to write for years but never had time]...

Richard Harrison 2008-10-03 21:47:10

Answer 3

A:

Why not just use some sort of Random Name Generator?

Ryan 2008-10-03 21:05:40

Answer 4

+2 A:

Frankly, I'm not sure why this is needed. Your dev/test environments should be private, behind your firewall, and not accessible from the web.

Your developers should be trusted, and you have legal recourse against them if they fail to live up to your trust.

I think the real question should be "Should I scramble the data?", and the answer is (in my mind) 'no'.

If you're sending it offsite for some reason, or you have to have your environments web-accessible, or if you're paranoid, I would implement a random switch. Rather than build a temp table, run switches between each location and a random row in the table, swapping one piece of data at a time.

The end result will be a table with all the same data, but with it randomly reorganized. It should also be faster than your temp table, I believe.

It should be simple enough to implement the Fisher-Yates Shuffle in SQL...or at least in a console app that reads the db and writes to the target.

Edit (2): Off-the-cuff answer in T-SQL:

-- (random id number) and (person id of current row) are placeholders supplied by the loop
DECLARE @name varchar(50)
SET @name = (SELECT lastName FROM person WHERE personID = (random id number))
UPDATE person SET lastName = @name WHERE personID = (person id of current row)

Wrap this in a loop, and follow the guidelines of Fisher-Yates for modifying the random value constraints, and you'll be set.
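As a rough sketch of the console-app route, here is the shuffle in Python rather than T-SQL. It assumes the last names have already been read into a list; the `people` rows and the function name `fisher_yates_shuffle` are hypothetical. Writing the result back to the database is left out.

```python
import random


def fisher_yates_shuffle(values, rng=None):
    """Return a shuffled copy of values using the Fisher-Yates algorithm."""
    if rng is None:
        rng = random.Random()
    shuffled = list(values)
    # Walk from the last index down to 1, swapping each slot with a
    # randomly chosen slot at or before it (randint is inclusive).
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


# Hypothetical (personID, lastName) rows standing in for the person table.
people = [(1, "Adams"), (2, "Baker"), (3, "Clark"), (4, "Davis")]

new_names = fisher_yates_shuffle([name for _, name in people], random.Random(42))
scrambled = [(pid, name) for (pid, _), name in zip(people, new_names)]
print(scrambled)
```

Every original last name still appears exactly once, so the name distribution matches production. Only the pairing of names to IDs changes.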

Jeff 2008-10-03 21:08:12

Computer Chip 2008-10-03 21:17:13

You could try the <a href="http://en.wikipedia.org/wiki/Fisher-Yates_shuffle">Fisher-Yates Shuffle</a> It should be simple enough to implement in SQL...or in a simple console app that reads in the db and writes to the target db.

Jeff 2008-10-03 21:21:52

http://en.wikipedia.org/wiki/Fisher-Yates_shuffle That's the correct link, guess I have to learn more about the environment here ;) Updating my answer.

Jeff 2008-10-03 21:22:41

Answer 5

+1 A:

When doing something like that, I usually write a small program that first loads a lot of names and surnames into two arrays, and then just updates the database using a random name/surname from the arrays. It works really fast even for very big datasets (200.000+ records).
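A minimal sketch of that idea in Python, assuming the rows are held in memory as dictionaries. The name lists, the column names `first_name` and `last_name`, and the function `randomize_names` are all made up for illustration. A real run would load much larger lists and write the results back with UPDATE statements.

```python
import random

# Hypothetical sample lists; in practice these would be loaded from files.
FIRST_NAMES = ["Anna", "Boris", "Clara", "David"]
SURNAMES = ["Kovac", "Novak", "Horvat", "Babic"]


def randomize_names(rows, first_names, surnames, rng=None):
    """Return copies of rows with a randomly picked first name and surname."""
    if rng is None:
        rng = random.Random()
    return [
        {**row,
         "first_name": rng.choice(first_names),
         "last_name": rng.choice(surnames)}
        for row in rows
    ]


rows = [{"id": i, "first_name": "Real", "last_name": "Person"} for i in range(1, 6)]
masked_rows = randomize_names(rows, FIRST_NAMES, SURNAMES, random.Random(7))
for row in masked_rows:
    print(row)
```

Picking uniformly from the lists will not reproduce the frequency of names in production. If that matters for search performance, the lists would need to be weighted accordingly.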

Milan Babuškov 2008-10-03 21:08:35

Answer 6

A:

Use a temporary table instead and the query is very fast. I just ran it on 60K rows in 4 seconds. I'll be using this one going forward.

CREATE TABLE #Names (Id int IDENTITY(1,1), [Name] varchar(100))

/* Scramble the last names (randomly pick another last name) */
INSERT #Names SELECT LastName FROM Customer ORDER BY NEWID();

WITH [Customer ORDERED BY ROWID] AS
    (SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS ROWID, LastName FROM Customer)
UPDATE [Customer ORDERED BY ROWID]
SET LastName = (SELECT [Name] FROM #Names WHERE ROWID = Id)

DROP TABLE #Names

2009-04-30 15:12:16

You still could end up with a bad roll and have two...wait. NewID() makes UUIDs. I stand corrected.

Broam 2009-12-04 18:26:05
