views:

106

answers:

3

I have a table called,tblClient with an encrypted column called SSN.

Due to company policy, we encrypted SSN using a symmetric key (chosen over asymmetric key due to performance reasons) using a password.

Here is a partial LIKE search on SSN declare @SSN varchar(11) set @SSN = '111-22-%'

open symmetric key SSN_KEY decrypt by password = 'secret'

    select  Client_ID
    from    tblClient (nolock)
    where   convert(nvarchar(11), DECRYPTBYKEY(SSN)) like @SSN

close symmetric key SSN_KEY

Before encryption, searching thru 150,000 records took less than 1 second.
but with the mix of decryption, the same search takes around 5 seconds.

What strategy can I apply to try to optimize searching thru encrypted column?

+1  A: 

My guess is that by encrypting the column, you are forcing a full table scan each time you query it (although check the plan to be sure). Creating an index over SSN would make the encryption pointless.

geofftnz
SSN is in varbinary format thus cannot be indexed.
Sung Meister
+3  A: 

One problem that is going to be hard to overcome is using wildcard searching requires an index or table scan of some sort which requires unencrypting every row.

Optimize instead by pre-encrypting the search values(s) to allow indexing of the encrypted values.

If you were to require explicit matching you could do something like this, note the encryption is done on the input value, not the column:

select  Client_ID 
from    tblClient (nolock) 
where   SSN = convert(nvarchar(11), ENCRYPTBYKEY(@SSN)) 

However... for a search, you may want to look into an optimization that sort of achieves this by placing the segments of the SSN into separate indexed fields, then parsing the input string, and doing

select  Client_ID 
from    tblClient (nolock) 
where   SSNFIRST3 = convert(nvarchar(3), ENCRYPTBYKEY( <parsed prefix here> )) 
and SSNSECOND2 = convert(nvarchar(2), ENCRYPTBYKEY( <parsed middle section here> )) 

You are only doing the encryption/decryption on the input values, not the rows.

The assumption is you write a bit of simple regex code to parse out the search string into separate pieces to feed the above query. The affect of the above is at least you can utilize index searches which should be drastically faster than what you have now because of the limited rows visited.

EDIT: I meant ENCRYPTBYKEY, changed above.

mrjoltcola
Let me try this approach.
Sung Meister
Comparisong using encrypted key did not work. For some reason, Encrypted column values and encrypted input value did not match. (As i read oleksiy.t's answer (http://stackoverflow.com/questions/2496398/optimizing-encrypted-column-search/2496490#2496490), it turns out that `DecryptByKey` is a non-deterministic (returns different value every time) therefore encrypting input SSN would not work and did not work.
Sung Meister
+3  A: 

The simple solution is to add an unencrypted column for first characters of SSN. And this is the hard one.

I read the article and added a new column containing hash of orignal value and the search is done through the hash column. Thanks oleksy.t
Sung Meister
+1 I don't like the simple solution. But the linked article is a really nice writeup of the pros and cons of different solutions. E.g. adding a hash for indexing throws away some of the advantage of using a non-deterministic encryption.
Accipitridae