A web application stores sensitive user data. Neither the operator of the web application nor the hosting provider should be able to see this data. I therefore want to store the data in the DB encrypted with each user's login password.

dataInDB = encrypt (rawData, user password) 

With this strategy, however, the usual password-recovery use case cannot be implemented: since the web app normally stores only the hash of the password, it cannot send the old, forgotten password to the user. And once a new, random password is assigned, the encrypted data in the DB is no longer readable.

Is there any other solution?

+4  A: 

A possible solution (I am not responsible for any destruction):

When encrypting sensitive data, don't use the user's password directly as the key. Rather, derive the key from the user's password (preferably using a standard algorithm such as PBKDF2). In case the user forgets their password, keep a copy of this derived key, encrypted with a different key derived from the user's answer to a security question. If the user forgets their password, they can answer their security question. Only the correct answer will decrypt the stored password key (not the original password). This gives you the opportunity to re-encrypt the sensitive information under a new password.
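As a minimal sketch of the key-derivation step, assuming Python's standard `hashlib.pbkdf2_hmac` with SHA-256: the name `derive_key` mirrors the pseudo code in this answer, while the iteration count, the 16-byte salt, and the sample password are illustrative assumptions rather than recommendations from the original post.

```python
import hashlib
import os


def derive_key(secret: str, salt: bytes, iterations: int, length: int = 32) -> bytes:
    """Derive a fixed-length key from a password or answer with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode("utf-8"), salt, iterations, dklen=length
    )


# The salt and iteration count are not secret; they are stored with the user
# record so that the same key can be derived again later.
salt = os.urandom(16)   # assumed salt size
iterations = 100_000    # assumed value, chosen only for this example
key = derive_key("correct horse", salt, iterations)
```

The same password, salt, and iteration count always yield the same 32-byte key, which is why the salt and iteration count have to be stored while the password itself does not.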

I will demonstrate using (Python-esque) pseudo code, but first let's look at a possible table for the users. Don't get caught up in the columns just yet, they will become clear soon...

CREATE TABLE USERS
(
    user_name               VARCHAR,

    -- ... lots of other, useful columns ...

    password_key_iterations NUMBER,
    password_key_salt       BINARY,
    password_key_iv         BINARY,
    encrypted_password_key  BINARY,
    question                VARCHAR,
    answer_key_iterations   NUMBER,
    answer_key_salt         BINARY
)

When it comes time to register a user, they must provide a question and answer:

def register_user(user_name, password, question, answer):
    user = User()
    user.user_name = user_name

    # The question is simply stored for later use
    user.question = question

    # The password secret key is derived from the user's password
    user.password_key_iterations = generate_random_number(low=1000, high=2000)
    user.password_key_salt = generate_random_salt()
    password_key = derive_key(password, iterations=user.password_key_iterations, salt=user.password_key_salt)

    # The answer secret key is derived from the answer to the user's security question
    user.answer_key_iterations = generate_random_number(low=1000, high=2000)
    user.answer_key_salt = generate_random_salt()
    answer_key = derive_key(answer, iterations=user.answer_key_iterations, salt=user.answer_key_salt)

    # The password secret key is encrypted using the key derived from the answer
    user.password_key_iv = generate_random_iv()
    user.encrypted_password_key = encrypt(password_key, key=answer_key, iv=user.password_key_iv)

    database.insert_user(user)

Should the user forget their password, the system will still have to ask the user to answer their security question. Their password cannot be recovered, but the key derived from the password can be. This allows the system to re-encrypt the sensitive information using the new password:

def reset_password(user_name, answer, new_password):
    user = database.retrieve_user(user_name)

    answer_key = derive_key(answer, iterations=user.answer_key_iterations, salt=user.answer_key_salt)

    # The answer key decrypts the old password key
    old_password_key = decrypt(user.encrypted_password_key, key=answer_key, iv=user.password_key_iv)

    # TODO: Decrypt sensitive data using the old password key

    new_password_key = derive_key(new_password, iterations=user.password_key_iterations, salt=user.password_key_salt)

    # TODO: Re-encrypt sensitive data using the new password key

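    # Use a fresh IV: the answer key has already been used with the old one
    user.password_key_iv = generate_random_iv()
    user.encrypted_password_key = encrypt(new_password_key, key=answer_key, iv=user.password_key_iv)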

    database.update_user(user)

Of course, there are some general cryptographic principles not explicitly highlighted here (cipher modes, etc.) that the implementer is responsible for becoming familiar with.

Hope this helps a little! :)

Update courtesy of Eadwacer's comment

As Eadwacer commented:

I would avoid deriving the key directly from the password (limited entropy and changing the password will require re-encrypting all of the data). Instead, create a random key for each user and use the password to encrypt the key. You would also encrypt the key using a key derived from the security questions.

Here is a modified version of my solution taking his excellent advice into consideration:

CREATE TABLE USERS
(
    user_name                      VARCHAR,

    -- ... lots of other, useful columns ...

    password_key_iterations        NUMBER,
    password_key_salt              BINARY,
    password_encrypted_data_key    BINARY,
    password_encrypted_data_key_iv BINARY,
    question                       VARCHAR,
    answer_key_iterations          NUMBER,
    answer_key_salt                BINARY,
    answer_encrypted_data_key      BINARY,
    answer_encrypted_data_key_iv   BINARY
)

You would then register the user as follows:

def register_user(user_name, password, question, answer):
    user = User()
    user.user_name = user_name

    # The question is simply stored for later use
    user.question = question

    # The randomly-generated data key will ultimately encrypt our sensitive data
    data_key = generate_random_key()

    # The password key is derived from the password
    user.password_key_iterations = generate_random_number(low=1000, high=2000)
    user.password_key_salt = generate_random_salt()
    password_key = derive_key(password, iterations=user.password_key_iterations, salt=user.password_key_salt)

    # The answer key is derived from the answer
    user.answer_key_iterations = generate_random_number(low=1000, high=2000)
    user.answer_key_salt = generate_random_salt()
    answer_key = derive_key(answer, iterations=user.answer_key_iterations, salt=user.answer_key_salt)

    # The data key is encrypted using the password key
    user.password_encrypted_data_key_iv = generate_random_iv()
    user.password_encrypted_data_key = encrypt(data_key, key=password_key, iv=user.password_encrypted_data_key_iv)

    # The data key is encrypted using the answer key
    user.answer_encrypted_data_key_iv = generate_random_iv()
    user.answer_encrypted_data_key = encrypt(data_key, key=answer_key, iv=user.answer_encrypted_data_key_iv)

    database.insert_user(user)

Now, resetting a user's password looks like this:

def reset_password(user_name, answer, new_password):
    user = database.retrieve_user(user_name)

    answer_key = derive_key(answer, iterations=user.answer_key_iterations, salt=user.answer_key_salt)

    # The answer key decrypts the data key
    data_key = decrypt(user.answer_encrypted_data_key, key=answer_key, iv=user.answer_encrypted_data_key_iv)

    # Instead of re-encrypting all the sensitive data, we simply re-encrypt the data key
    new_password_key = derive_key(new_password, iterations=user.password_key_iterations, salt=user.password_key_salt)

    user.password_encrypted_data_key_iv = generate_random_iv()
    user.password_encrypted_data_key = encrypt(data_key, key=new_password_key, iv=user.password_encrypted_data_key_iv)

    database.update_user(user)
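The modified scheme can be sketched end to end with the standard library only. Several things here are assumptions made for illustration and are not part of the original answer: the user record is a plain dict keyed by the column names above, the iteration count is a fixed constant instead of a random per-user value, the `login` function is added to show the normal use case, and a fresh salt and IV are generated on reset. Most importantly, `wrap` and `unwrap` are a stand-in cipher built from HMAC so that the example runs without third-party packages; a real implementation would use an authenticated AES mode from a vetted library instead.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed fixed value, for illustration only


def derive_key(secret: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS, dklen=32)


# STAND-IN cipher, NOT for production: XOR the 32-byte data key with an
# HMAC-SHA256 keystream block and append an HMAC tag over the result.
def wrap(data_key: bytes, key: bytes, iv: bytes) -> bytes:
    stream = hmac.new(key, b"stream" + iv, hashlib.sha256).digest()
    body = bytes(a ^ b for a, b in zip(data_key, stream))
    tag = hmac.new(key, b"tag" + iv + body, hashlib.sha256).digest()
    return body + tag


def unwrap(blob: bytes, key: bytes, iv: bytes) -> bytes:
    body, tag = blob[:-32], blob[-32:]
    expected = hmac.new(key, b"tag" + iv + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password/answer or corrupted record")
    stream = hmac.new(key, b"stream" + iv, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(body, stream))


def register_user(password: str, answer: str) -> dict:
    data_key = os.urandom(32)  # random key that encrypts the sensitive data
    user = {
        "password_key_salt": os.urandom(16),
        "password_encrypted_data_key_iv": os.urandom(16),
        "answer_key_salt": os.urandom(16),
        "answer_encrypted_data_key_iv": os.urandom(16),
    }
    password_key = derive_key(password, user["password_key_salt"])
    answer_key = derive_key(answer, user["answer_key_salt"])
    # Two independently wrapped copies of the same data key
    user["password_encrypted_data_key"] = wrap(
        data_key, password_key, user["password_encrypted_data_key_iv"])
    user["answer_encrypted_data_key"] = wrap(
        data_key, answer_key, user["answer_encrypted_data_key_iv"])
    return user


def login(user: dict, password: str) -> bytes:
    """Normal use case: re-derive the password key and unwrap the data key."""
    password_key = derive_key(password, user["password_key_salt"])
    return unwrap(user["password_encrypted_data_key"], password_key,
                  user["password_encrypted_data_key_iv"])


def reset_password(user: dict, answer: str, new_password: str) -> None:
    answer_key = derive_key(answer, user["answer_key_salt"])
    data_key = unwrap(user["answer_encrypted_data_key"], answer_key,
                      user["answer_encrypted_data_key_iv"])
    # Only the wrapped data key changes; the sensitive data is left untouched
    user["password_key_salt"] = os.urandom(16)
    user["password_encrypted_data_key_iv"] = os.urandom(16)
    new_password_key = derive_key(new_password, user["password_key_salt"])
    user["password_encrypted_data_key"] = wrap(
        data_key, new_password_key, user["password_encrypted_data_key_iv"])


user = register_user("old password", "rex")
before = login(user, "old password")
reset_password(user, "rex", "new password")
after = login(user, "new password")
```

In this sketch `before` and `after` are the same data key, so data encrypted before the reset stays readable afterwards, while the old password no longer unwraps anything.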

Hopefully my head is still functioning clearly tonight...

Adam Paynter
Thank you Adam, this is exactly what I'm looking for. I'll try to implement this solution within the next few days in Java using the Java Crypto API and post the result here.
Dominik
One thing isn't clear to me yet: for the normal use case of the web app (log in and see the decrypted user data), the user should not have to answer the security question. Is it right that the password secret key, which is used to encrypt the sensitive data, has to be derived the same way as when that key was first created? And that, in order to reproduce the same key, the iterations and salt are stored in the DB too?
Dominik
@Dominik: You are correct, that's why we store the iterations and salt in the database.
Adam Paynter
@Adam: How many bits are used in the salt?
Frank Computer
@Frank: See http://stackoverflow.com/questions/184112/what-is-the-optimal-length-for-user-password-salt. They recommend 16 bytes of random data.
Adam Paynter
I would avoid deriving the key directly from the password (limited entropy and changing the password will require re-encrypting all of the data). Instead, create a random key for each user and use the password to encrypt the key. You would also encrypt the key using a key derived from the security questions.
Eadwacer
Dominik
@Dominik: Eadwacer is recommending you maintain a *third* key, we can call it the *data* key. **This** is the key that will ultimately encrypt the sensitive data. When you register a user, you randomly generate this *data* key. You then encrypt it using the *password* key, storing the result in some field of the USERS table. You then encrypt it using the *answer* key, also storing the result in some other field of the USERS table. This way, you can still obtain the *data* key via the *answer* key.
Adam Paynter
OK, thanks a lot for this additional improvement. What's the preferred key length for the data key? I want to encrypt the sensitive user data using AES via the Java Crypto API. Since I derive the keys from the password/answer using PBKDF2 with a key length of 256 bits, I should use the same for the data key. The following statements should create such a data key: `byte[] dataKey = new byte[256]; java.security.SecureRandom.getInstance("SHA1PRNG").nextBytes(dataKey);`
Dominik
@Dominik: It should be `new byte[32]` (8 bits per byte)
Adam Paynter
@Dominik: You could also use a `KeyGenerator` (`KeyGenerator gen = KeyGenerator.getInstance("AES");`), initialize it to 256 bits (`gen.init(256);`) and then generate a new key (`SecretKey key = gen.generateKey();`).
Adam Paynter
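For readers following the Python-esque pseudo code rather than Java, the random helpers it calls could be sketched as below, assuming the standard `secrets` module. The function names come from the pseudo code; the sizes are assumptions based on this thread (a 256-bit AES key, a 16-byte salt) plus a 16-byte IV to match the AES block size, and the right IV length ultimately depends on the cipher mode chosen.

```python
import secrets


def generate_random_key() -> bytes:
    # 256 bits = 32 bytes (8 bits per byte)
    return secrets.token_bytes(32)


def generate_random_salt() -> bytes:
    # 16 bytes of random data, as suggested in the linked question
    return secrets.token_bytes(16)


def generate_random_iv() -> bytes:
    # 16 bytes = one AES block; assumed here, depends on the cipher mode
    return secrets.token_bytes(16)


data_key = generate_random_key()
key_bits = len(data_key) * 8
```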