
I'm writing a simple program for file encryption. Mostly as an academic exercise but possibly for future serious use. All of the heavy lifting is done with third-party libraries, but putting the pieces together in a secure manner is still quite a challenge for the non-cryptographer. Basically, I've got just about everything working the way I think it should.

I'm using AES with a 128-bit key (AES-128) for the encryption. I want users to be able to enter variable-length passwords, so I decided to hash the password with MD5 and then use the hash as the key. I figured this was acceptable: the key is always supposed to be secret, so there's no reason to worry about collision attacks.
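A minimal sketch of the key derivation described above, assuming Python's standard `hashlib` (the helper name `naive_key` is hypothetical). It makes explicit that the key is a single fast, unsalted MD5 digest of the password, so the same password always yields the same 16-byte AES-128 key:

```python
import hashlib


def naive_key(password: str) -> bytes:
    """Hypothetical helper: the MD5-of-password scheme described above.

    Returns the 16-byte (128-bit) MD5 digest, used directly as an AES-128 key.
    Illustration only: there is no salt and no work factor.
    """
    return hashlib.md5(password.encode("utf-8")).digest()


key = naive_key("correct horse battery staple")
print(len(key) * 8)  # 128
# Same password, same key: keys for common passwords can be precomputed once.
print(naive_key("correct horse battery staple") == key)  # True
```

Because the derivation is one cheap hash, each password guess costs an attacker roughly one MD5 computation, which is the weakness the answers address.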

Now that I've implemented this, I ran across a couple articles indicating that this is a bad idea. My question is: why? If a good password is chosen, the cipher is supposed to be strong enough on its own to never reveal the key except via an extraordinary (read: currently infeasible) brute-force effort, right? Should I be using something like PBKDF2 to generate the key or is that just overkill for all but the most extreme cryptographic applications?

A: 

Well, as your post is general, let me state a few general things:

  1. MD5, SHA-0, and SHA-1 are all broken hashes; you should not use them for any cryptographic purpose. Use SHA-2 instead.

  2. You should, generally, use well-known and documented approaches to deriving keys from passwords (you don't mention which language you're using; please say).

  3. When doing any sort of security programming, the most important thing to do, before you do anything else, is to strictly document your 'threat model'. This is basically a list of all the attacks you are trying to prevent and how you will prevent them, plus the sorts of attacks you can't defend against. It's quite fun to do, and you'll get to learn about all the attacks and other interesting things.
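Points 1 and 2 in Python terms, as a minimal sketch: swapping `hashlib.md5` for a bare `hashlib.sha256` would still be a single fast hash, whereas the standard library's documented PBKDF2 implementation, `hashlib.pbkdf2_hmac`, uses SHA-256 inside a salted, iterated derivation. The fixed salt and the 100,000-iteration count below are placeholder assumptions for illustration:

```python
import hashlib

password = "my passphrase".encode("utf-8")

# Still one fast hash: SHA-256 instead of MD5 does not slow down guessing.
bare = hashlib.sha256(password).digest()[:16]  # truncated to a 128-bit key

# SHA-256 inside PBKDF2: salted and iterated (parameters are illustrative).
salt = b"example-salt-16b"  # placeholder; in practice os.urandom(16), stored with the file
derived = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=16)

print(len(bare), len(derived))  # 16 16
```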

Noon Silk
1. The vulnerabilities in MD5 don't matter for my purposes because I'm only using it to derive a key out of a variable-length string. If an adversary ever gets the hash, the game is over anyway, since that's what was supposed to be the secret all along. Collision attacks don't even enter the playing field here, as far as I know.
2. This is what I'm trying to do by asking the question above. :) The programming language doesn't matter because this is a question of theory, not implementation. But in case you're merely curious, I'm using Python.
3. This is also what I'm trying to do here. :)
Charles
*sigh* I really don't know what it will take to convince people to stop using MD5. I guess the only thing will be a compromise of systems they directly interact with. In any case, I leave it with you. I cannot stress enough how unwise it is to continue using broken algorithms, regardless of what you think you know.
Noon Silk
Dude, I don't think you're reading what I'm writing. :)
Charles
Charles, silky is right. Don't use any of the broken hashes he lists. If you are asking the questions you are, you don't understand the implications of a compromised hash function (I'm not claiming I do either). Go read http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html
lambacck
A:

This article on key strengthening might help you. Basically, you want to make the key stronger (harder to brute-force than the raw password) and make deriving it from the password reliably time-consuming.
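A rough back-of-the-envelope sketch of what stretching buys, assuming an attacker who enumerates every password of a given length over a given character set and treating one KDF iteration as one unit of work. Under that assumption the password space, not AES's 2^128 key space, bounds the search, and an iterated derivation multiplies the cost of every guess. The helper `brute_force_bits` and the numbers plugged into it are illustrative assumptions:

```python
import math


def brute_force_bits(charset_size: int, length: int, iterations: int = 1) -> float:
    """Hypothetical helper: log2 of the work needed to try every password.

    Work = (number of candidate passwords) * (hash computations per guess).
    """
    return length * math.log2(charset_size) + math.log2(iterations)


# 10 characters drawn uniformly from the 94 printable, non-space ASCII symbols
print(round(brute_force_bits(94, 10), 1))           # 65.5  (one fast hash, e.g. MD5)
print(round(brute_force_bits(94, 10, 600_000), 1))  # 84.7  (same password, 600,000 iterations)
```

Stretching adds only about log2(iterations) bits, so it cannot turn a weak password into the equivalent of a random 128-bit key; and real passwords are rarely uniformly random, so their effective entropy is usually well below these figures.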

Gleb
Thanks, that's exactly what I was looking for.
Charles
A: 

The answer to your new question is: you should definitely be using something like PBKDF2 to generate the key.

I assume you are going to have a password (at least 10 characters, mixing upper- and lowercase letters, numbers, and punctuation, right?) that will then generate an AES-256 key. The key will be used to encrypt/decrypt the file(s). You want to use something like PBKDF2 to make it harder for someone who gets your file to recover your key/password through brute-force attacks. Using something like PBKDF2 (and a random salt!) increases the cost of breaking the encryption on the file.
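A minimal sketch of that idea using only Python's standard library. It assumes a hypothetical file layout in which a 4-byte iteration count and a random 16-byte salt are stored in front of the ciphertext; the header format, the function names, and the 600,000-iteration default are illustrative choices, not a standard. The AES encryption itself is left to whichever library you already use:

```python
import hashlib
import os
import struct

SALT_LEN = 16
# Hypothetical header: 4-byte big-endian iteration count, then the salt.
HEADER = struct.Struct(">I16s")


def new_key(password: str, iterations: int = 600_000) -> tuple[bytes, bytes]:
    """Derive a fresh 32-byte (AES-256) key using a new random salt.

    Returns (key, header); write the header in front of the ciphertext so
    the same key can be re-derived when decrypting.
    """
    salt = os.urandom(SALT_LEN)  # a new random salt for every file
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)
    return key, HEADER.pack(iterations, salt)


def key_from_header(password: str, header: bytes) -> bytes:
    """Re-derive the key from the password and a stored header."""
    iterations, salt = HEADER.unpack(header[: HEADER.size])
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)


key, header = new_key("my passphrase", iterations=100_000)  # lower count keeps the demo fast
print(len(key), len(header))                             # 32 20
print(key_from_header("my passphrase", header) == key)  # True
```

Because every file gets its own salt, identical passwords no longer produce identical keys, and work precomputed against one file does not carry over to another. Storing the iteration count in the header also lets you raise it later without breaking older files.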

What I really recommend is that you use this as a toy and not to protect anything you really care about. If you are not a security expert, you are going to make mistakes; even the experts (lots of them, working together) make mistakes: http://www.sslshopper.com/article-ssl-and-tls-renegotiation-vulnerability-discovered.html

lambacck
Thanks for the tips. I did a lot more research after I posed this question and ended up implementing PBKDF2. Most of the heavy lifting is done by libraries written by individuals much smarter than myself. That aside, I need to start somewhere and what better way to ensure that my program is correct than to have it protecting (hopefully) my own data? I'm well aware of the risks of rolling my own solution here and fully accept them. If I finish the project, I hope to release it as open source software so that others can scrutinize the code, add fixes, and possibly publicly humiliate my work. :)
Charles