This is a two-part question:

Part 1

The first part deals with calculating the entropy of a password in PHP. I have been unable to find any code examples that are empirically sound, and I would really like some help finding the 'right' way to calculate a final number. A lot of folks on the net have their own home-baked weighting algorithm, but I am really looking for the principled, scientific way to calculate it.

I will be using password entropy as just one part of a larger security system, as a way to analyze our overall data security based on what information would be accessible if a user's password were compromised and how easily a password could be broken by brute force.

Part 2

The second part of this question is: how useful will this number really be? My end goal is to generate a 'score' for each password in the system that we can use to monitor our overall system security as a dynamic entity. I will probably have to work in another algorithm or two for dictionary attacks, l33t-substitution passwords, etc., but I do feel that entropy will play an important role in such an 'overall' system rating. I do welcome suggestions for other approaches, though.

What I Know

I have seen some mention of logarithmic equations for calculating this entropy, but I have yet to see a good example that isn't written purely as a mathematical equation. I could really use a code example (even if not strictly in PHP) to get me going.
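The logarithmic equation usually meant here is the brute-force estimate H = L · log2(N), where L is the password length and N is the size of the character pool the password appears to draw from (26 for lowercase only, 52 with uppercase added, and so on). It assumes every character was picked uniformly at random from that pool, which human-chosen passwords rarely are, so it is best read as an upper bound. A minimal Java sketch; the class name PoolEntropy and the 33-character "other" pool (the remaining printable ASCII characters, including space) are assumptions:

```java
// Hypothetical helper: estimates brute-force entropy as length * log2(poolSize).
// Assumes each character was chosen uniformly at random from the detected pool.
final class PoolEntropy {

    // Size of the character pool implied by the classes present in the password.
    static int poolSize(String password) {
        boolean lower = false, upper = false, digit = false, other = false;
        for (char c : password.toCharArray()) {
            if (c >= 'a' && c <= 'z') lower = true;
            else if (c >= 'A' && c <= 'Z') upper = true;
            else if (c >= '0' && c <= '9') digit = true;
            else other = true; // punctuation, space, and any non-ASCII character
        }
        int pool = 0;
        if (lower) pool += 26;
        if (upper) pool += 26;
        if (digit) pool += 10;
        if (other) pool += 33; // assumption: remaining printable ASCII, incl. space
        return pool;
    }

    // Bits of entropy under the uniform-random assumption.
    static double bits(String password) {
        int pool = poolSize(password);
        if (pool == 0) return 0.0; // empty password
        return password.length() * (Math.log(pool) / Math.log(2));
    }
}
```

For example, "password" gets 8 × log2(26), roughly 37.6 bits, which badly overstates its strength because it is a dictionary word. An attacker searching the whole space needs at most 2^H guesses, so this figure only describes blind brute force, not dictionary attacks.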

Extension

In writing a comment, I realized that I can better explain the usefulness of this calculation. When I am working on legacy systems where users have extremely weak passwords, I need concrete evidence of that weakness before I can make a case for forcing all users to change their passwords to a new (enforced) strong password. By storing a password strength score for each user account in the system, I can build several different metrics to show overall system weakness and make a case for stronger passwords.
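As a rough illustration of the kind of system-wide metrics described here, the sketch below summarizes a list of per-account scores (assumed to be in bits) as a mean and as the share of accounts below a threshold. The class name ScoreSummary and any threshold you pass in are hypothetical examples, not a standard.

```java
import java.util.List;

// Hypothetical summary over per-account password scores (assumed to be bits).
final class ScoreSummary {

    // Average score across all accounts; 0.0 for an empty list.
    static double mean(List<Double> scores) {
        if (scores.isEmpty()) return 0.0;
        double sum = 0.0;
        for (double s : scores) sum += s;
        return sum / scores.size();
    }

    // Fraction of accounts whose score falls strictly below the threshold.
    static double fractionBelow(List<Double> scores, double threshold) {
        if (scores.isEmpty()) return 0.0;
        int weak = 0;
        for (double s : scores) if (s < threshold) weak++;
        return (double) weak / scores.size();
    }
}
```

A call such as ScoreSummary.fractionBelow(scores, 40.0) yields one number ("half of all accounts score under 40 bits") that is easier to present than a list of individual scores; the 40-bit cutoff there is arbitrary.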

TIA

+2  A: 

Entropy of a string has a formal definition specified here: http://en.wikipedia.org/wiki/Entropy_(information_theory)

As for how useful that value is going to be: it depends. Here's a method (in Java) to calculate entropy that I wrote for an assignment:

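import java.util.Map;

// count maps each character to its number of occurrences in the string;
// totalChars is the length of that string.
public static double entropy(Map<Character, Integer> count, int totalChars) {
   double h = 0, p;
   for (int n : count.values()) {
      p = n / (totalChars * 1.0);          // relative frequency of this character
      h -= p * Math.log(p) / Math.log(2);  // accumulate -p * log2(p)
   }
   return h;
}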

count is a Map where (key, value) corresponds to (char, countForChar), and totalChars is the length of the string. This means you have to process the string before you call this method.
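A self-contained variant that does the preprocessing step itself might look like the sketch below: it builds the (char, count) map from the string and then sums -p · log2(p) over the observed characters. The class name ShannonEntropy is hypothetical, and Java chars are UTF-16 code units, so characters outside the Basic Multilingual Plane are counted as two units.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical self-contained version: counts characters, then applies
// H = -sum(p * log2(p)) over the observed character frequencies.
final class ShannonEntropy {

    // Bits of entropy per character, based only on frequencies within s.
    static double bitsPerChar(String s) {
        if (s.isEmpty()) return 0.0;
        Map<Character, Integer> count = new HashMap<>();
        for (char c : s.toCharArray()) {
            count.merge(c, 1, Integer::sum); // increment this character's count
        }
        double h = 0.0;
        for (int n : count.values()) {
            double p = (double) n / s.length();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }
}
```

Note that the result is bits per character, not bits for the whole password.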

EDIT 2: Here's the same method, rewritten in PHP

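function entropy($string) {
   $h = 0;
   $size = strlen($string);                 // length in bytes
   // count_chars($string, 1) returns (byte value => count) for each byte present
   foreach (count_chars($string, 1) as $v) {
      $p = $v / $size;                      // relative frequency of this byte
      $h -= $p * log($p) / log(2);          // accumulate -p * log2(p)
   }
   return $h;                               // bits per character (byte)
}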

EDIT 3: There's a lot more to password strength than entropy. Entropy measures uncertainty, which doesn't necessarily translate into more security. For example:

The entropy of "akj@!0aj" is 2.5 bits per character, while the entropy of "password" is 2.75.
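Working both figures out by hand from H = -Σ p log2 p shows why the dictionary word scores higher. "password" has seven distinct characters (only s repeats), while "akj@!0aj" has six (a and j repeat). Repetition within the string lowers the score, and nothing about the character classes used or about dictionaries enters the calculation:

```latex
\begin{aligned}
H &= -\sum_{c} p_c \log_2 p_c \\
H(\text{password}) &= 6 \cdot \tfrac{1}{8} \cdot 3 + \tfrac{2}{8} \cdot 2 = 2.25 + 0.5 = 2.75 \\
H(\text{akj@!0aj}) &= 2 \cdot \tfrac{2}{8} \cdot 2 + 4 \cdot \tfrac{1}{8} \cdot 3 = 1 + 1.5 = 2.5
\end{aligned}
```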

quantumSoup
@Aircule - Thanks for answering, but I am aware of the definition of entropy; I am more interested in its application to password security and how to accomplish that in PHP. For example, I probably don't want to run a thermodynamic entropy algorithm against passwords. LOL
angryCodeMonkey
@Shane - I know. See my edit.
quantumSoup
@Aircule - thanks for the update; this will help out a lot, I think. On your note about security, you are absolutely correct. That is why I mentioned using this as part of a larger system that also does dictionary checks and such. For this part of it, though, I believe it may work.
angryCodeMonkey
+1  A: 

Forcing a certain level of entropy is a requirement of CWE-521 (Weak Password Requirements), whose recommendations include:

(1) Minimum and maximum length;
(2) Require mixed character sets (alpha, numeric, special, mixed case);
(3) Do not contain user name;
(4) Expiration;
(5) No password reuse.
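Rules (1) to (3) can be checked at the moment a password is set; (4) and (5) need stored history and are left out. A minimal Java sketch, where the class name PasswordPolicy, the 10 to 128 character bounds, and the "at least three character classes" reading of rule (2) are all assumptions:

```java
import java.util.Locale;

// Hypothetical checker for rules (1)-(3). Bounds and class cutoff are assumed values.
final class PasswordPolicy {
    static final int MIN_LENGTH = 10;   // assumption
    static final int MAX_LENGTH = 128;  // assumption

    // Number of distinct classes present: lowercase, uppercase, digit, other.
    static int characterClasses(String pw) {
        boolean lower = false, upper = false, digit = false, special = false;
        for (char c : pw.toCharArray()) {
            if (Character.isLowerCase(c)) lower = true;
            else if (Character.isUpperCase(c)) upper = true;
            else if (Character.isDigit(c)) digit = true;
            else special = true;
        }
        return (lower ? 1 : 0) + (upper ? 1 : 0) + (digit ? 1 : 0) + (special ? 1 : 0);
    }

    static boolean isAcceptable(String pw, String userName) {
        if (pw.length() < MIN_LENGTH || pw.length() > MAX_LENGTH) return false; // rule (1)
        if (characterClasses(pw) < 3) return false;                             // rule (2)
        // rule (3): case-insensitive check that the user name is not contained
        return userName.isEmpty()
                || !pw.toLowerCase(Locale.ROOT).contains(userName.toLowerCase(Locale.ROOT));
    }
}
```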

Rook
@Rook - I was actually kind of hoping you would swing by--you were a big commenter on another question I had about storing plaintext passwords (http://stackoverflow.com/questions/2283937/how-should-i-ethically-approach-user-password-storage-for-later-plaintext-retriev) and I figured this would be right up your alley. Can you offer any further suggestions on password metrics? I am trying to cobble together a way to monitor overall system security based on passwords used within the system. I figured entropy would be a good place to start but am open to other suggested metrics as well.
angryCodeMonkey
@Shane First of all, these rules are going to piss people off, but they will be safer for it. Rule #2 in CWE-521 is best enforced using a regex; this will prevent the most commonly used passwords as well as all dictionary words, and therefore it is the best rule that could be enforced. I don't see how enforcing a max size helps, but the max could be a few KB in size (why not?). To be honest, your question is a bit strange: entropy is about potential, and by enforcing a mixed character set you are increasing that potential.
Rook
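One common way to write rule #2 as a single regex is with lookaheads, each of which requires one character class without consuming input. A sketch in Java; the class name MixedCharsetRegex, the requirement of all four classes, and the 10-character minimum are assumptions:

```java
import java.util.regex.Pattern;

// Hypothetical single-regex form of rule #2: each lookahead demands one class,
// then .{10,} enforces an assumed minimum length over the whole string.
final class MixedCharsetRegex {
    private static final Pattern MIXED = Pattern.compile(
            "^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[^A-Za-z0-9]).{10,}$");

    static boolean matches(String pw) {
        return MIXED.matcher(pw).matches(); // full-string match
    }
}
```

A l33t variant of a dictionary word such as "P@ssw0rd12" still satisfies this pattern, which is why the question also plans separate dictionary and l33t checks.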
@Shane On a side note, CWE-257 is extremely important and I have no idea why you are ignoring it. If someone doesn't know their password, then there is no point in telling it to them. If you need to update the message digest used, you can do so on the next login. There is **absolutely nothing** to gain from a user's perspective, and it makes you a juicy target from the attacker's perspective.
Rook
@Rook - I'll admit I am a bit confused. Your first comment makes sense, and I like the idea of regexing the password to restrict dictionary words and repetitious characters. However, your second comment doesn't seem to apply to this question: how am I ignoring CWE-257? I am not talking about giving users their password (not in this post, and in the other (referenced) post I chose a best answer that also did not). Anyhow, I am using the bits of entropy as a metric to measure the strength of my users' chosen passwords; it benefits me to see an average of user password strength.
angryCodeMonkey
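For the "repetitious characters" part, a backreference regex can flag runs of the same character. A minimal Java sketch; the class name RepetitionCheck and the cutoff of three identical characters in a row are assumptions:

```java
import java.util.regex.Pattern;

// Hypothetical repetition check: (.) captures one character and \1{2,} requires
// it to repeat at least twice more, i.e. a run of 3+ identical characters.
final class RepetitionCheck {
    private static final Pattern RUN = Pattern.compile("(.)\\1{2,}");

    static boolean hasRun(String pw) {
        return RUN.matcher(pw).find(); // find() searches anywhere in the string
    }
}
```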
Basically, every user row in the database will hold a password 'score', and I can use this overall score (partly based on entropy) to gauge the strength of the passwords my users are choosing. This is an existing user set, and I can't just impose a highly secure password requirement on them without some justification, so this score will help me show how (potentially) weak our system is due to user-chosen passwords under the current (weak) requirements.
angryCodeMonkey
@Shane, okay, I'll connect the pieces in a real-world attack scenario. If you have a password strength score stored in the database, then an attacker with a SQL injection vulnerability will have this score. He can select the weakest passwords in the entire system and attack them first. This check should be a barrier when a new password is created, making all passwords stronger, and the score should not be stored, because it will be used against you.
Rook
@Shane In terms of the CWE-257 violation: if the attacker knows one of the passwords, then he can brute force your secret key. Once this single key is brute forced, he can decrypt **ALL** passwords on the system in one fell swoop. This is why message digests are used: each password has to be brute forced separately.
Rook
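The "each password has to be brute forced separately" property depends on every password getting its own random salt, and a fast digest such as MD5 or SHA-1 can still be guessed very quickly. The sketch below swaps in PBKDF2, a deliberately slow, salted key-derivation function available in the JDK under the algorithm name "PBKDF2WithHmacSHA256". The class name PasswordHashing and the iteration count are assumptions; the salt is stored next to the hash for each user.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical per-password salted hashing with PBKDF2 (in place of a plain digest).
final class PasswordHashing {
    static final int ITERATIONS = 310_000; // assumption; tune to your hardware
    static final int KEY_BITS = 256;

    // Fresh random salt for each password, stored alongside its hash.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e); // algorithm missing or bad key spec
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    // Recompute with the stored salt and compare in constant time.
    static boolean verify(char[] password, byte[] salt, byte[] expected) {
        return MessageDigest.isEqual(hash(password, salt), expected);
    }
}
```

Because each salt is different, a guess tested against one user's hash says nothing about any other user's hash, which is the property described in the comment above.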
@Rook - OK, I see what you are saying about storing their password score in the database. However, in this specific instance, if attackers get far enough into my database to see this score, they don't need a password: they already have 100% of my secure data. On other projects I build, I will consider your advice when other important items, like uploaded documents (not stored directly in the DB), are at risk. Thanks again.
angryCodeMonkey
@Shane "Defense in depth" is about planning for failure. SQL injection is a very common vulnerability, and it should be the biggest threat you consider when building a secure application. Password security is meaningless when your application can be abused through a serious vulnerability.
Rook