views:

458

answers:

2

This question has to do with PHP's implementation of crypt(). For this question, the first 7 characters of the salt are not counted, so a salt '$2a$07$a' would be said to have a length of 1, as it is only 1 character of salt and seven characters of meta-data.

When using salt strings longer than 22 characters, there is no change in the hash generated (i.e., truncation), and when using strings shorter than 21 characters the salt will automatically be padded (with '$' characters, apparently); this is fairly straightforward. However, if given a salt 20 characters and a salt 21 characters, where the two are identical except for the final character of the 21-length salt, both hashed strings will be identical. A salt 22 characters long, which is identical to the 21 length salt except for the final character, the hash will be different again.

Example In Code:

$foo = 'bar';
$salt_xx = '$2a$07$';
$salt_19 = $salt_xx . 'b1b2ee48991281a439d';
$salt_20 = $salt_19 . 'a';
$salt_21 = $salt_20 . '2';
$salt_22 = $salt_21 . 'b';

var_dump(
    crypt($foo, $salt_19), 
    crypt($foo, $salt_20), 
    crypt($foo, $salt_21), 
    crypt($foo, $salt_22)
);

Will produce:

string(60) "$2a$07$b1b2ee48991281a439d$$.dEUdhUoQXVqUieLTCp0cFVolhFcbuNi"
string(60) "$2a$07$b1b2ee48991281a439da$.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2O4AH0.y/AsOuzMpI.f4sBs8E2hQjPUQq"

Why is this?

EDIT:

Some users are noting that there is a difference in the overall string, which is true. In salt_20, offset (28, 4) is da$., while in salt_21, offset (28, 4) is da2.; however, it is important to note that the string generated includes the hash, the salt, as well as instructions to generate the salt (i.e. $2a$07$); the part in which the difference occurs is, in fact, still the salt. The actual hash is unchanged as UxGYN739wLkV5PGoR1XA4EvNVPjwylG.

Thus, this is in fact not a difference in the hash produced, but a difference in the salt used to store the hash, which is precisely the problem at hand: two salts are generating the same hash.

Rembmer: the output will be in the following format:

"$2a$##$saltsaltsaltsaltsaltsaHASHhashHASHhashHASHhashHASHhash"
//                            ^ Hash Starts Here, offset 28,32

where ## is the log-base-2 determining the number of iterations the algorithm runs for

Edit 2:

In the comments, it was requested that I post some additional info, as the user could not reproduce my output. Execution of the following code:

var_dump(
    PHP_VERSION, 
    PHP_OS, 
    CRYPT_SALT_LENGTH, 
    CRYPT_STD_DES, 
    CRYPT_EXT_DES, 
    CRYPT_MD5, 
    CRYPT_BLOWFISH
);

Produces the following output:

string(5) "5.3.0"
string(5) "WINNT"
int(60)
int(1)
int(1)
int(1)
int(1)

Hope this helps.

+3  A: 

Looks like the outputs are actually different. (da$, vs da2) for result of salt_20 and salt_21.

Harley Green
Actually, that part is actually still the salt of the hash, not the hash itself. The hash is only the last 32 characters. I have edited the question to reflect the fact that the salt is prepended to the string in order to make the question more clear.
Dereleased
Maybe you can post some information about your php version, host OS, and the values of the following php constants: - CRYPT_SALT_LENGTH - CRYPT_STD_DES - CRYPT_EXT_DES - CRYPT_MD5 - CRYPT_BLOWFISHWhen I run the code you posted I get the following:`string(13) "$2yymq4BO0Z3w" string(13) "$2yymq4BO0Z3w" string(13) "$2yymq4BO0Z3w" string(13) "$2yymq4BO0Z3w"`
Harley Green
You are getting encryption using STD_DES - notice how only the first two characters of the given salt are considered. Internal support for Blowfish (if not supported by your host OS) was not added until PHP 5.3, so if you have an older version it would revert to the default method, which on your setup appears to be STD_DES. See http://php.net/manual/en/function.crypt.php for more info.
Dereleased
I would venture then that the crypt function has some internal mechanism for manipulating the salt once it is received, either some expansion or contraction function to make it the right size for the given encryption algorithm (perhaps treating it as base64 and when decoding if it does not have the correct number of bytes to read in it pads it regardless of the last several bytes). I've not taken the time to dig through the latest php source code to look at this, but if you are really curious that's where you should start!
Harley Green
Problem solved!
Dereleased
+3  A: 

After some experimentation, I have come to the conclusion that this is due to the way the salt is treated. The salt is not considered to be literal text, but rather to be a base64 encoded string, such that 22 bytes of salt data would actually represent a 16 byte string (floor(22 * 24 / 32) == 16) of salt. The "Gotcha!" with this implementation, though, is that, like Unix crypt, it uses a "non-standard" base64 alphabet. To be exact, it uses this alphabet:

./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz$

The 65th character, '$', is the padding character.

Now, the crypt() function appears to be capable of taking a salt of any length less than or equal to its maximum, and silently handling any inconsistencies in the base64 by discarding any data that doesn't make up another full byte. The crypt function will fail completely if you pass it characters in the salt that are not part of its base64 alphabet, which just confirms this theory of its operation.

Take an imaginary salt '1234'. This is perfectly base64 consistent in that it represents 24 bits of data, so 3 bytes, and does not carry any data that needs to be discarded. This is a salt whose Len Mod 4 is zero. Append any character to that salt, and it becomes a 5 character salt, and Len Mod 4 is now 1. However, this additional character represents only six bits of data, and therefore cannot be transformed into another full byte, so it is discarded.

Thus, for any two salts A and B, where

   Len A Mod 4 == 0 
&& Len B Mod 4 == 1  // these two lines mean the same thing
&& Len B = Len A + 1 // but are semantically important separately
&& A == substr B, 0, Len A

The actual salt used by crypt() to calculate the hash will, in fact, be identical. As proof, I'm including some example PHP code that can be used to show this. The salt constantly rotates in a seminon-random way (based on a randomish segment of the whirlpool hash of the current time to the microsecond), and the data to be hashed (herein called $seed) is simply the current Unix-Epoch time.

$salt = substr(hash('whirlpool',microtime()),rand(0,105),22);
$seed = time();
for ($i = 0, $j = strlen($salt); $i <= $j; ++$i) {
    printf('%02d = %s%s%c',
        $i,
        crypt($seed,'$2a$07$' . substr($salt, 0, $i)),
        $i%4 == 0 || $i % 4 == 1 ? ' <-' : '',
        0x0A
    );
}

And this produces output similar to the following

00 = $2a$07$$$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
01 = $2a$07$e$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
02 = $2a$07$e8$$$$$$$$$$$$$$$$$$$.WEimjvvOvQ.lGh/V6HFkts7Rq5rpXZG
03 = $2a$07$e89$$$$$$$$$$$$$$$$$$.Ww5p352lsfQCWarRIWWGGbKa074K4/.
04 = $2a$07$e895$$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
05 = $2a$07$e8955$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
06 = $2a$07$e8955b$$$$$$$$$$$$$$$.2UumGVfyc4SgAZBs5P6IKlUYma7sxqa
07 = $2a$07$e8955be$$$$$$$$$$$$$$.gb6deOAckxHP/WIZOGPZ6/P3oUSQkPm
08 = $2a$07$e8955be6$$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
09 = $2a$07$e8955be61$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
10 = $2a$07$e8955be616$$$$$$$$$$$.hWHhdkS9Z3m7/PMKn1Ko7Qf2S7H4ttK
11 = $2a$07$e8955be6162$$$$$$$$$$.meHPOa25CYG2G8JrbC8dPQuWf9yw0Iy
12 = $2a$07$e8955be61624$$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
13 = $2a$07$e8955be616246$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
14 = $2a$07$e8955be6162468$$$$$$$.OTzcPMwrtXxx6YHKtaX0mypWvqJK5Ye
15 = $2a$07$e8955be6162468d$$$$$$.pDcOFp68WnHqU8tZJxuf2V0nqUqwc0W
16 = $2a$07$e8955be6162468de$$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
17 = $2a$07$e8955be6162468deb$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
18 = $2a$07$e8955be6162468deb0$$$.aNZIHogUlCn8H7W3naR50pzEsQgnakq
19 = $2a$07$e8955be6162468deb0d$$.ytfAwRL.czZr/K3hGPmbgJlheoZUyL2
20 = $2a$07$e8955be6162468deb0da$.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
21 = $2a$07$e8955be6162468deb0da3.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
22 = $2a$07$e8955be6162468deb0da3ucYVpET7X/5YddEeJxVqqUIxs3COrdym

The conclusion? Twofold. First, it's working as intended, and second, know your own salt or don't roll your own salt.

Dereleased
Base64 gotchas... My guess was correct.
Harley Green
Indeed. Might have been more apparent if it used a standard base64 alphabet, but oh well, life goes on.
Dereleased