ansaurus

Question

firefox cache hash key generation algorithm bug

Answer 1

+5 A:

From what I understand from just reading the Bugzilla entry, the bug manifests when two distinct problems occur:

1. Their hash algorithm generates collisions for URLs that are "similar enough". From the bug, "similar enough" seems to mean that the URLs are the same every 4 characters (or perhaps 8), and
2. Their logic for dealing with hash collisions fails because they haven't yet flushed the previous URL with the same hash value to disk.

So basically, if you have a page with two very similar URLs, this might happen on some versions of Firefox. I would expect it generally won't happen across different pages, since Firefox then has time to flush the entries to disk, avoiding the timing issue.

So if you have multiple resources (scripts, images, etc.) that are all loaded from the same page, make sure they have a run of 9 characters that are completely different. One way you might ensure this is by appending a query string (that you ignore) with a random bit of data, something like the following (stick to letters and digits: a '#' would start the URL fragment, which is not part of the query string):

http://foo.com/resource.js?r=dn4JdsK9

jeffamaphone 2009-03-20 01:58:22
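A minimal sketch of the query-string suggestion in C++. The helper name `add_random_token` is hypothetical, and it assumes the URL has no `#fragment` (a token appended after a fragment would not end up in the query string). Only letters and digits are used so the token cannot change how the URL is parsed.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: append a random "r=" query parameter to a URL.
// Assumes the URL has no "#fragment"; otherwise the token would land in
// the fragment instead of the query string.
std::string add_random_token(const std::string& url, std::size_t length = 8) {
    // Letters and digits only: '#', '&' or '=' would change how the URL parses.
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    static std::mt19937 rng{std::random_device{}()};
    // sizeof(alphabet) counts the trailing '\0', so the last valid index is sizeof - 2.
    std::uniform_int_distribution<std::size_t> pick(0, sizeof(alphabet) - 2);

    // Start a new query string, or extend an existing one.
    std::string out = url;
    out += (url.find('?') == std::string::npos) ? "?r=" : "&r=";
    for (std::size_t i = 0; i < length; ++i) {
        out += alphabet[pick(rng)];
    }
    return out;
}
```

Because the token is different on every call, it has to be generated once and reused wherever the resource is referenced, which is the drawback raised in the comments below.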

Yeah, I read bytes where it should have been bits and mentally converted that to characters. Others below have good explanations of the hashing algorithm.

jeffamaphone 2009-03-20 03:51:58

The query string suggestion is good, but I would like to ensure unique URLs for my files as a pre-process.

jedierikb 2009-03-31 12:46:28

Also, adding a random querystring at runtime requires caching that random querystring somewhere versus developing a pattern which doesn't collide.

jedierikb 2009-04-01 10:49:54
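A deterministic alternative, sketched under the assumption that Firefox's key is the rotate-by-4 / XOR hash quoted in the other answers (including the final 0 to 0xFFFFFFFF adjustment): walk the site's URL list in a fixed order and append a hypothetical `v=N` parameter only to URLs whose hash is already taken. `ff_cache_hash` and `assign_unique_urls` are illustrative names, not Firefox APIs.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Rotate-left-by-4 / XOR hash as quoted in this thread, with the final
// 0 -> 0xFFFFFFFF adjustment; uint32_t keeps it 32 bits on any platform.
std::uint32_t ff_cache_hash(const std::string& key) {
    std::uint32_t h = 0;
    for (unsigned char c : key) {
        h = ((h << 4) | (h >> 28)) ^ c;
    }
    return h == 0 ? 0xFFFFFFFFu : h;
}

// Hypothetical build-time step (assumes the input URLs are distinct):
// keep each URL unchanged unless an earlier URL already uses its hash,
// in which case append "v=1", "v=2", ... until the hash is free.
std::vector<std::string> assign_unique_urls(const std::vector<std::string>& urls) {
    std::unordered_set<std::uint32_t> used;
    std::vector<std::string> out;
    for (const std::string& url : urls) {
        const std::string sep = (url.find('?') == std::string::npos) ? "?v=" : "&v=";
        std::string candidate = url;
        for (int n = 1; used.count(ff_cache_hash(candidate)) != 0; ++n) {
            candidate = url + sep + std::to_string(n);
        }
        used.insert(ff_cache_hash(candidate));
        out.push_back(candidate);
    }
    return out;
}
```

Given the same input list it always produces the same output, so nothing has to be cached; the catch is that inserting a URL earlier in the list can change the suffixes assigned to later ones.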

Answer 2

+1 A:

First, you cannot hash all strings uniquely to integers (obviously, there are more strings than (fixed-size) integers, so there have to be collisions). You can have a hashtable that can hold all sets of data (e.g. all your files), but to get it, you need to change the code of the hashtable, not the hashing function.

Second, I see a problem with the hashing function you posted, in this part:

PR_ROTATE_LEFT32(h, 4)

If it really does rotate h (I haven't checked on this), rotating by 4 means that strings with two 8-byte parts swapped (assuming a 32-bit hash), e.g. xxxxxxxxyyyyyyyy vs. yyyyyyyyxxxxxxxx, will have equal hashes. If you change it to something relatively prime to the hash size (e.g. 5), this will only happen for swapped parts of length 32.

jpalecek 2009-03-20 02:02:03
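The effect of the rotation amount can be checked directly. This sketch parameterizes the rotation in a hypothetical `rot_xor_hash` (leaving out Firefox's final zero adjustment) and compares strings whose halves are swapped.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Rotate-then-XOR string hash with a configurable rotation amount, as a
// hypothetical experiment. Firefox's final 0 -> 0xFFFFFFFF adjustment is
// omitted; it only matters when the hash comes out as 0.
std::uint32_t rot_xor_hash(const std::string& s, unsigned rot) {
    assert(rot > 0 && rot < 32);  // a shift by 32 would be undefined
    std::uint32_t h = 0;
    for (unsigned char c : s) {
        h = ((h << rot) | (h >> (32 - rot))) ^ c;
    }
    return h;
}
```

With a rotation of 4, each byte's contribution returns to the same bit position every 8 bytes, so `rot_xor_hash("xxxxxxxxyyyyyyyy", 4)` equals `rot_xor_hash("yyyyyyyyxxxxxxxx", 4)`. With a rotation of 5 the cycle is 32 bytes, so those two strings hash differently, and the swap only goes unnoticed when the swapped blocks are 32 characters long.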

I think the question he is asking is 'how can I work around this poor hash function', not 'how can I build a better hash function'.

FryGuy 2009-03-20 02:06:37

Answer 3

+3 A:

Here is how the algorithm works:

initialize hash to 0
for each byte
    shift hash 4 bits to left (with rotate)
    hash = hash XOR character
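The pseudocode above translates directly into C++. This sketch uses `uint32_t` so the rotation stays 32 bits wide on any platform, and it leaves out the final 0 to 0xFFFFFFFF adjustment that the full programs in the later answers apply; the name `rotate_xor_hash32` is illustrative.

```cpp
#include <cstdint>
#include <string>

// Direct translation of the pseudocode above, using a fixed 32-bit hash.
std::uint32_t rotate_xor_hash32(const std::string& s) {
    std::uint32_t hash = 0;                  // initialize hash to 0
    for (unsigned char byte : s) {           // for each byte
        hash = (hash << 4) | (hash >> 28);   // rotate 4 bits to the left
        hash ^= byte;                        // hash = hash XOR character
    }
    return hash;
}
```

For example, `rotate_xor_hash32("012345")` returns `0x3321075`: after six characters nothing has wrapped around yet, so each character simply sits 4 bits to the left of the next one.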

Visually (a 20-bit version, so the wrap-around repeats every 5 characters):

00110000             = '0'
    00110001         = '1'
        00110010     = '2'
            00110011 = '3'
0100            0011 = '4'
00110101             = '5'
====================
01000110001000010000  (and then this will be 'rotated'
                       so that it lines up with the end)
giving:
        00100001000001000110
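To check the diagram, the same hash can be computed with a configurable width. This is a sketch assuming 8 <= width <= 32; the hypothetical `rotate_xor_hash` keeps only the low `width` bits, and `to_binary` prints them most significant bit first.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same rotate-by-4 / XOR hash, but kept to `width` bits so the small
// diagram above can be checked. Assumes 8 <= width <= 32.
std::uint32_t rotate_xor_hash(const std::string& s, unsigned width) {
    assert(width >= 8 && width <= 32);
    const std::uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    std::uint32_t h = 0;
    for (unsigned char c : s) {
        h = ((h << 4) | (h >> (width - 4))) & mask;  // rotate left by 4 within `width` bits
        h ^= c;
    }
    return h;
}

// Render the low `width` bits of v, most significant bit first.
std::string to_binary(std::uint32_t v, unsigned width) {
    std::string out;
    for (unsigned i = width; i-- > 0;) {
        out += ((v >> i) & 1u) ? '1' : '0';
    }
    return out;
}
```

`to_binary(rotate_xor_hash("012345", 20), 20)` gives `00100001000001000110`, matching the last line of the diagram. With `width = 32` the same input gives `0x3321075`, because six characters are not enough to wrap around a 32-bit value.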

What this means is that for two strings of the same length that are mostly the same to hash differently, at least one 4-bit column of the hash (where the lower 4 bits of one character are XORed with the upper 4 bits of the next) must differ between them. However, the way the 32-bit number is turned into a table index might be even weaker, meaning that it requires the lower-4 XOR upper-4 bits at a particular position in the string (mod 8 characters) to be unique.

FryGuy 2009-03-20 02:20:57
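The nibble-column argument also shows how to construct collisions on purpose: flip a bit in the low nibble of one character and the same bit in the high nibble of the next character, and the two flips cancel. A minimal sketch, with hypothetical helper names `ff_hash` (the 32-bit rotate-by-4 / XOR hash, without the zero adjustment) and `nibble_twin`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 32-bit rotate-by-4 / XOR hash (without the final zero adjustment).
std::uint32_t ff_hash(const std::string& s) {
    std::uint32_t h = 0;
    for (unsigned char c : s) {
        h = ((h << 4) | (h >> 28)) ^ c;
    }
    return h;
}

// Hypothetical helper: flip bit `bit` (0-3) of s[i] and bit `bit + 4` of
// s[i + 1]. s[i] is rotated 4 bits further than s[i + 1], so both flips
// land on the same hash bit and cancel. Callers must still check that the
// new characters are valid in a URL.
std::string nibble_twin(std::string s, std::size_t i, unsigned bit = 0) {
    assert(bit < 4 && i + 1 < s.size());
    s[i] = static_cast<char>(static_cast<unsigned char>(s[i]) ^ (1u << bit));
    s[i + 1] = static_cast<char>(static_cast<unsigned char>(s[i + 1]) ^ (0x10u << bit));
    return s;
}
```

This is exactly the pattern in the two worldofsolitaire.com URLs in a later answer: `'2' ^ '3'` is `0x01` and `'c' ^ 's'` is `0x10`, so `nibble_twin("12c", 1)` is `"13s"`, and the two URLs hash identically as long as the rest of the URL is the same.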

Answer 4

A:

You're apparently mistaken about the real bug. Sure, there are hash collisions due to the incredibly bad choice of hash algorithm. But even hash(x)=1 wouldn't cause the problems described. It would merely turn an O(1) lookup into an O(N) linked-list search through a single bucket.

The real problem is that Firefox fails to deal with hash collisions. It therefore requires a perfect hash of all URLs. "All URLs" unfortunately is a set outside of your control.

MSalters 2009-04-01 10:16:57
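To illustrate the point about collision handling, here is a minimal sketch of an index that tolerates collisions (a hypothetical `CacheIndex` class, not Firefox's cache code). It buckets entries by the same weak hash but compares the full URL on every lookup, so colliding URLs coexist instead of replacing each other. It needs C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache index (not Firefox's code). Entries are bucketed by
// the same weak hash, but every lookup compares the full URL.
class CacheIndex {
    using Entry = std::pair<std::string, std::string>;  // (url, data)

public:
    void put(const std::string& url, const std::string& data) {
        std::vector<Entry>& bucket = buckets_[url_hash(url)];
        for (Entry& entry : bucket) {
            if (entry.first == url) {  // same URL: overwrite
                entry.second = data;
                return;
            }
        }
        bucket.emplace_back(url, data);  // new URL, possibly a hash collision
    }

    std::optional<std::string> get(const std::string& url) const {
        auto it = buckets_.find(url_hash(url));
        if (it == buckets_.end()) return std::nullopt;
        for (const Entry& entry : it->second) {
            if (entry.first == url) return entry.second;  // full-key match
        }
        return std::nullopt;  // same hash, different URL
    }

private:
    // 32-bit rotate-by-4 / XOR hash: collisions now cost only a short scan.
    static std::uint32_t url_hash(const std::string& s) {
        std::uint32_t h = 0;
        for (unsigned char c : s) h = ((h << 4) | (h >> 28)) ^ c;
        return h;
    }

    std::unordered_map<std::uint32_t, std::vector<Entry>> buckets_;
};
```

With this design, the two colliding worldofsolitaire.com URLs from a later answer can be stored and read back independently; the bad hash only makes the bucket scan longer.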

I can at least ensure that my site's subset of "all URLs" doesn't collide, using a pre-processing utility for my site.

jedierikb 2009-04-01 10:48:11
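Such a pre-processing check could look like the following sketch, assuming the rotate-by-4 / XOR hash from the other answers; `ff_cache_hash` and `find_collisions` are hypothetical names. It groups a list of resource URLs by hash and reports every group with more than one distinct URL.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Rotate-left-by-4 / XOR hash with the final 0 -> 0xFFFFFFFF adjustment.
std::uint32_t ff_cache_hash(const std::string& key) {
    std::uint32_t h = 0;
    for (unsigned char c : key) {
        h = ((h << 4) | (h >> 28)) ^ c;
    }
    return h == 0 ? 0xFFFFFFFFu : h;
}

// Hypothetical pre-processing check: group URLs by cache hash and return
// each group that holds more than one distinct URL.
std::vector<std::vector<std::string>> find_collisions(const std::vector<std::string>& urls) {
    std::map<std::uint32_t, std::vector<std::string>> by_hash;
    for (const std::string& url : urls) {
        std::vector<std::string>& group = by_hash[ff_cache_hash(url)];
        if (std::find(group.begin(), group.end(), url) == group.end()) {
            group.push_back(url);  // ignore exact duplicates
        }
    }
    std::vector<std::vector<std::string>> collisions;
    for (const auto& entry : by_hash) {
        if (entry.second.size() > 1) collisions.push_back(entry.second);
    }
    return collisions;
}
```

Running it over every image and script URL a page loads, for example at build time, lists exactly the pairs that need a distinguishing query parameter.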

Answer 5

+2 A:

This bug was a major issue for my site: http://worldofsolitaire.com

I worked around it a long time ago by using a conditional rule in an .htaccess file that would disable ALL caching of images on the site for Firefox users. This was a horrible thing to need to do, but at the time I couldn't track down the bug within Firefox and having the site be slightly slower is better than showing duplicate/corrupted images.

When I read in the linked bug that it was fixed in the latest Firefox releases, I changed the conditional on April 19th 2009 (yesterday) to only disable caching for Firefox 2 users.

Within a few hours I had received more than 10 e-mails from Firefox 3 users (confirmed) saying they were seeing duplicate images. So this issue is STILL a problem in Firefox 3.

I decided to create a simple Linux test program that would allow me to check URLs to see if they generate the same cache hash keys.

To compile on any Linux system: g++ -o ffgenhash ffgenhash.cpp

Here is the code (save to file ffgenhash.cpp)

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define ULONG_MAX 0xFFFFFFFF
#define PR_ROTATE_LEFT32(a, bits) (((a) << (bits)) | ((a) >> (32 - (bits))))

// Note: assumes unsigned long is 32 bits wide (true on 32-bit Linux).
unsigned long ffgenhash(const char * key)
{
    unsigned long h=0;

    // Rotate the hash left by 4 bits, then XOR in the next byte.
    for(const unsigned char * s = (unsigned char *) key; *s != '\0'; ++s)
    {
        h = PR_ROTATE_LEFT32(h, 4) ^ *s;
    }

    return (h==0 ? ULONG_MAX : h);
}

int main(int argc, char ** argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s URL\n", argv[0]);
        return 1;
    }
    printf("%lu\n", ffgenhash(argv[1]));
    return 0;
}

Here are two real-life URLs that generate the same cache hash key:

./ffgenhash "http://worldofsolitaire.com/decks/paris/5/12c.png"
1087949033
./ffgenhash "http://worldofsolitaire.com/decks/paris/5/13s.png"
1087949033

Since I pre-load these images in a JavaScript loop, trying to use some sort of empty <script> tag workaround is not possible here.

Indeed, I think my only real solution is to modify the URLs for Firefox users in some way so that they generate unique cache hash keys. So that's the approach I'll use.

By the way, I'm half tempted to create a Firebug extension that checks all resources loaded by a site and gives a big error if two resources on the site share a common hash key, so the developer is aware. It would be great to run sites like Google Maps through this, as I've seen weird things with those images over the past few years :)

Sembiance 2009-04-20 13:50:30

Answer 6

+1 A:

This is a modified version of Sembiance's hash generator that works correctly even when compiled on a 64-bit platform:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define ULONG_MAX 0xFFFFFFFF
#define PR_ROTATE_LEFT32(a, bits) (((a) << (bits)) | ((a) >> (32 - (bits))))

unsigned int ffgenhash(const char * key) {
    unsigned int h=0;
    for(const unsigned char * s = (unsigned char *) key; *s != '\0'; ++s) {
        h = PR_ROTATE_LEFT32(h, 4) ^ *s;
    }
    return (h==0 ? ULONG_MAX : h);
}

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s URL\n", argv[0]);
        return 1;
    }
    printf("%u\n", ffgenhash(argv[1]));
    return 0;
}

Darwin 2010-03-10 00:35:02
