Is there a common formula that could provide a unique 32-bit value for a 512-character file path, assuming one byte per character, and possibly limiting the characters used in the names?

I know that even if you used uppercase letters alone, the number of combinations would be vastly more than a 32-bit int can hold. But what about using an identity field with some library, so that a number, once assigned, is always the same for a given file path? That's one idea, but I am looking for a standard formula, or at least one that someone has had success implementing in the real world.

BTW, I am using C#, but any language would work as an example, or a link to a website.

Thanks

+3  A: 

Would a hash code of the file path be unique enough?

MSDN: Object.GetHashCode()

Jon Seigel
+1  A: 

Even if you used only 1 bit per character, the result would be 512 bits, which is more than 32, so in some cases two different paths must generate the same value. If your dataset is "all 512-character paths", all you can really do is look for a hash function with a low probability of collisions on the particular subset of paths you expect.
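
To put a rough number on "low probability", here is a minimal sketch of the birthday approximation for a 32-bit hash. It is in Python since the question accepts any language. It assumes the hash spreads paths uniformly over all 2^32 values, which real path sets and real hash functions only approximate; the function name `collision_probability` is made up for this illustration.

```python
import math

def collision_probability(n_paths, bits=32):
    """Approximate chance that at least two of n_paths share a hash value.

    Assumes an ideal hash that is uniform over 2**bits values.
    """
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-n_paths * (n_paths - 1) / (2 * space))

for n in (1_000, 77_163, 1_000_000):
    print(f"{n:>9} paths: {collision_probability(n):.4f}")
```

Under that assumption, roughly 77,000 distinct paths are already enough for about a 50% chance of at least one collision, so a bare 32-bit hash cannot be relied on as a unique key for a large file tree.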

AlexEzh
+2  A: 

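You can use String.GetHashCode(). It will, of course, not be unique, but two equal strings will have the same hash value within a running program. Note that .NET does not guarantee the value stays the same across framework versions or platforms, so it should not be persisted as a permanent ID.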

Check this link for an example of finding hash collisions when using GetHashCode().

Groo
The collision resolution is the winner! Get a reasonably unique id, and then make sure they don't collide. Nice answer.
Dr. Zim
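
The "hash first, then make sure the ids don't collide" idea can be sketched roughly as follows, in Python since the question accepts any language. This is an illustration under stated assumptions, not anyone's production code: `PathIdRegistry` and `get_id` are hypothetical names, CRC-32 from `zlib` stands in for whatever 32-bit hash you prefer, and collisions are resolved by stepping to the next free id.

```python
import zlib

class PathIdRegistry:
    """Hypothetical registry that hands out one 32-bit id per distinct path."""

    def __init__(self, hash_fn=None):
        # hash_fn maps a str to an int; defaults to CRC-32 of the UTF-8 bytes.
        self._hash_fn = hash_fn or (lambda p: zlib.crc32(p.encode("utf-8")))
        self._id_by_path = {}
        self._path_by_id = {}

    def get_id(self, path):
        # A path that was seen before keeps the id it was first given.
        if path in self._id_by_path:
            return self._id_by_path[path]
        if len(self._path_by_id) >= 2 ** 32:
            raise OverflowError("all 32-bit ids are in use")
        candidate = self._hash_fn(path) & 0xFFFFFFFF
        # Collision resolution: probe forward until a free id is found.
        while candidate in self._path_by_id:
            candidate = (candidate + 1) & 0xFFFFFFFF
        self._id_by_path[path] = candidate
        self._path_by_id[candidate] = path
        return candidate

registry = PathIdRegistry()
first = registry.get_id(r"C:\data\a.txt")   # hypothetical example paths
second = registry.get_id(r"C:\data\b.txt")
print(first, second, registry.get_id(r"C:\data\a.txt") == first)
```

The ids are only stable as long as the registry itself is persisted, because an id assigned after a collision depends on which path arrived first. Paths that differ only in case or separator style are treated as different here; normalize them first if that matters.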
A: 

If you can represent a 512-byte string with a 32-bit number, all I can say is: what a nice compression method!

Rodrigo
A: 

I know you said int, but if you can take a string, you can use MD5 and get a value per path that is unique for practical purposes (it is a 128-bit hash, so collisions are possible in principle, just very unlikely by accident). Beyond that, the only thing I can think of is to assign an arbitrary number to each path by incrementing a counter. That won't get you a real hash though, just a path ID...
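
Both options can be sketched in a few lines. The example is in Python since the question accepts any language; `md5_key` and `SequentialPathIds` are hypothetical names made up for this illustration, and UTF-8 is an assumed encoding for the path.

```python
import hashlib
from itertools import count

def md5_key(path):
    """Return the MD5 digest of the path as a 32-character hex string."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()

class SequentialPathIds:
    """Hypothetical id table: each new path gets the next integer."""

    def __init__(self):
        self._next_id = count(1)  # ids start at 1
        self._ids = {}

    def get_id(self, path):
        # Assign an id only the first time a path is seen.
        if path not in self._ids:
            self._ids[path] = next(self._next_id)
        return self._ids[path]

ids = SequentialPathIds()
path = r"C:\data\reports\summary.txt"  # hypothetical example path
print(md5_key(path), ids.get_id(path))
```

The MD5 key can be recomputed anywhere from the path alone but does not fit in 32 bits. The sequential id fits easily but only has meaning together with the stored table, which is essentially what a database identity column provides.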

Eli
A: 

So, in other words, you're looking for a .NET CRC32 implementation that returns its result as a UInt32 rather than an 8-character string?

Unfortunately, all the ones I've seen return a byte array, including this one.
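
Going from a 4-byte checksum to an unsigned 32-bit integer is a small step. Here is a sketch in Python (the question accepts any language), where `zlib.crc32` already returns an integer; the helper names and the big-endian default are assumptions for this example, since the byte order depends on the implementation that produced the array.

```python
import zlib

def crc32_of_path(path):
    """CRC-32 of the UTF-8 encoded path as an unsigned 32-bit integer."""
    return zlib.crc32(path.encode("utf-8")) & 0xFFFFFFFF

def uint32_from_bytes(digest, byteorder="big"):
    """Convert a checksum delivered as exactly 4 bytes into an integer."""
    if len(digest) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(digest, byteorder)

path = r"C:\data\reports\summary.txt"  # hypothetical example path
value = crc32_of_path(path)
as_bytes = value.to_bytes(4, "big")    # what a byte-array API might return
print(f"{value} (0x{value:08X})", uint32_from_bytes(as_bytes) == value)
```

In .NET, the equivalent conversion for a byte-array result would be `BitConverter.ToUInt32(bytes, 0)`, with the caveat that you may need to reverse the array first if the CRC implementation and the machine disagree on byte order.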

R. Bemrose