tags:

views:

177

answers:

4

I'd like to know which algorithm is employed. I strongly assume it's something simple and hopefully common. There's no lag in generating the results, for instance.

Input: any string
Output: 5 hex characters (0-F)

I have access to as many keys and results as I wish, but I don't know how exactly I could harness this to attack the function. Is there any method? If I knew any functions that converted to 5-chars to start with then I might be able to brute force for a salt or something.

I know for example that:
a=06a07
b=bfbb5
c=63447
(in case you have something in mind)

In normal use it converts random 32-char strings into 5-char strings.

+2  A: 

This sounds mildly illicit.

Not to rain on your parade or anything, but if the implementors have done their work right, you wouldn't notice lags beyond a few tens of milliseconds on modern CPU's even with strong cryptographic hashes, and knowing the algorithm won't help you if they have used salt correctly. If you don't have access to the code or binaries, your only hope is a trivial mistake, whether caused by technical limitations or carelesseness.

There is an uncountable infinity of potential (hash) functions for any given set of inputs and outputs, and if you have no clue better than an upper bound on their computational complexity (from the lag you detect), you have a very long search ahead of you...

Pontus Gagge
I doubt that it is illicit - 20 bits is not a big enough output for this to be part of any sensible security system.
Steve Jessop
Good point about an infinity of potential functions. The truth here is that I know the implementers didn't do it right anywhere else, so I don't think their hash function will be too complex or obscure. I already know there's no salt for instance.
graw
"Sensible" and "security systems" has a nonempty but rather small intersection, I'm afraid! Especially homegrown.
Pontus Gagge
Breaking non-sensible security systems isn't illicit either, it's "research" ;-). Thinking about it, there are uses for hash functions where it doesn't matter that collisions are easily generated, and so a small hash could be OK. Challenge-response protocols, for instance, where the "secret" is some value which is hashed together with the challenge to give the response. But in such systems, it doesn't matter if the hash algorithm itself is known, provided that the secret can't be deduced.
Steve Jessop
Re 'research': Good luck with the DMCA if you're within US jurisdiction... Concur otherwise.
Pontus Gagge
All the major DRM systems have been analysed already from the algorithm POV, so no need for this. Just look it up :-) But you're right, I'm being disingenuous about "research", the law might take a different view.
Steve Jessop
I think "uncountable infinity" is going a little overboard, as there's only a countable infinity of *programs*, or equivalently *computable* functions. And of course only finitely many could fit in memory. :)
Jason Orendorff
For any two finite sets of inputs and outputs there is a finite set of possible hash functions, exactly |input|*|output| functions.
Ants Aasma
@Ants: In this case the input set is not finite (unless you constrain it with a resource limit on the length of the string, but that's usually a difference between an algorithm and its implementation).
Steve Jessop
graw
I don't know much about cryptography, and I know there are lots of brute forcing attacks made given a function and a hash to try and calculate the key. I thought maybe there would be some method of brute forcing an algorithm based on a set of input and output. Imagine, say, you have a set of primitive operations which are applied in billions of combinations of limited complexity to the input. I have no idea if something like this is at all feasible for a weak hash function, but from the comments here I get the feel maybe it's not. I'd be curious if any research like this, hard to google.
graw
+4  A: 

The only way to derive a hash function from data is through brute force, perhaps combined with some cleverness. There are an infinite number of hash functions, and the good ones perform what is essentially one-way encryption, so it's a question of trial and error.

It's practically irrelevant that your function converts 32-character strings into 5-character hashes; the output is probably truncated. For fun, here are some perfectly legitimate examples, the last 3 of which are cryptographically terrible:

  • Use the MD5 hashing algorithm, which generates a 16-character hash, and use the 10th through the 14th characters.
  • Use the SHA-1 algorithm and take the last 5 characters.
  • If the input string is alphabetic, use the simple substitution A=1, B=2, C=3, ... and take the first 5 digits.
  • Find each character on your keyboard, measure its distance from the left edge in millimeters, and use every other digit, in reverse order, starting with the last one.
  • Create a stackoverflow user whose name is the 32-bit string, divide 113 by the corresponding user ID number, and take the first 5 digits after the decimal. (But don't tell 'em I told you to do it!)
Adam Liss
Funny!OK I guess you are right. I already tried MD5, SHA-1, CRC32 and didn't see any relations. Out of curiosity do you know any public tools that calculate a whole bunch of different common hashes for something like this?
graw
Sure: For starters you can look here: http://en.wikipedia.org/wiki/Cryptographic_hash_function#Cryptographic_hash_algorithms but there's a good chance yours may be a home-grown algorithms, in which case it's probably a simple combination of XOR and scrambling bits.
Adam Liss
Was looking for something more like this http://www.fileformat.info/tool/hash.htm?text=test :P
graw
@graw: nice! Thanks for posting the link.
Adam Liss
+4  A: 

Depending on what you need this for, if you have access to as many keys and results as you wish, you might want to try a rainbow table approach. 5 hex chars is only 1mln combinations. You should be able to brute-force generate a map of strings that match all of the resulting hashes in no time. Then you don't need to know the original string, just an equivalent string that generates the same hash, or brute-force entry by iterating over the 1mln input strings.

SF.
Interesting solution... yes it would be possible. Unfortunately since it's for a web app it wouldn't be practical in this case (I hoped to implement something in js :P).
graw
+3  A: 

Following on from a comment I just made to Pontus Gagge, suppose the hash algorithm is as follows:

  • Append some long, constant string to the input
  • Compute the SHA-256 hash of the result
  • Output the last 5 chars of the hash.

Then I'm pretty sure there's no computationally feasible way from your chosen-plaintext attack to figure out what the hashing function is. To even prove that SHA-256 is in use (assuming it's a good hash function, which as far as we currently know it is), I think you'd need to know the long string, which is only stored inside the "black box".

That said, if I knew any published 20-bit hash functions, then I'd be checking those first. But I don't know any: all the usual non-crypto string hashing functions are 32 bit, because that's the expected size of an integer type. You should perhaps compare your results to those of CRC, PJW, and BUZ hash on the same strings, as well as some variants of DJB hash with different primes, and any string hash functions built in to well-known programming languages, like java.lang.String.hashCode. It could be that the 5 output chars are selected from the 8 hex chars generated by one of those.

Beyond that (and any other well-known string hashes you can find), I'm out of ideas. To cryptanalyse a black box hash, you start by looking for correlations between the bits of the input and the bits of the output. This gives you clues what functions might be involved in the hash. But that's a huge subject and not one I'm familiar with.

Steve Jessop