I want to send function names from a weak embedded system to the host computer for debugging purposes. Since the two are connected by RS232, which is short on bandwidth, I don't want to send the function names literally. Some function names are about 15 characters long, and I sometimes want to send them at a fairly high rate.

The solution I thought of was to find a hash function that maps each function name to a single byte, and to send only that byte. The host computer would scan all the functions in the source, compute their hashes with the same function, and then translate each received hash back to the original string.

The hash function must be:

  1. Collision-free for short strings.
  2. Simple (since I don't want too much code in my embedded system).
  3. Small enough to fit in a single byte.

Obviously, it does not need to be secure in any way, only collision-free, so I don't think a cryptographic hash function is worth the complexity.

Example code:

void myfunc(void) {
    sendToHost(hash("myfunc"));
}

The host would then be able to present me with a list of times when the myfunc function was executed.

Is there some known hash function which holds the above conditions?

Edit:

  1. I assume I will use far fewer than 256 function names.
  2. I can use more than a single byte; two bytes would have me well covered.
  3. I prefer a hash function to keeping the same function-to-byte map on the client and the server because (1) I have no map implementation on the client (the embedded system), and I'm not sure I want to add one just for debugging, and (2) it requires another tool in my build chain to inject the function-name table into my embedded system code. A hash is better in this regard, even if it means I'll get a collision once in a great while.
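Since the host already scans the source for every function name, it can also detect that occasional collision whenever it rebuilds its hash-to-name table. A minimal host-side sketch, assuming the firmware's hash is FNV-1a folded to 8 bits; the names `name_hash8` and `find_collision` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the firmware exactly: FNV-1a 32-bit, XOR-folded to 8 bits. */
static uint8_t name_hash8(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    h ^= h >> 16;   /* fold high half into low half */
    h ^= h >> 8;    /* fold remaining two bytes into one */
    return (uint8_t)h;
}

/* Returns 1 and reports the first colliding pair (*a < *b) if two names
   share a hash byte; returns 0 if the byte hash is collision-free. */
static int find_collision(const char *const names[], size_t n,
                          size_t *a, size_t *b)
{
    size_t owner[256];   /* owner[h] = (index + 1) of the name using h, 0 = free */
    assert(a != NULL && b != NULL);
    assert(names != NULL || n == 0);
    for (size_t h = 0; h < 256; h++)
        owner[h] = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t slot = name_hash8(names[i]);
        if (owner[slot] != 0) {
            *a = owner[slot] - 1;
            *b = i;
            return 1;
        }
        owner[slot] = i + 1;
    }
    return 0;
}
```

If a collision turns up, renaming one of the two functions (or switching to a two-byte hash) resolves it without touching the firmware's hash code.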
+2  A: 

Hmm, with only 256 possible values, and since you will parse your source code to find all the functions anyway, maybe the best approach would be to assign a number to each of your functions?

A real hash function probably won't work, because you have only 256 possible hashes but want to map at least 26^15 possible values (counting only 15-letter, letters-only, case-insensitive function names). Even if you restricted the set of possible strings (by imposing some mandatory naming format), you would be hard pressed to get both meaningful names and a valid hash function.
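The counting argument covers all possible names. A related calculation, the birthday problem, shows that even a modest number of real names is likely to collide in 8 bits. This sketch assumes an ideal hash that spreads names uniformly over the buckets; `collision_probability` is a hypothetical helper:

```c
#include <assert.h>

/* Probability that at least two of n distinct names share a hash value
   when an ideal hash spreads them uniformly over m buckets. */
static double collision_probability(unsigned n, unsigned m)
{
    double p_distinct = 1.0;
    assert(m > 0);
    for (unsigned k = 0; k < n; k++) {
        if (k >= m)
            return 1.0;    /* pigeonhole: more names than buckets */
        p_distinct *= (double)(m - k) / m;   /* k-th name misses the k used buckets */
    }
    return 1.0 - p_distinct;
}
```

With 20 names, `collision_probability(20, 256)` comes out at roughly 0.53, while `collision_probability(20, 65536)` is roughly 0.003, which is why two bytes help so much.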

Ksempac
+2  A: 

No, there isn't.

You can't make a collision-free hash code, or even come close, with just an eight-bit hash. As soon as you allow strings longer than one character, there are more possible strings than possible hash codes.

Why not just extract the function names and give each function name an id? Then you only need a lookup table on each side of the wire.
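One way to keep both sides of such a lookup table in sync without an extra build tool is the C "X-macro" idiom: a single list expands into both the numeric IDs and the name strings. This is a sketch under assumptions, not what the answer prescribes: the list is maintained by hand, the host decoder is also written in C (or parses the same header), and the function names in the list are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Single list of traced functions (hypothetical names). Each X(...) entry
   becomes both an enum ID and a string in the name table. */
#define TRACED_FUNCTIONS(X) \
    X(myfunc)               \
    X(uart_init)            \
    X(adc_read)

/* Firmware side: one small integer per function, sent as a single byte. */
enum trace_id {
#define AS_ENUM(name) TRACE_##name,
    TRACED_FUNCTIONS(AS_ENUM)
#undef AS_ENUM
    TRACE_COUNT
};

/* Host side: the same list expanded into strings, indexed by trace_id. */
static const char *const trace_names[] = {
#define AS_STRING(name) #name,
    TRACED_FUNCTIONS(AS_STRING)
#undef AS_STRING
};

static_assert(TRACE_COUNT <= 256, "IDs must fit in one byte");
static_assert(sizeof trace_names / sizeof trace_names[0] == TRACE_COUNT,
              "name table and enum must have the same length");

/* Translate a received byte back to a name; NULL if out of range. */
static const char *trace_name(unsigned id)
{
    return id < TRACE_COUNT ? trace_names[id] : NULL;
}
```

The firmware would then call something like `sendToHost((unsigned char)TRACE_myfunc)`. No map lookup happens on the embedded side; only the host indexes the string table.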

(As others have shown, you can generate a hash algorithm without collisions if you already have all the function names, but then it's easier to just assign a number to each name and build a lookup table...)

Guffa
Why the downvotes? If you don't say what it is that you don't like, it's really pointless.
Guffa
+7  A: 

Try minimal perfect hashing:

Minimal perfect hashing guarantees that n keys will map to 0..n-1 with no collisions at all.

C code is included.
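For a handful of names there is also a much simpler technique than the minimal perfect hashing recommended here: brute-force search for a seed that makes a seeded byte hash collision-free over the known names. This gives a perfect but not minimal mapping, because the codes land anywhere in 0..255 rather than in 0..n-1, and it is not the algorithm behind the linked C code. The seeding scheme (XOR into the FNV-1a offset basis) and the function names are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 32-bit with the seed mixed into the offset basis, folded to 8 bits. */
static uint8_t seeded_hash8(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    h ^= h >> 16;
    h ^= h >> 8;
    return (uint8_t)h;
}

/* Build-time step on the host: try seeds until every name gets its own byte.
   Returns 1 and stores the seed on success, 0 if none was found. */
static int find_perfect_seed(const char *const names[], size_t n,
                             uint32_t max_tries, uint32_t *seed_out)
{
    assert(seed_out != NULL);
    if (n > 256)
        return 0;                         /* pigeonhole: cannot be perfect */
    for (uint32_t seed = 0; seed < max_tries; seed++) {
        uint8_t used[256] = {0};
        size_t i;
        for (i = 0; i < n; i++) {
            uint8_t h = seeded_hash8(names[i], seed);
            if (used[h])
                break;                    /* collision: try the next seed */
            used[h] = 1;
        }
        if (i == n) {
            *seed_out = seed;
            return 1;
        }
    }
    return 0;
}
```

The firmware would then compute `seeded_hash8(__func__, SEED)` with the found seed compiled in as a constant, and the search has to be rerun whenever a function is added or renamed.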

Martin B
Also see gperf: http://www.gnu.org/software/gperf/
Hasturkun
That doesn't work without first getting all the function names.
Guffa
Yes, you can only do perfect hashing if you know all of the strings in advance. If that's not the case, one approach is to use a hash table to handle the collisions, then transmit the index of the entry in the hash table.
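A small embedded-side sketch of that hash-table approach, under stated assumptions: open addressing with linear probing over a fixed table of 64 slots, and a hypothetical protocol in which the full name is sent once the first time a slot is assigned, so the host can learn the mapping. After that, only the slot index needs to go over the wire. `intern_name` and `TABLE_SIZE` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64u      /* assumption: fewer than 64 traced functions */

static const char *slots[TABLE_SIZE];   /* stored name pointers, NULL = empty */

/* Returns the slot index for name, inserting it on first use, or -1 if the
   table is full. *is_new is set to 1 the first time a name is seen, so the
   caller can send the full name once and only the index afterwards. */
static int intern_name(const char *name, int *is_new)
{
    uint32_t h = 2166136261u;           /* FNV-1a over the name */
    assert(name != NULL && is_new != NULL);
    for (const char *s = name; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    for (unsigned probe = 0; probe < TABLE_SIZE; probe++) {
        unsigned i = (unsigned)((h + probe) % TABLE_SIZE);  /* linear probing */
        if (slots[i] == NULL) {
            slots[i] = name;
            *is_new = 1;
            return (int)i;
        }
        if (strcmp(slots[i], name) == 0) {
            *is_new = 0;
            return (int)i;
        }
    }
    return -1;
}
```

Collisions cost only an extra probe or two here, and the names never need to be known in advance.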
Martin B
You might also be able to coax gperf or a similar tool into computing the hash inline at compile time, reducing the runtime cost to zero.
Hasturkun
+1  A: 
DrJokepu
+2  A: 

If you have a way to track the functions within your code (e.g. a text file generated at build time), you can just use the memory address of each function. It's not exactly a byte, but it's smaller than the entire name and guaranteed to be unique. This also has the benefit of low overhead. All you would need to 'decode' an address is the text file that maps addresses to actual names; this could be sent to the remote location or, as I mentioned, stored on the local machine.
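On the host, decoding is then a lookup in an address-sorted table parsed from that text file. This sketch assumes the table has already been loaded (the addresses and names below are hypothetical) and returns the function whose start address is the largest one not above the received value. That also tolerates addresses that are slightly offset, such as an ARM Thumb address with its low bit set:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per function, as it might be parsed from a linker map or
   symbol listing. The table must be sorted by ascending address. */
struct symbol {
    uint32_t addr;
    const char *name;
};

/* Binary search: return the name of the symbol with the largest start
   address <= addr, or NULL if addr lies below the first symbol. */
static const char *lookup_symbol(const struct symbol *tab, size_t n,
                                 uint32_t addr)
{
    size_t lo = 0, hi = n;              /* search in [lo, hi) */
    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;               /* answer is at mid or later */
        else
            hi = mid;
    }
    return lo == 0 ? NULL : tab[lo - 1].name;
}
```

On the firmware side, converting a function pointer with `(uint32_t)(uintptr_t)myfunc` is implementation-defined in C but works on common embedded toolchains; the four bytes would then be sent by a routine of your own.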

ezpz
This is how I'd do it. You should be able to use the debug information in the compiled binary to extract the function name, without needing an additional table.
Brooks Moses