views:

254

answers:

9

Sorry if the title doesn't make sense. Basically I have a series of strings that are 10-60 characters long. The problem being is the service I have to use only accepts strings up to 25 so I need a way to convert the strings I have to 25 characters or less, send it off and when I get the results back be able to convert it back to the original id.

Example:

id =  "this_is_a_test_account_that_is_longer_than_allowed"
id = contract(id)
// id = "DSFK23478JDSFHGW874"
id = expand("DSFK23478JDSFHGW874")
// id = "this_is_a_test_account_that_is_longer_than_allowed"
+12  A: 

No, you can't do this. It's basically asking for a compression algorithm which will always make things smaller - it's just not going to happen. At least not in a general sense, due to the pigeonhole principle. (In particular, think about every hex string of the right length. You've got to store all of those, so suppose each one just goes to itself. Now, you've got to be able to store other strings too - but by definition you've run out of valid outputs.)

On the other hand, if you have a server which could generate a UUID for any string and store the string, you could then look that UUID up again later. Would that work for your situation? (Of course it doesn't have to be a UUID - you could just start with 0 and work your way up...)

If you know all the strings beforehand, that's just a special-case of this situation: create a hard-coded bi-directional map for all the strings, generating the output uniquely in some fashion (e.g. with UUIDs).

Jon Skeet
+1  A: 

If the characters of the string are restricted to only a few, it is possible to do some compression using the "forbidden characters" to compress it. But I believe it isn't as good as compressing 60 chars into 25 ones...

Samuel Carrijo
To represent a hex digit we need 4 bits. Assuming the input character set is [_a-z] - i.e 27 different characters - we need >4 bits [log(27) / log(2)] to represent the whole character set. Therefore even in the case of this restricted character set we can only hope to map a maximum of <22 characters [(4*25)/(log(27)/log(2))] into a 25 digit hex number.
teabot
A: 

Sorry, you are probably going to have to do something fancy or change the service. It could be as simple as storing off your arbitrarily large strings to a simple table where you use an identity field which is what you send to the service to pull the full string back.

Andy_Vulhop
A: 

Really, an add on to Jon's answer. In the general case (any 10-60 character string) this is not possible.

HOWEVER, if your orginal IDs have well known characteristics - i.e. you only use characters 0 thru 9 - then it would be possible. But we don't have enough information to help you.

donovan Jimenez
A: 

You can't do it in the general case - always make strings smaller would require impossible compression. However, I can see two options:

Firstly, just store the key in a map:

shared state: map
contract(id) {
  key = generateNewUniqueShortString();
  map.put(key, id);
  return key;
}
expand(key) {
  return map.remove(key);
}

This requires some shared storage but might work fine.

If you know something about your strings (like for example you only ever use A-Za-z0-9_ then you could use a lookup table to reduce the size. This would mean each character only needs 6 bits, whereas in Java you have 16 bits per character. Using some sort of Huffman encoding based on frequency would work even better, but wouldn't be guaranteed.

Nick Fortescue
A: 

It looks like your input character set is lower case letters plus underscore (27 characters). If you had only 16 characters in your source input, you could put two into a byte.

If you're contracting to a two-byte character format, you can easily do this. If you're going to a one-byte character format, I think you can't.

How about breaking your strings up into three smaller strings and using the service three times?

Nosredna
A: 

This would depend greatly on the content of those strings. For example, if you knew that the input strings were always composed of only the letters a-z (26), A-Z (26) and numbers 0-9 (10), then you could be assured that each character of the long string is one of 62 possible things, which could easily be stored with fewer bits (six, in this case). Assuming that the service you are using uses eight bits for a character, that gets you a 25% reduction in length. If the input strings use less characters, or the service accepts more than eight bits per character, you may be able to improve things enough to get by.

e.James
The title of the question suggests that only [0-9A-F] are valid in the destination character set.
teabot
The title does, but the example code shows other characters in the compressed string.
e.James
A: 

If the converted string is just for transient use, so let's say send a request and get a response back, then you can use some function to get a "transient unique" max 25 char string and store a mapping to your original id. After using the transient id, you could discard it. For each request you could create new ones, as needed. You just have to make sure you don't get duplicate mappings in the scope you are using those ids. (Similar to Nick Fortescues first example.)

MicSim
A: 

Compression will not get you from were you are to where you need to be. There are three approaches that I think may solve your problem...depending on the details of the account and the service.

1) Assign an alternate ID to the account that will fit into the 25 character limit. Treat the existing ID as a 'description' rather than the key for the service. This requires that you can generate some kind of hash and reliably store that outside the service, or that the service will also store a 'description' that is between 10 and 60 characters.

2) Break the ID into three pieces and store each at a separate 20 character ID using the service. Use the remaining 5 characters to assign some kind of unique signature to each part...allowing you to retrieve all three pieces and reassemble the ID. Depending on the service, this may be undesirable (e.g. it may create three full records for a single instance).

3) Alter the service or find a new service that will allow ID's up to 60 characters.

semiuseless