Yes, it is possible to encode an extra bit of information while keeping the previous encoding for 3-character values. But since your original encoding doesn't leave nice clean swaths of free numbers in the output set, the mapping for the additional Strings introduced by that extra character can't help but be a little discontinuous.
Accordingly, I think it would be hard to come up with mapping functions that handle these discontinuities without being both awkward and slow. I conclude that a table-based mapping is the only sane solution.
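Whether there is room at all comes down to counting. The following is a rough sketch, and the class and method names are just for illustration. There are 26³ plain 3-letter Strings plus the same number of prefixed ones, so you need 35152 codes. A 2-byte code offers 65536 values, which leaves plenty of slack for the gaps the old encoding leaves behind.

```java
/** Counting check: do 3-letter and prefixed 3-letter Strings fit into 2 bytes? */
public class CapacityCheck {
    static final int LETTERS = 26;

    /** Number of distinct 3-letter Strings over 'A'..'Z'. */
    static int plainCount() {
        return LETTERS * LETTERS * LETTERS; // 17576
    }

    /** Codes needed for the plain Strings plus the same Strings with one prefix char. */
    static int neededCodes() {
        return 2 * plainCount(); // 35152
    }

    /** Distinct values a 2-byte code can take. */
    static int availableCodes() {
        return 1 << 16; // 65536
    }

    public static void main(String[] args) {
        // prints "35152 of 65536 codes needed, 30384 spare"
        System.out.println(neededCodes() + " of " + availableCodes()
                + " codes needed, " + (availableCodes() - neededCodes()) + " spare");
    }
}
```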
I was too lazy to re-engineer your mapping code, so I incorporated it into my table initialization code; this also eliminates many opportunities for translation errors :) Your encode() method is what I call OldConverter.encode().
I've run a small test program to verify that NewConverter.encode() comes up with the same values as OldConverter.encode() and is also able to encode Strings with a leading 4th character. NewConverter.encode() doesn't care what that character is; it goes by String length. For decode(), the character used can be defined with PREFIX_CHAR. I've also eyeball-checked that the byte array values for prefixed Strings don't duplicate any of those for non-prefixed Strings. Finally, I checked that encoded prefixed Strings can indeed be converted back to the same prefixed Strings.
package tequilaguy;

public class NewConverter {
    /** Character put in front of the 4-character ("prefixed") Strings returned by decode(). */
    private static final char PREFIX_CHAR = '@';

    // b2s: 2-byte code (0x0000..0xFFFF) -> String, or null if no String uses that code.
    private static final String[] b2s = new String[0x10000];
    // s2b: String index (0..17575 plain, 17576..35151 prefixed) -> 2-byte code.
    private static final int[] s2b = new int[0x10000];

    static {
        createb2s();
        creates2b();
    }

    /**
     * Create the "byte to string" conversion table.
     */
    private static void createb2s() {
        // Fill 17576 elements of the array with b -> s equivalents.
        // index is the combined byte value of the old encode fn;
        // value is the String (3 chars).
        for (char a = 'A'; a <= 'Z'; a++) {
            for (char b = 'A'; b <= 'Z'; b++) {
                for (char c = 'A'; c <= 'Z'; c++) {
                    String str = new String(new char[] { a, b, c });
                    byte[] enc = OldConverter.encode(str);
                    int index = ((enc[0] & 0xFF) << 8) | (enc[1] & 0xFF);
                    b2s[index] = str;
                    // int value = 676 * a + 26 * b + c - ((676 + 26 + 1) * 'A'); // 45695;
                    // System.out.format("%s : %02X%02X = %04x / %04x %n", str, enc[0], enc[1], index, value);
                }
            }
        }
        // Fill another 17576 elements of the array with b -> PREFIX_CHAR + s equivalents.
        // index is the next free (= still null) array index;
        // value is the String (PREFIX_CHAR + 3 chars).
        int freep = 0;
        for (char a = 'A'; a <= 'Z'; a++) {
            for (char b = 'A'; b <= 'Z'; b++) {
                for (char c = 'A'; c <= 'Z'; c++) {
                    String str = PREFIX_CHAR + new String(new char[] { a, b, c });
                    while (b2s[freep] != null) freep++;
                    b2s[freep] = str;
                    // int value = 676 * a + 26 * b + c - ((676 + 26 + 1) * 'A') + (26 * 26 * 26);
                    // System.out.format("%s : %02X%02X = %04x / %04x %n", str, 0, 0, freep, value);
                }
            }
        }
    }

    /**
     * Create the "string to byte" conversion table.
     * Done by inverting the "byte to string" table.
     */
    private static void creates2b() {
        for (int b = 0; b < 0x10000; b++) {
            String s = b2s[b];
            if (s != null) {
                int sval;
                if (s.length() == 3) {
                    sval = 676 * s.charAt(0) + 26 * s.charAt(1) + s.charAt(2) - ((676 + 26 + 1) * 'A');
                } else {
                    sval = 676 * s.charAt(1) + 26 * s.charAt(2) + s.charAt(3) - ((676 + 26 + 1) * 'A') + (26 * 26 * 26);
                }
                s2b[sval] = b;
            }
        }
    }

    /**
     * Encode a 3-character String (same result as the old encoder) or a
     * 4-character String whose first character is any prefix.
     * The three significant characters must be in 'A'..'Z'.
     */
    public static byte[] encode(String str) {
        int sval;
        if (str.length() == 3) {
            sval = 676 * str.charAt(0) + 26 * str.charAt(1) + str.charAt(2) - ((676 + 26 + 1) * 'A');
        } else if (str.length() == 4) {
            sval = 676 * str.charAt(1) + 26 * str.charAt(2) + str.charAt(3) - ((676 + 26 + 1) * 'A') + (26 * 26 * 26);
        } else {
            throw new IllegalArgumentException("expected 3 or 4 characters: " + str);
        }
        int bval = s2b[sval];
        return new byte[] { (byte) (bval >> 8), (byte) (bval & 0xFF) };
    }

    /** Decode a 2-byte code; returns null for codes that no String maps to. */
    public static String decode(byte[] b) {
        int bval = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
        return b2s[bval];
    }
}
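NewConverter depends on OldConverter from the question, which isn't reproduced here. If you want to try the slot-filling idea on its own, here is a self-contained sketch. legacyCode() is a hypothetical stand-in for OldConverter.encode(): it simply puts the 17576 plain Strings on every third 16-bit value, leaving gaps. The codes are plain ints rather than 2-byte arrays to keep it short.

```java
/**
 * Self-contained sketch of the "fill the free slots" table technique.
 * ASSUMPTION: legacyCode() is a hypothetical stand-in for the real
 * OldConverter.encode(); it spreads the plain Strings over every third code.
 */
public class SlotFillSketch {
    static final char PREFIX_CHAR = '@';
    static final int PLAIN = 26 * 26 * 26;            // 17576 plain Strings

    private static final String[] b2s = new String[0x10000];
    private static final int[] s2b = new int[2 * PLAIN];

    static {
        // 1. Place every plain String at its legacy code, so old values stay unchanged.
        for (int r = 0; r < PLAIN; r++) {
            b2s[legacyCode(r)] = unrank(r);
        }
        // 2. Hand out the remaining (null) slots to prefixed Strings, in order.
        int free = 0;
        for (int r = 0; r < PLAIN; r++) {
            while (b2s[free] != null) free++;
            b2s[free] = PREFIX_CHAR + unrank(r);
        }
        // 3. Invert: plain Strings use index 0..17575, prefixed ones 17576..35151.
        for (int b = 0; b < b2s.length; b++) {
            String s = b2s[b];
            if (s != null) {
                s2b[s.length() == 3 ? rank(s, 0) : PLAIN + rank(s, 1)] = b;
            }
        }
    }

    /** Base-26 rank of the 3 letters starting at 'from': "AAA" -> 0 ... "ZZZ" -> 17575. */
    static int rank(String s, int from) {
        return 676 * (s.charAt(from) - 'A') + 26 * (s.charAt(from + 1) - 'A') + (s.charAt(from + 2) - 'A');
    }

    /** Inverse of rank() for 0..17575. */
    static String unrank(int r) {
        return new String(new char[] {
            (char) ('A' + r / 676), (char) ('A' + r / 26 % 26), (char) ('A' + r % 26) });
    }

    /** HYPOTHETICAL legacy encoding: injective, but leaves two gaps after every code. */
    static int legacyCode(int r) {
        return r * 3; // at most 52725, still below 0x10000
    }

    static int encode(String s) {
        return s2b[s.length() == 3 ? rank(s, 0) : PLAIN + rank(s, 1)];
    }

    static String decode(int code) {
        return b2s[code];
    }

    public static void main(String[] args) {
        System.out.println(encode("ABC") + " " + encode("@ABC"));
        System.out.println(decode(encode("@ZZZ")));
    }
}
```

With this stand-in, "ABC" keeps its legacy code 84 and "@ABC" lands in the first gap that is still free when its turn comes (43). The real NewConverter does the same thing, just with OldConverter's actual gaps.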
I've left a few intricate constant expressions in NewConverter, especially the powers-of-26 stuff. Replacing them with their computed values would make the code look horribly mysterious. You can leave them as they are without losing performance, as the compiler folds them up like Kleenexes.
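To make those expressions a little less mysterious: 676·c0 + 26·c1 + c2 − (676 + 26 + 1)·'A' is just the base-26 rank of the three letters. Subtracting 703·'A' in one go is the same as subtracting 'A' from each letter first. Here is a small sketch comparing the two forms. The class and method names are made up for illustration. OFFSET is a compile-time constant expression, which is why javac can fold it to 45695.

```java
/** Compares the folded index formula with the readable base-26 form. */
public class RankFormula {
    // (676 + 26 + 1) * 'A' == 703 * 65 == 45695; a constant expression.
    static final int OFFSET = (676 + 26 + 1) * 'A';

    /** Form used in NewConverter: raw char codes minus one folded constant. */
    static int foldedRank(String s) {
        return 676 * s.charAt(0) + 26 * s.charAt(1) + s.charAt(2) - OFFSET;
    }

    /** Readable form: map each letter to 0..25 first, then read the digits in base 26. */
    static int base26Rank(String s) {
        return 676 * (s.charAt(0) - 'A') + 26 * (s.charAt(1) - 'A') + (s.charAt(2) - 'A');
    }

    public static void main(String[] args) {
        System.out.println(OFFSET);             // 45695
        System.out.println(foldedRank("AAA"));  // 0
        System.out.println(foldedRank("ZZZ"));  // 17575
        System.out.println(base26Rank("BAA"));  // 676
    }
}
```

The prefixed branch simply adds 26 * 26 * 26 = 17576 on top, so prefixed Strings get indexes 17576..35151 and never collide with plain ones in s2b.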
Update:
As the horror of X-mas approaches, I'll be on the road for a while. I hope you'll find this answer and code in time to make good use of it. To support that effort, I'll throw in my little test program. It doesn't check anything directly. Instead, it prints the results of the conversions in all significant ways so you can check them by eye and by hand. I fiddled with my code (small tweaks once I got the basic idea down) until everything there looked OK. You may want to test more mechanically and exhaustively.
package tequilaguy;

public class ConverterHarness {
    // private static void runOldEncoder() {
    //     for (char a='A'; a<='Z'; a++) {
    //         for (char b='A'; b<='Z'; b++) {
    //             for (char c='A'; c<='Z'; c++) {
    //                 String str = new String(new char[] { a, b, c});
    //                 byte[] enc = OldConverter.encode(str);
    //                 System.out.format("%s : %02X%02X%n", str, enc[0], enc[1]);
    //             }
    //         }
    //     }
    // }

    private static void testNewConverter() {
        for (char a = 'A'; a <= 'Z'; a++) {
            for (char b = 'A'; b <= 'Z'; b++) {
                for (char c = 'A'; c <= 'Z'; c++) {
                    String str = new String(new char[] { a, b, c });
                    byte[] oldEnc = OldConverter.encode(str);
                    byte[] newEnc = NewConverter.encode(str);
                    byte[] newEnc2 = NewConverter.encode("@" + str);
                    // Columns: input, old code, new code, new code of "@" + input,
                    // decoded new code, decoded prefixed code.
                    System.out.format("%s : %02X%02X %02X%02X %02X%02X %s %s %n",
                            str, oldEnc[0], oldEnc[1], newEnc[0], newEnc[1], newEnc2[0], newEnc2[1],
                            NewConverter.decode(newEnc), NewConverter.decode(newEnc2));
                }
            }
        }
    }

    public static void main(String[] args) {
        testNewConverter();
    }
}
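For a more mechanical, exhaustive check, the eyeballing can be replaced with a verifier. It confirms backward compatibility, no duplicate codes, and round trips for every plain and prefixed String. This is only a sketch. You'd pass in OldConverter::encode, NewConverter::encode and NewConverter::decode. The demo*() methods in it are hypothetical stand-ins, there only so the file compiles and runs by itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Exhaustive, assertion-style check instead of eyeballing harness output. */
public class ExhaustiveCheck {

    /** Returns the problems found; an empty list means every check passed. */
    static List<String> verify(Function<String, byte[]> oldEncode,
                               Function<String, byte[]> newEncode,
                               Function<byte[], String> decode,
                               char prefix) {
        List<String> problems = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        for (char a = 'A'; a <= 'Z'; a++) {
            for (char b = 'A'; b <= 'Z'; b++) {
                for (char c = 'A'; c <= 'Z'; c++) {
                    String plain = new String(new char[] { a, b, c });
                    String prefixed = prefix + plain;
                    byte[] plainCode = newEncode.apply(plain);
                    byte[] prefixedCode = newEncode.apply(prefixed);
                    // 1. Backward compatibility with the old encoding.
                    if (!Arrays.equals(oldEncode.apply(plain), plainCode)) problems.add("changed: " + plain);
                    // 2. No two Strings may share a 2-byte code.
                    if (!seen.add(toInt(plainCode))) problems.add("duplicate: " + plain);
                    if (!seen.add(toInt(prefixedCode))) problems.add("duplicate: " + prefixed);
                    // 3. Decoding must give back the original String.
                    if (!plain.equals(decode.apply(plainCode))) problems.add("round trip: " + plain);
                    if (!prefixed.equals(decode.apply(prefixedCode))) problems.add("round trip: " + prefixed);
                }
            }
        }
        return problems;
    }

    static int toInt(byte[] code) {
        return ((code[0] & 0xFF) << 8) | (code[1] & 0xFF);
    }

    static byte[] toBytes(int v) {
        return new byte[] { (byte) (v >> 8), (byte) (v & 0xFF) };
    }

    // --- HYPOTHETICAL stand-ins so the sketch runs on its own; not the real converters ---
    static int rank(String s, int from) {
        return 676 * (s.charAt(from) - 'A') + 26 * (s.charAt(from + 1) - 'A') + (s.charAt(from + 2) - 'A');
    }

    static byte[] demoOldEncode(String s) {
        return toBytes(rank(s, 0));
    }

    static byte[] demoNewEncode(String s) {
        return toBytes(s.length() == 3 ? rank(s, 0) : 17576 + rank(s, 1));
    }

    static String demoDecode(byte[] code) {
        int v = toInt(code);
        String prefix = v >= 17576 ? "@" : ""; // demo hard-codes '@' as the prefix
        int r = v % 17576;
        return prefix + (char) ('A' + r / 676) + (char) ('A' + r / 26 % 26) + (char) ('A' + r % 26);
    }

    public static void main(String[] args) {
        List<String> problems = verify(ExhaustiveCheck::demoOldEncode,
                ExhaustiveCheck::demoNewEncode, ExhaustiveCheck::demoDecode, '@');
        System.out.println(problems.isEmpty() ? "all checks passed" : problems);
    }
}
```

Against the real classes, the call would be verify(OldConverter::encode, NewConverter::encode, NewConverter::decode, '@'), with '@' matching PREFIX_CHAR.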