I'm trying to do a simple Caesar shift on a binary string, and it needs to be reversible. I've done this with this method:

public static String cShift(String ptxt, int addFactor)
    {
        String ascii = "";
        for (int i = 0; i < ptxt.length(); i+=8)
        {
            int character = Integer.parseInt(ptxt.substring(i, i+8), 2);
            byte sum = (byte) (character + addFactor);
            ascii += (char)sum;
        }
        String returnToBinary = convertToBinary(ascii);
        return returnToBinary;
    }

This works fine in some cases. However, I think that once the shifted value no longer fits in one byte, the shift becomes irreversible. On the test string "test!22*F ", with an addFactor of 12, the result can't be reversed. Why is that, and how can I stop it?

Edit: For clarity's sake, the test string is converted to binary before being passed in. Here is convertToBinary:

public static String convertToBinary(String str)
    {
        char [] array = str.toCharArray();
        String binaryToBeReturned = "";

        for (int i = 0; i < str.length(); i++)
        {
            String binary = Integer.toBinaryString((int)array[i]);
            binary = padZeroes(binary);
            binaryToBeReturned += binary;
        }
        return binaryToBeReturned;
    }
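convertToBinary calls padZeroes, which isn't shown. Here is a minimal sketch of what it might look like. The class name and the "left-pad to 8 digits, leave longer strings alone" behavior are assumptions, though they are consistent with the output below, where most characters take 8 bits but some take 16.

```java
public class PadZeroesSketch {
    // Hypothetical helper: left-pads a binary string with '0' up to 8 digits.
    // Assumption: strings already 8 or more digits long (e.g. a 16-bit
    // sign-extended char) are returned unchanged.
    static String padZeroes(String binary) {
        StringBuilder sb = new StringBuilder();
        for (int i = binary.length(); i < 8; i++) {
            sb.append('0');
        }
        return sb.append(binary).toString();
    }

    public static void main(String[] args) {
        System.out.println(padZeroes(Integer.toBinaryString('t')));      // 01110100
        System.out.println(padZeroes(Integer.toBinaryString('\uFF80'))); // 1111111110000000
    }
}
```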

When I run this with a cShift of 12, followed by a cShift of -12 to reverse it, I get this:

01110100011001010111001101110100001000010011001000110010010001100010101000100000
111111111000000001110001011111111111111110000000001011010011111000111110010100100011011000101100
ÿ?qÿ?->>R6,
ÿótesÿót!22F*

The first string is the test string converted to binary. The second is the result of cShift, in binary. The third is that result converted to ASCII, and the fourth is the result of reversing it with a cShift of -12 and converting to ASCII.

It's pretty clear to me that extra bits are somehow being added by the rollover, and I'm not sure how to deal with it. Thanks.
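The 16-digit groups in the second output line can be traced by hand. This sketch (class and method names are made up for illustration) repeats cShift's per-character steps for the first character, 't', with an addFactor of 12:

```java
public class ShiftTrace {
    // Repeats cShift's per-character steps (parse 8 bits, add, cast to
    // byte, cast to char) without the final conversion back to binary.
    static char shiftOne(String eightBits, int addFactor) {
        int character = Integer.parseInt(eightBits, 2);
        byte sum = (byte) (character + addFactor); // for 't': 116 + 12 = 128, stored as -128
        return (char) sum;                          // -128 is sign-extended to 0xFF80
    }

    public static void main(String[] args) {
        char c = shiftOne("01110100", 12); // 't'
        System.out.println(Integer.toHexString(c));    // ff80
        System.out.println(Integer.toBinaryString(c)); // 1111111110000000
    }
}
```

The result, 1111111110000000, matches the first 16 digits of the shifted output, and it explains why the third line starts with ÿ (0xFF) followed by an unprintable 0x80.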

A:

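You need to mask the byte when widening to char; otherwise the sign bit is extended. In cShift, any sum from 128 to 255 becomes a negative byte, so (char) sum yields a value such as 0xFF80 instead of 0x80, and convertToBinary then emits 16 bits for that character instead of 8.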

ascii += (char)(sum & 0xFF);
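Applied to the question's method, the fix might look like the sketch below. The 8-digit padding inside convertToBinary is an assumption standing in for the unshown padZeroes, StringBuilder replaces string concatenation as a side change, and the class name is made up.

```java
public class CaesarShiftFixed {
    // cShift with the one-line fix: mask the byte with 0xFF so the
    // resulting char stays in 0..255 and always encodes to 8 bits.
    public static String cShift(String ptxt, int addFactor) {
        StringBuilder ascii = new StringBuilder();
        for (int i = 0; i < ptxt.length(); i += 8) {
            int character = Integer.parseInt(ptxt.substring(i, i + 8), 2);
            byte sum = (byte) (character + addFactor);
            ascii.append((char) (sum & 0xFF)); // no sign extension
        }
        return convertToBinary(ascii.toString());
    }

    // Like the question's convertToBinary; padding each char to 8 digits
    // is an assumption about what padZeroes does.
    public static String convertToBinary(String str) {
        StringBuilder out = new StringBuilder();
        for (char c : str.toCharArray()) {
            String binary = Integer.toBinaryString(c);
            while (binary.length() < 8) {
                binary = "0" + binary;
            }
            out.append(binary);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = convertToBinary("test!22*F ");
        String shifted = cShift(original, 12);
        String restored = cShift(shifted, -12);
        System.out.println(shifted.length() == original.length()); // true
        System.out.println(restored.equals(original));             // true
    }
}
```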

This masking pattern applies whenever you widen a signed integral type and don't want sign extension:

anInt = aByte & 0xFF;
anInt = aShort & 0xFFFF;
aLong = anInt & 0xFFFFFFFFL; // notice the L
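As a quick check of the three patterns, here is a sketch that wraps each one in a small helper (the method names are illustrative, not from any library) and feeds it the most negative value of each type:

```java
public class MaskDemo {
    // Each helper keeps only the low bits of the narrower type,
    // so negative inputs come back as their unsigned values.
    static int byteToUnsigned(byte b) {
        return b & 0xFF;
    }

    static int shortToUnsigned(short s) {
        return s & 0xFFFF;
    }

    static long intToUnsigned(int i) {
        return i & 0xFFFFFFFFL; // the L makes the mask (and the result) a long
    }

    public static void main(String[] args) {
        System.out.println(byteToUnsigned((byte) 0x80));     // 128, not -128
        System.out.println(shortToUnsigned((short) 0x8000)); // 32768, not -32768
        System.out.println(intToUnsigned(0x80000000));       // 2147483648
        // Without the L, 0xFFFFFFFF is the int -1 and the mask does nothing:
        System.out.println(0x80000000 & 0xFFFFFFFF);         // -2147483648
    }
}
```

On Java 8 and later, Byte.toUnsignedInt, Short.toUnsignedInt and Integer.toUnsignedLong perform the same masking.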

Here's an example to illustrate:

byte b = -1; // 0xFF
char ch = (char) b; // 0xFFFF
int i = ch;
System.out.println(i); // prints "65535", which is 0xFFFF

byte b = -1; // 0xFF
char ch = (char) (b & 0xFF); // 0xFF
int i = ch;
System.out.println(i); // prints "255", which is 0xFF

There is a lesson to be had here. If you've read Java Puzzlers, you'll have seen a few puzzles that revolve around sign-extension pitfalls. This puzzle from the book is essentially the same as my example above, but perhaps more confusing:

// Java Puzzlers, Puzzle 6: Multicast
System.out.println((int) (char) (byte) -1); // prints 65535

There are two ways to remedy this:

  • Avoid working with byte and short. You rarely need to.
  • If you are working with them, always be wary of the need to mask.

Converting byte to char is especially tricky because:

  • Although char is wider than byte...
  • char is unsigned, while byte is signed!
  • Therefore, it's not a straightforward widening conversion but a combined widening and narrowing conversion.
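Following the first suggestion, the shift can be done entirely in int arithmetic so that no byte is ever involved. A sketch, assuming each value is one 8-bit chunk in the range 0..255 (the names are hypothetical):

```java
public class IntOnlyShift {
    // Shifts one 8-bit value by addFactor, wrapping within 0..255.
    // Math.floorMod keeps the result non-negative even for negative
    // shifts, so there is no byte and no sign extension anywhere.
    static int shift(int value, int addFactor) {
        return Math.floorMod(value + addFactor, 256);
    }

    public static void main(String[] args) {
        System.out.println(shift(0x74, 12));  // 128
        System.out.println(shift(128, -12));  // 116 ('t')
        System.out.println(shift(250, 12));   // 6
        System.out.println(shift(6, -12));    // 250
    }
}
```

Math.floorMod (Java 8+) is used instead of % because % returns a negative remainder for a negative operand. Since 256 is a power of two, (value + addFactor) & 0xFF gives the same result.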

JLS 5.1.4 Widening and Narrowing Primitive Conversions

The following conversion combines both widening and narrowing primitive conversions:

  • byte to char.

First, the byte is converted to an int via widening primitive conversion, and then the resulting int is converted to a char by narrowing primitive conversion.
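The two steps can be made explicit. In this sketch (names are illustrative), casting a byte straight to char gives the same result as widening to int first and then narrowing. The value 0xF3 is the "ó" from the question's fourth output line: sign-extended to 0xFFF3, it is written as 16 bits and decoded as "ÿ" followed by "ó".

```java
public class ByteToCharConversion {
    static char castDirectly(byte b) {
        return (char) b;          // byte -> char in a single cast
    }

    static char castViaInt(byte b) {
        int widened = b;          // step 1: widening, sign-extends to int
        return (char) widened;    // step 2: narrowing, keeps the low 16 bits
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF3;                                  // -13
        System.out.println((int) castDirectly(b));             // 65523 (0xFFF3)
        System.out.println(castDirectly(b) == castViaInt(b));  // true
    }
}
```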



polygenelubricants
This is true. Terri, are you sure you're applying the above fix when converting back?
Marcus Adams
I don't think there's a different function for converting back; if he `cShift` with `N`, then he later just `cShift` with `-N` to convert back.
polygenelubricants
Awesome and greatly informative. Thanks so much
Terri
Ah - I totally forgot that `char` is 2 bytes in Java. Nice catch.
danben