Hi, I'm currently writing a very small Java program to implement a one-time-pad, where the pad (or key) itself is generated as a series of bytes using a SecureRandom object, which is seeded using a simple string with the SHA-512 algorithm.
Generating the one-time-pad hasn't caused any problems, and if I supply the same seed string each time, as expected I get the same sequence of psuedo-random numbers, making the decryption process possible as long as the person decrypting has the seed string used to encrypt.
When I try to encrypt a file, the program reads in the data 64 chars at a time (except for the end of file, which is generally an odd number), and generates 64 bytes (or matching amount) of psuedo random bytes. XOR is performed between the elements of both arrays, the resulting char array containing the cipher characters is written to file, and the process repeats until all text in the file has been read.
Now, because Java treats all primitives as signed numbers (the data type byte ranges from -128 to 127, not 0 to 255) this means that the XOR operation can (and does) result in some negative values (-128 to -1). It seems that Java does not recognise these values as valid ASCII, and simply writes a ? (question mark) to the file for any negative values. When it comes to reading from the file to decrypt the cipher text, the negative value that resulted in the ? to be written to file is lost, replaced with 63, the valid ASCII code for a question mark.
This means that XORing this value is useless, without the original value there is no way to produce the plaintext. Incidentally, if I reproduce the behaviour of encrypting some data and then decrypting the data immediately after, in the same program run, and printing status along the way, there are no problems. Only if the data is written to file is the information lost.
I should also mention that I did try adding 128 to each encryption XOR result, and then subtracting it before performing the decryption XOR (to put each value in a valid ASCII range), but the ? problem still showed up because there are 31 ASCII codes from 128 to 159 that I'm unable to read and appear as ?
I've been banging my head off the wall on this for a while now, any help is appreciated. Cheers.