What is the best way to verify/test that a text string is serialized to a byte array with a certain encoding?

In my case, I want to verify that an XML structure is serialized to a byte array in UTF-8, which is a variable-width encoding. My current, ugly procedure is to inject a character known to require two bytes into the structure before serializing, then replace it with an ASCII character, serialize again, and compare the lengths of the two arrays. The array containing the two-byte character should be exactly one byte longer.
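
For reference, the length comparison described above might look roughly like this in Java. The `serializer` parameter is a hypothetical stand-in for the module under test, and `LengthProbe` is just an illustrative name. Note that this only shows the probe character took two bytes; it does not prove the output is UTF-8.

```java
import java.util.function.Function;

final class LengthProbe {
    // 'serializer' is a hypothetical stand-in for the module under test:
    // it takes the text to embed and returns the serialized document bytes.
    static boolean probeCharTakesTwoBytes(Function<String, byte[]> serializer) {
        // Plain ASCII 'e' occupies one byte in UTF-8.
        int asciiLength = serializer.apply("e").length;
        // 'e' with acute accent (U+00E9) occupies two bytes in UTF-8.
        int accentedLength = serializer.apply("\u00e9").length;
        // In UTF-8 the second document should be exactly one byte longer.
        return accentedLength - asciiLength == 1;
    }
}
```

With a single-byte encoding such as ISO-8859-1 both documents have the same length, so the check returns false.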

Bonus points if the solution is elegant in Java. I can't think of an elegant way to search for a byte sequence in a byte array. (That could be used to look for the known byte sequence that represents the chosen character in UTF-8.)
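
As far as I know there is no built-in `indexOf` for one `byte[]` inside another, but a naive search is short enough to write by hand. A minimal sketch (the class and method names are my own, not a library API):

```java
final class ByteSearch {
    // Returns the index of the first occurrence of 'needle' in 'haystack',
    // or -1 if it does not occur. An empty needle matches at index 0.
    static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return 0;
        }
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return i; // every byte of the needle matched
        }
        return -1;
    }
}
```

For example, U+00E9 is encoded in UTF-8 as the bytes 0xC3 0xA9, so `ByteSearch.indexOf(bytes, new byte[] {(byte) 0xC3, (byte) 0xA9})` would locate it. Finding the sequence is evidence rather than proof, since the same two bytes can also occur in a document written in another encoding.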


Perhaps you could deserialise the byte array using a known encoding and ensure that (a) it doesn't throw any exceptions, and (b) it deserialises to the original string. From your description of the scenario, it seems you may not have the original string readily available. Might there be a way to create it?
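
A minimal sketch of this idea in Java, with illustrative class and method names. One caveat: `new String(bytes, StandardCharsets.UTF_8)` does not throw on malformed input; it silently substitutes U+FFFD. To get an exception you need a `CharsetDecoder` configured to report errors:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    // Decodes strictly: malformed or unmappable input raises
    // CharacterCodingException instead of being replaced.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Convenience wrapper: true if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrictUtf8(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Pure ASCII output is valid in UTF-8 and in many other encodings, so the test data should contain at least one non-ASCII character. If the expected text is known, compare it with the result of `decodeStrictUtf8` as well.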

Greg Hewgill

That's good. You're right, I don't have the original string, since I'm testing a module that creates an XML document as a byte array. I didn't think about deserializing to a String with an expected encoding. That will do the trick. Thanks!