Hi,

I have a byte array (a UTF-8 encoded string sent as a byte array from the client). The message should have the following format:

'number' 'timestamp' 'str1' 'str2'

E.g.:

1 2000-01-31T20:00.00 the 1st str the 2nd str

It is clear that the 'number' and 'timestamp' can easily be read from the byte array, and the start position of 'str1' can also be figured out. Since 'str1' and 'str2' can have any content (of any length), what kind of delimiter can be used to know where 'str1' ends and 'str2' starts? Or are there any other tricks for parsing something like this?

note1: the message format is defined by me, so any solution with a different format/order will do as long as all 4 pieces of info are in the byte array.

note2: I know I could encode str1 so that it doesn't contain my custom delimiter, but I would like to avoid the overhead of encoding/decoding the data.

note3: One solution I could think of was to write the length of str1 in front of it when sending the data from the client side, e.g. 'number' 'timestamp' 'str1length' 'str1' 'str2'

Are there any other tricks you can think of?

thanks

A:

I recommend you do the 3rd option you listed:
number   timestamp   length_of_string1   string1   length_of_string_two   string2
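A minimal sketch of this length-prefixed layout in Java, assuming both ends are Java 16+ (for the record type) and using DataOutputStream/DataInputStream from java.io. The names LengthPrefixed, Message, writeString and readString are hypothetical. Here number is written as a 4-byte int rather than as text, and each string is prefixed with its length in UTF-8 bytes, not characters, because non-ASCII characters take more than one byte:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixed {

    // Hypothetical holder for the four fields from the question.
    record Message(int number, String timestamp, String str1, String str2) {}

    // Writes a 4-byte big-endian byte count followed by the string's UTF-8 bytes.
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads the byte count, then exactly that many bytes.
    static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes); // throws EOFException if the message is truncated
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static byte[] encode(Message m) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(m.number());
            writeString(out, m.timestamp());
            writeString(out, m.str1());
            writeString(out, m.str2());
        }
        return buf.toByteArray();
    }

    static Message decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int number = in.readInt();
            String timestamp = readString(in);
            String str1 = readString(in);
            String str2 = readString(in);
            return new Message(number, timestamp, str1, str2);
        }
    }

    public static void main(String[] args) throws IOException {
        Message m = new Message(1, "2000-01-31T20:00.00", "the 1st str", "the 2nd str | with \"anything\"");
        Message back = decode(encode(m));
        System.out.println(back.equals(m)); // true
    }
}
```

Because the reader always knows how many bytes to consume next, str1 and str2 may contain spaces, pipes, quotes or anything else, and nothing has to be escaped.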

It's probably a bad idea to put a delimiter such as "|" or "^]" between string1 and string2, because then the delimiter can no longer appear inside your strings.

Also note that if you split the message on spaces, any string that itself contains spaces is going to be split up. One way to solve this is to surround each string with "s, escape any quotes inside it, and parse the message with a quotation-aware split.
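A rough sketch of the quoting approach, assuming fields are separated by spaces, each free-text field is wrapped in double quotes, and a backslash escapes a literal " or \ inside the quotes. QuotedFields and its quote/split helpers are hypothetical names, not a library API:

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedFields {

    // Wraps a field in double quotes, backslash-escaping any '"' or '\' inside it.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\');
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Splits on spaces, except inside double quotes; inside quotes a backslash
    // makes the next character literal. The surrounding quotes are stripped.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean inField = false; // true once the current field has started (even if empty, e.g. "")
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < line.length()) {
                    cur.append(line.charAt(++i));
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
                inField = true;
            } else if (c == ' ') {
                if (inField) {
                    fields.add(cur.toString());
                    cur.setLength(0);
                    inField = false;
                }
            } else {
                cur.append(c);
                inField = true;
            }
        }
        if (inQuotes) throw new IllegalArgumentException("unterminated quote");
        if (inField) fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "1 2000-01-31T20:00.00 " + quote("the 1st str") + " " + quote("say \"hi\" \\ bye");
        System.out.println(split(line));
        // [1, 2000-01-31T20:00.00, the 1st str, say "hi" \ bye]
    }
}
```

Unlike a length prefix, this needs an extra pass over every string on both sides, which is the kind of encoding overhead note2 in the question hoped to avoid.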

ItzWarty
Hi ItzWarty, thanks for the reply. You gave the format: number timestamp length_of_string1 string1 length_of_string_two string2. I think I would only need length_of_string1 and not length_of_string2, since string2 runs from the end of string1 to the end of the byte array. Did you have something else in mind?
Bob
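Bob's variant can be sketched as follows, assuming a hypothetical text header "<number> <timestamp> <len1> " in which none of the three header fields contains a space, followed directly by the UTF-8 bytes of str1 and then str2. len1 is the byte length of str1, and str2 is simply whatever remains of the array. The Str1LengthOnly class name is made up for illustration, and Java 11+ is assumed for ByteArrayOutputStream.writeBytes:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Str1LengthOnly {

    // Hypothetical wire format: "<number> <timestamp> <len1> " + str1 + str2,
    // where len1 is the UTF-8 byte length of str1 and str2 runs to the end.
    static byte[] encode(int number, String timestamp, String str1, String str2) {
        byte[] s1 = str1.getBytes(StandardCharsets.UTF_8);
        String header = number + " " + timestamp + " " + s1.length + " ";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(s1);
        out.writeBytes(str2.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Returns {number, timestamp, str1, str2} as strings.
    static String[] decode(byte[] data) {
        String[] header = new String[3];
        int pos = 0;
        for (int i = 0; i < 3; i++) {
            int space = pos;
            // 0x20 never occurs inside a multi-byte UTF-8 sequence, so scanning bytes is safe
            while (space < data.length && data[space] != ' ') space++;
            if (space == data.length) throw new IllegalArgumentException("truncated header");
            header[i] = new String(data, pos, space - pos, StandardCharsets.UTF_8);
            pos = space + 1;
        }
        int len1 = Integer.parseInt(header[2]);
        String str1 = new String(data, pos, len1, StandardCharsets.UTF_8);
        String str2 = new String(data, pos + len1, data.length - pos - len1, StandardCharsets.UTF_8);
        return new String[] {header[0], header[1], str1, str2};
    }

    public static void main(String[] args) {
        byte[] msg = encode(1, "2000-01-31T20:00.00", "the 1st str", "the 2nd str");
        System.out.println(Arrays.toString(decode(msg)));
        // [1, 2000-01-31T20:00.00, the 1st str, the 2nd str]
    }
}
```

The trade-off is that str2 has to stay the last field: adding another field after it later would mean bringing length_of_string2 (or some other delimiter) back into the format.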
I don't know what @ItzWarty means about not being able to send "|" etc over the wire ...
Stephen C
I've seen some projects where people delimit strings with a pipe "|" or "^]", so they would send string1+"^]"+string2 or string1+"|"+string2 ... but then you can't have the delimiter in your strings. My wording was ... wordy; I've edited my main post and fixed it. Thanks
ItzWarty
A: 

If I had freedom to choose the syntax, I would do one of the following:

  • If there is some Unicode character that is never going to appear in str1 or str2 (call it '|' for the sake of argument), I would concatenate the 4 components with '|' as the separator. Then I would "parse" the string using String.split("\\|");

  • If I couldn't be certain that any character I picked was not going to be used in str1 or str2, I'd pick a separator character and an escape character (say '|' and '\\') and use the escape character to escape a literal separator and a literal escape character. Building the message and then parsing it is more effort to code, but it will definitely work.

  • As a third alternative, if both ends were Java, I'd consider using Java data streams (DataOutputStream / DataInputStream) to encode and decode the data.
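A minimal sketch of the escape-character approach from the second bullet, using '|' as the separator and '\' as the escape character as suggested there. EscapedFields and its join/split methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedFields {

    static final char SEP = '|';
    static final char ESC = '\\';

    // Joins fields with SEP, prefixing every literal SEP or ESC inside a field with ESC.
    static String join(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int f = 0; f < fields.size(); f++) {
            if (f > 0) sb.append(SEP);
            for (char c : fields.get(f).toCharArray()) {
                if (c == SEP || c == ESC) sb.append(ESC);
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Splits on unescaped SEP and drops the escape characters.
    static List<String> split(String s) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESC) {
                if (++i == s.length()) throw new IllegalArgumentException("dangling escape");
                cur.append(s.charAt(i)); // next char is literal
            } else if (c == SEP) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field (possibly empty)
        return fields;
    }

    public static void main(String[] args) {
        List<String> in = List.of("1", "2000-01-31T20:00.00", "a|b", "c\\d");
        String wire = join(in);
        System.out.println(wire);                   // 1|2000-01-31T20:00.00|a\|b|c\\d
        System.out.println(split(wire).equals(in)); // true
    }
}
```

If you can guarantee the separator never appears (the first bullet), decoding reduces to s.split("\\|", -1); the negative limit keeps trailing empty fields, which plain split("\\|") would drop.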

Stephen C