
I have an envelope that I'm passing through a socket, like this:

<task>
<doc>
This is the contents of a file
</doc>
</task>

This works great with text docs using a pattern like "<doc>(.*?)</doc>" with Pattern.DOTALL, but when I put the contents of a Word doc in there, I can't get it out. Any ideas?

Jim
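For reference, the approach in the question probably looks roughly like the Java sketch below. It works for text, but one plausible reason it fails for a Word file is that a .doc is binary: if its bytes are turned into a String through a charset such as UTF-8, byte sequences that are invalid in that charset are replaced with U+FFFD, so the original bytes cannot be recovered. The raw bytes could also happen to contain "</doc>". The class and method names are hypothetical, and UTF-8 is an assumed charset; the sample bytes are the first four bytes of the OLE2 container used by legacy .doc files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocExtractor {
    // DOTALL lets '.' match line terminators; the reluctant (.*?) stops at the first "</doc>".
    private static final Pattern DOC = Pattern.compile("<doc>(.*?)</doc>", Pattern.DOTALL);

    /** Returns the text between the first <doc> and </doc>, or null if there is none. */
    public static String extractDoc(String envelope) {
        Matcher m = DOC.matcher(envelope);
        return m.find() ? m.group(1) : null;
    }

    /** True if the bytes survive a bytes -> String -> bytes trip through UTF-8 unchanged. */
    public static boolean roundTripsThroughUtf8(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8); // malformed input becomes U+FFFD
        return Arrays.equals(raw, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String envelope = "<task>\n<doc>\nThis is the contents of a file\n</doc>\n</task>";
        System.out.println(extractDoc(envelope));

        // First bytes of an OLE2 compound file (the legacy .doc container).
        byte[] docHeader = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
        System.out.println(roundTripsThroughUtf8(docHeader));
    }
}
```

The text envelope extracts fine, while the binary sample does not survive the String conversion, which is why the answers below avoid treating the document as text.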

A: 

Good link on encoding binary data in XML: http://www.xml.com/pub/a/98/07/binary/binary.html

Jim Jones
A: 

You mentioned that you're sending data over a socket, so you're free to use whatever protocol you want (even one you invent!). I think I'd do something like this:

Send the following over the socket:

command     : 1 byte  (command enum; e.g. let '1' signify add_task)
header_size : 4 bytes (1 int, size of the header; the header is an XML snippet of metadata like doc_name)
doc_size    : 4 bytes (1 int, size of the raw document)
header_data : header_size bytes of data, interpreted as an XML string
doc_data    : doc_size bytes, interpreted as your raw data type

Of course, there are many higher-level protocols that can probably handle this and much more. At least this way you avoid weird escape sequences, base64, regexes, and the other things that will get you into trouble.
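The framing above could be sketched in Java with DataOutputStream and DataInputStream, which write and read a 4-byte int in big-endian order. This is a hedged illustration, not the answerer's code: the class and method names, the command value 1 for add_task, the UTF-8 header encoding, and the sample header XML are all assumptions. A ByteArrayOutputStream stands in for the socket's output stream so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TaskFraming {
    // Hypothetical command value: the answer suggests letting 1 signify add_task.
    public static final byte ADD_TASK = 1;

    /** A decoded message: command byte, XML header, and raw document bytes. */
    public static final class Task {
        public final byte command;
        public final String headerXml;
        public final byte[] doc;

        Task(byte command, String headerXml, byte[] doc) {
            this.command = command;
            this.headerXml = headerXml;
            this.doc = doc;
        }
    }

    /** Writes command (1 byte), header_size (4), doc_size (4), header bytes, doc bytes. */
    public static void writeTask(OutputStream rawOut, byte command, String headerXml, byte[] doc)
            throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        byte[] header = headerXml.getBytes(StandardCharsets.UTF_8);
        out.writeByte(command);
        out.writeInt(header.length); // DataOutputStream writes ints big-endian
        out.writeInt(doc.length);
        out.write(header);
        out.write(doc); // raw bytes: no escaping, no Base64
        out.flush();    // flush but don't close, so a socket stream stays open
    }

    /** Reads one message written by writeTask. */
    public static Task readTask(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        byte command = in.readByte();
        int headerSize = in.readInt();
        int docSize = in.readInt();
        if (headerSize < 0 || docSize < 0) {
            throw new IOException("negative size in frame");
        }
        byte[] header = new byte[headerSize];
        in.readFully(header); // keeps reading until all bytes arrive; a single read() may return fewer
        byte[] doc = new byte[docSize];
        in.readFully(doc);
        return new Task(command, new String(header, StandardCharsets.UTF_8), doc);
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, 0x00, (byte) 0xFF};
        ByteArrayOutputStream buf = new ByteArrayOutputStream(); // stands in for the socket
        writeTask(buf, ADD_TASK, "<header><doc_name>report.doc</doc_name></header>", doc);
        Task t = readTask(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(t.command + " " + t.headerXml + " " + Arrays.equals(doc, t.doc));
    }
}
```

Because the sizes are sent first, the receiver knows exactly how many bytes to read, so the document can contain any byte values, including ones that look like markup.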

basszero
A:

Encode the Word doc in Base64 and then put it into the XML wrapper.

Apache Commons Codec offers a decent encoder/decoder: http://commons.apache.org/codec/
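A minimal sketch of that approach follows. It swaps in the JDK's java.util.Base64 (Java 8 and later) for Apache Commons Codec so the example has no external dependency; Commons Codec's encoder would play the same role. Because Base64 output contains only letters, digits, '+', '/' and '=', the encoded document can never contain '<' or '&', so the question's original regex can pull it out safely. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Base64Envelope {
    private static final Pattern DOC = Pattern.compile("<doc>(.*?)</doc>", Pattern.DOTALL);

    /** Base64-encodes the raw document and wraps it in the envelope. */
    public static String wrap(byte[] raw) {
        return "<task>\n<doc>" + Base64.getEncoder().encodeToString(raw) + "</doc>\n</task>";
    }

    /** Extracts the <doc> contents with the question's regex and decodes them back to bytes. */
    public static byte[] unwrap(String envelope) {
        Matcher m = DOC.matcher(envelope);
        if (!m.find()) {
            throw new IllegalArgumentException("no <doc> element found");
        }
        // The MIME decoder ignores line breaks (and any other non-Base64 characters),
        // which tolerates encoders that wrap long output.
        return Base64.getMimeDecoder().decode(m.group(1));
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, 0x00, (byte) 0xFF};
        String envelope = wrap(raw);
        System.out.println(envelope);
        System.out.println(Arrays.equals(raw, unwrap(envelope)));
    }
}
```

The trade-off versus a length-prefixed binary protocol is size: Base64 grows the payload by about a third, in exchange for keeping the envelope plain text.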

Matthew Flynn