Hello,

I'm working on a C++ client/server project where XML strings are passed over a TCP/IP connection. My question is about the proper way to indicate the complete string has been received. I was thinking of null terminated strings or sending the length of the XML string first, so the client/server can tell when a complete string is received.

The client can send GET/SET commands, and the server can reply, as well as send a continuous stream of results. For example, the client sends <GET ID="DATA1" /> and the server replies <ID="DATA1" VAL="..." />. Or the server can send a continuous stream:

<ID="DATA1" VAL="..." />
<ID="DATA2" VAL="..." />
<ID="DATA3" VAL="..." />
<ID="DATA4" VAL="..." />

In which case the client might receive in a single Read:

<ID="DATA1" VAL="..." /><ID="DATA2" VAL="..." />

Or, if a large amount of data were sent, it might take multiple Reads to read the whole string.

Using a null termination character seems a bit simplistic (and breaks if the string is Unicode?), and sending a length value seems awkward as well:

20<ID="DATA1" VAL="1" /> or <length=20><ID="DATA1" VAL="1" />

This must have been solved for TX/RX of HTML files, I just can't seem to figure it out.

I'm using MFC C++ (legacy code) for the server and .Net C++/CLI or C# for the client.

Any help is greatly appreciated!

A: 

Using a zero byte is the right approach. It should (at least as far as I know) not break anything with respect to Unicode or other encodings, and it definitely gives you more flexibility than a length byte or long.
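
A minimal sketch of zero-byte framing on the receiving side, not a definitive implementation. It assumes the XML is sent as UTF-8 or ASCII, where a zero byte never occurs inside the text (in UTF-16 most ASCII characters carry a zero byte, so the same trick would not work there). `NulFramer` and `feed` are hypothetical names; `feed` takes whatever bytes a single socket read returned.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: accumulates raw bytes from the socket and splits them
// into messages at each zero byte. Assumes UTF-8 or ASCII payloads.
class NulFramer {
public:
    // Append the bytes from one read and return every message that is now
    // complete. An incomplete tail stays buffered for the next call.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        buffer_.append(data, len);
        std::vector<std::string> messages;
        std::size_t start = 0;
        std::size_t end;
        while ((end = buffer_.find('\0', start)) != std::string::npos) {
            messages.push_back(buffer_.substr(start, end - start));
            start = end + 1;  // skip the terminator itself
        }
        buffer_.erase(0, start);  // keep only the unfinished remainder
        return messages;
    }

private:
    std::string buffer_;
};
```

This covers both cases from the question: several messages arriving in one read, and one message spread over several reads.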

Niko
+5  A: 

Your examples aren't actually well-formed XML, which may be part of your problem. If you're going to the trouble of using XML, you may as well use well-formed XML, which has rules for node termination, i.e:

<data id="DATA1" val="..." />

or

You can then use a SAX parser for the stream, which will give you events as nodes and attributes are parsed.

I would then implement your two types of commands like this:

// individual commands
<get id="data_1"/>

// multiple commands
<multi>
  <data id="DATA1"/>
  <data id="DATA2"/>
  ...
</multi>
Greg Campbell
+1 but to be pedantic, you should have written "well-formed XML". "valid XML" means that the XML conforms to a schema, which is something very different: http://en.wikipedia.org/wiki/XML#Well-formedness_and_error-handling
Wim Coenen
Good point - I'll change it.
Greg Campbell
I agree with this - the most logical way to do this is to extend your XML schema, so that a complete request is delimited by `<request req_id=NNNN></request>` and a reply by `<reply req_id=NNNN></reply>`.
caf
A: 

There are three ways I can think of:

  • Describe the length out of band: this could be a little like an HTTP header, with the length written in ASCII and terminated by a CR, after which exactly that many bytes are read as the message.
  • Null-terminate the string. The null character never appears inside an XML document, so it is unambiguous as a terminator.
  • Terminate each node with CR or LF, so that a line-based protocol can read the XML.

As mentioned elsewhere, make sure your XML conforms to standards so that either side can be swapped out and then old code won't have to be tweaked to conform.
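
A rough sketch of the first option, under assumptions of my own: the header is the payload's byte count in decimal ASCII followed by a single CR, and at most nine digits are accepted. The names `frame_with_ascii_length` and `try_extract` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wire format: decimal byte count in ASCII, one CR, then payload.
std::string frame_with_ascii_length(const std::string& xml) {
    return std::to_string(xml.size()) + '\r' + xml;
}

// Try to take one complete message off the front of `buffer`.
// Returns true and fills `out` on success; returns false if more bytes are
// needed. Throws if the header is not a plain decimal number.
bool try_extract(std::string& buffer, std::string& out) {
    const std::size_t cr = buffer.find('\r');
    if (cr == std::string::npos) return false;  // header not complete yet
    if (cr == 0 || cr > 9) throw std::runtime_error("bad length header");
    std::size_t length = 0;
    for (std::size_t i = 0; i < cr; ++i) {
        const char c = buffer[i];
        if (c < '0' || c > '9') throw std::runtime_error("bad length header");
        length = length * 10 + static_cast<std::size_t>(c - '0');
    }
    const std::size_t body = cr + 1;
    if (buffer.size() - body < length) return false;  // payload not complete yet
    out = buffer.substr(body, length);
    buffer.erase(0, body + length);
    return true;
}
```

The receiver appends each read to `buffer` and calls `try_extract` in a loop until it returns false.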

quamrana
+1  A: 

I see two options that make a lot of sense, that I've used before:

1- Just send it, and don't terminate the XML. If the XML is valid, it'll have only a single root node. You don't have to terminate it, since the client can parse it until it discovers that it has a complete XML file.

2- Use "Pascal" style strings. I find this really easy, since the read can be done all at once, and it makes all the rest of the problems nonexistent. Basically, prepend your 'string' document with an integer that is the number of bytes to be sent. I do this particularly when dealing with TCP, since I can fetch out what I call "packets" or groups of complete data all at once.
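
Option 2 could look roughly like this. The 4-byte big-endian prefix is an assumption (the answer only says "an integer"), and `frame_with_length_prefix` and `try_pop_message` are hypothetical names, so treat this as a sketch rather than a finished protocol.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Prepend a 4-byte big-endian (network order) byte count to the payload.
// The 4-byte width is an assumption made for this sketch.
std::string frame_with_length_prefix(const std::string& xml) {
    if (xml.size() > 0xFFFFFFFFu) throw std::length_error("payload too large");
    const std::uint32_t n = static_cast<std::uint32_t>(xml.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += xml;
    return out;
}

// Pop one complete message off the front of `buffer`, if one is there.
// Returns false when the prefix or the payload has not fully arrived yet.
bool try_pop_message(std::string& buffer, std::string& out) {
    if (buffer.size() < 4) return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(buffer[i]);
    if (buffer.size() - 4 < n) return false;
    out = buffer.substr(4, n);
    buffer.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

Since one read may deliver several messages, or only part of one, the receiver appends every read to `buffer` and keeps calling `try_pop_message` until it returns false.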

Erich
A: 

I like the idea of simple CRLF delimiting; it seems simplest. From the link provided, would this work? (with CRLF being the two bytes 13 and 10)

Send:

   <GET ID="DATA1" />CRLF

Reply:

   <ID="DATA1" VAL="3" />CRLF
   <ID="DATA1" VAL="2" />CRLF
   <ID="DATA1" VAL="1" />CRLF
   ...

As answer 2 mentioned, an XML reply with multiple lines may occur. Might this cause problems with a CRLF at each line, rather than the end of the response? Can't CRLF naturally occur within a multi-line XML string?

Reply:

   <multi>CRLF
     <data id="DATA1"/>CRLF
     <data id="DATA2"/>CRLF
   </multi>CRLF
Brian
Ok, from the XML spec it looks like line endings must only be LF, and if CRLF or CR are found they are converted to just LF: http://www.w3.org/TR/REC-xml/#sec-line-ends So using CRLF as the XML string packet delimiter looks like it should work. I will give that a try. Thanks for your help.
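
For reference, a small sketch of CRLF framing. It assumes the sender never puts a CR or LF inside a message, so a multi-line document like the <multi> example would have to be written on one line first. The names `frame_line` and `split_crlf` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Sender side: append the CRLF delimiter. Messages that already contain a
// line break are rejected, because they would be split by the receiver.
std::string frame_line(const std::string& xml) {
    if (xml.find_first_of("\r\n") != std::string::npos)
        throw std::invalid_argument("message must not contain CR or LF");
    return xml + "\r\n";
}

// Receiver side: remove every complete CRLF-terminated message from
// `buffer` and return them. An unfinished tail stays in `buffer`.
std::vector<std::string> split_crlf(std::string& buffer) {
    std::vector<std::string> messages;
    std::size_t start = 0;
    std::size_t end;
    while ((end = buffer.find("\r\n", start)) != std::string::npos) {
        messages.push_back(buffer.substr(start, end - start));
        start = end + 2;  // skip both delimiter bytes
    }
    buffer.erase(0, start);
    return messages;
}
```

Because the receiver searches the accumulated buffer, a CRLF that is split across two reads is still found once the second byte arrives.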
Brian