I'm using XML to send project information between applications. One of the pieces of information is the project description. So I have:

<ProjectDescription>Test &amp; spaces around&amp;some  &amp;  amps!</ProjectDescription>

Or: "Test & spaces around&some & amps!" <-- GOOD!

When I then use Expat to parse it, my data handler receives the string in pieces rather than all at once: "Test", then "&", then "spaces around", then the next "&", and so on. When I try to reconstruct the original string, all the spacing around the &'s is dropped because the data handler never gets to see it. When I then re-write the XML I get:

<ProjectDescription>Test&amp;spaces around&amp;some&amp;amps!</ProjectDescription>

Or: "Test&spaces around&some&amps!" <-- BAD!

Is this a known problem with existing workarounds? Is there some setting I can give Expat to control its behavior around escaped symbols?

My attempts at Googling an answer have met with dismal failure.

EDIT: In response to a question in the comments: I have my own handler, which I register with the parser:

parser=XML_ParserCreate(NULL); 
XML_SetUserData(parser,&depth);
XML_SetElementHandler(parser,startElement,endElement); 
XML_SetCharacterDataHandler(parser,dataHandler);

The handler is declared as follows:

static void dataHandler(void *userData,const XML_Char *s,int l)

The argument "s" then holds the character data of the element. When the text contains no escaped ampersands, as in "a string with spaces", it is the entire string between the open and close tags.

A: 

I have just run a test with my own library that uses expat. My handler looks like this, with debug statements to display what is going on:

void CharDataHandler( void * parser, 
                       const XML_Char *s,
                       int len ) {
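    // note: s is not null-terminated; only the first len characters belong to
    // this event, so printing s directly can show more text than that
    std::cerr << "[" << s << "]\n";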
    std::cerr << len << "\n";
    // my own processing here - not important 
}

I don't see the behaviour you are talking about. For the input data:

XXX &amp; YYY

I get three events with the char * and length data set as follows (the char * is not null-terminated, so only the first length characters belong to each event):

char * = "XXX &amp; YYY"
length = 4

char * = "&"
length = 1

char * = " YYY"
length = 4

So the spaces are retained. As far as I know I am not using any special settings. What version & platform of Expat are you using?
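
Since the text arrives as several events, one way to rebuild the full string is to append exactly len characters from each event to a buffer, without trimming anything. The following is only a minimal sketch of that idea, not code from my library: TextBuffer and appendChunk are made-up names, and the XML_Char typedef is a stand-in so the snippet compiles without expat.h (in a real build you would include <expat.h> and remove it).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for Expat's XML_Char so this sketch compiles on its own.
   Assumption: a non-Unicode build, where XML_Char is plain char. */
typedef char XML_Char;

/* Hypothetical accumulator, passed to the handler as userData. */
typedef struct {
    char  *buf;  /* null-terminated text collected so far (NULL if empty) */
    size_t len;  /* number of characters in buf, excluding the terminator */
} TextBuffer;

/* Same signature as a handler registered with XML_SetCharacterDataHandler.
   s is NOT null-terminated: only the first l characters are valid. */
static void appendChunk(void *userData, const XML_Char *s, int l)
{
    TextBuffer *tb = (TextBuffer *)userData;
    char *grown;

    assert(tb != NULL);
    if (l <= 0)
        return;

    /* Grow the buffer by l characters plus room for the terminator. */
    grown = realloc(tb->buf, tb->len + (size_t)l + 1);
    if (grown == NULL)
        return; /* out of memory: keep what was collected so far */

    /* Copy the chunk verbatim, whitespace included. */
    memcpy(grown + tb->len, s, (size_t)l);
    tb->len += (size_t)l;
    grown[tb->len] = '\0';
    tb->buf = grown;
}
```

Feeding it the three events listed above (4 characters, then 1, then 4) should leave "XXX & YYY" in buf, spaces intact. In a real parser you would also reset the buffer in the start element handler, and if userData is already used for something else (such as a depth counter) both would need to live in one struct.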

anon
I just updated to expat-2.0.1 from the tar.gz I got off the website this morning, hoping it would be fixed in this release. Platform is Win32.
TheWalruss
Ok, some code that's particular to my dataHandler strips the whitespace. Thanks!
TheWalruss