I've started working on a simple XML pull-parser. Having just refreshed my memory on what counts as correct XML syntax with regard to certain characters and sequences, ignorable whitespace and so on (thank you, http://www.w3schools.com/xml/xml_elements.asp), I realized that I still don't know how the following case is handled. Validome reports it as well-formed. Note that I only want to use XML files for data storage, so no entities, DTDs or schemas are needed:

<bookstore>
   <book id="1">
      <author>Kurt Vonnegut Jr.</author>
      <title>Slapstick</title>
   </book>
We drop a pie here.
   <book id="2">Who cares anyway?
      <author>Stephen King</author>
      <title>The Green Mile</title>
   </book>
And another one here.
   <book id="3">
      <author>Next one</author>
      <title>This time with its own title</title>
   </book>
</bookstore>

"We drop a pie here." and "And another one here." are text sitting directly inside the 'bookstore' element, between its child elements. "Who cares anyway?" is text sitting directly inside the second 'book' element, ahead of its children.

How are these processed, if at all? Will "We drop a pie here." and "And another one here." be concatenated to form one value for the 'bookstore' element, or are they treated separately and stored somewhere, affecting how the element they belong to is parsed, or something else?

A: 

The easiest way to find out is to parse it with a few standards-compliant parsers and dump the output.
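As one way to try this, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree (which sits on top of the Expat parser) is an acceptable choice of parser; the helper name text_chunks is made up for illustration. It walks the tree and lists every non-whitespace piece of text together with the element that directly contains it.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
SAMPLE = """<bookstore>
   <book id="1">
      <author>Kurt Vonnegut Jr.</author>
      <title>Slapstick</title>
   </book>
We drop a pie here.
   <book id="2">Who cares anyway?
      <author>Stephen King</author>
      <title>The Green Mile</title>
   </book>
And another one here.
   <book id="3">
      <author>Next one</author>
      <title>This time with its own title</title>
   </book>
</bookstore>"""


def text_chunks(xml_source):
    """Return (containing_tag, text) pairs for each non-whitespace
    text chunk, in document order. Hypothetical helper."""
    root = ET.fromstring(xml_source)
    chunks = []

    def walk(elem):
        # elem.text is the text between elem's start tag and its first child.
        if elem.text and elem.text.strip():
            chunks.append((elem.tag, elem.text.strip()))
        for child in elem:
            walk(child)
            # child.tail is the text after the child's end tag;
            # it is contained in elem, not in the child.
            if child.tail and child.tail.strip():
                chunks.append((elem.tag, child.tail.strip()))

    walk(root)
    return chunks


for owner, text in text_chunks(SAMPLE):
    print(f"{owner}: {text!r}")
```

With this parser, the two pieces of text inside 'bookstore' come back as two separate chunks rather than one concatenated value, and "Who cares anyway?" is reported as text of the second 'book'. Other parsers and APIs may expose the same content differently, so it is worth comparing a few.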

Tahir Akhtar