Is it possible to preserve whitespace inside tags?

I am accessing XML nodes (containing XHTML content) in an XPathDocument using an XPathNodeIterator.

Some of the tags in the nodes are not "strict" XHTML (and this is allowed in the final output of the tool). Some nodes contain image tags without the space before the closing "/>":

<img src="filename.png" alt="description"/>

When I store the resulting nodes, they are reformatted with that space added:

<img src="filename.png" alt="description" />

Is it possible to get the node contents while preserving the in-tag spacing (in this case, without the added space)? I was thinking of something similar to PreserveWhitespace.

A simplified sample of the code used:

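Imports System.Xml
Imports System.Xml.XPath

' fileIn is the path to the exported XML file
xmlDoc = New XPathDocument(fileIn, XmlSpace.Preserve)
xmlNav = xmlDoc.CreateNavigator()
Dim xmlNode As XPathNodeIterator
Dim ns As XmlNamespaceManager = New XmlNamespaceManager(xmlNav.NameTable)

' Select every contents element whose target is flagged for translation
xmlNode = xmlNav.Select("/export/contents[target[@translate='True']]")
While xmlNode.MoveNext()
  ' The markup returned here is where <img .../> shows up as <img ... />
  target = xmlNode.Current.SelectSingleNode("target").InnerXml
  ' ... '
End While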


Some background: as Marc pointed out, the non-significant whitespace inside the tags (or the attribute order, for that matter) makes no difference to the meaning of the resulting XML.

The main problem I encounter is that the data comes from a CMS that handles both new and legacy content. The content creation process only recently moved to XML/XHTML, so there is still older, non-strict XHTML content in the system.

The QA tools used are still mainly text based and built for HTML, and they are run by another department (the QA process will need to be adjusted/updated). This is why I would like to keep tags as close to the original format as possible for now.


As a temporary workaround, I added a few regular expressions (comparing new and previous versions of the nodes) to search for and fix the "differences" introduced by parsing the XML with .NET.
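The regular expressions themselves are not shown here. One hypothetical shape for such a compare-and-fix step, sketched in Python for brevity, is to pair the tags of the source and the processed text by position and put the source tag back wherever the two differ only in the whitespace before "/>". The names `restore_source_tags` and `normalize` are assumptions made for this sketch, and the simple tag pattern would need more care if attribute values can contain ">".

```python
import re

# Naive tag matcher: assumes no "<" or ">" inside attribute values.
TAG = re.compile(r"<[^<>]+>")


def normalize(tag: str) -> str:
    """Drop any whitespace before a closing "/>" so spacing variants compare equal."""
    return re.sub(r"\s*/>$", "/>", tag)


def restore_source_tags(source: str, processed: str) -> str:
    """Return processed text with the source spelling of each matching tag restored."""
    source_tags = TAG.findall(source)
    processed_tags = TAG.findall(processed)
    if len(source_tags) != len(processed_tags):
        # The tag structure differs, so positional pairing is unsafe: change nothing.
        return processed

    originals = iter(source_tags)

    def pick(match: re.Match) -> str:
        original = next(originals)
        current = match.group(0)
        # Only swap when the tags are the same apart from the spacing before "/>".
        return original if normalize(original) == normalize(current) else current

    return TAG.sub(pick, processed)


source = '<p><img src="filename.png" alt="description"/> text <br /></p>'
processed = '<p><img src="filename.png" alt="description" /> text <br /></p>'
fixed = restore_source_tags(source, processed)
print(fixed == source)
```

Because each tag is compared against its counterpart in the source, a tag that already had the space in the CMS (the `<br />` above) is left as it was.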

+1  A: 

I'm not aware of any parser, XML tool, etc. (in .NET at least) that would distinguish between those two forms; the difference is insignificant whitespace. In terms of meaning they are identical, just as both are identical to:

<img alt="description" src="filename.png" />
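A minimal sketch of that point, using Python's standard-library parser rather than .NET (the helper name `parsed_view` is an assumption made for this example): once parsed, all three spellings yield the same tag name and the same set of attributes, so nothing of the original spacing or attribute order is left to serialize.

```python
import xml.etree.ElementTree as ET

variants = [
    '<img src="filename.png" alt="description"/>',
    '<img src="filename.png" alt="description" />',
    '<img alt="description" src="filename.png" />',
]


def parsed_view(markup: str):
    """Return what the parser retains: tag name, attributes, and text content."""
    elem = ET.fromstring(markup)
    return elem.tag, dict(elem.attrib), elem.text


views = [parsed_view(v) for v in variants]
# Dict comparison ignores attribute order, just as XML itself does.
all_equal = all(view == views[0] for view in views)
print(all_equal)
```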
Marc Gravell
Yeah, the end result is exactly the same (the final XHTML is also displayed identically). The problem is that a simple text compare shows a one-space difference. I agree that there is no difference, but the requirement says spacing in tags needs to be identical ...
barry
Then the requirement is ignoring the very nature of xml...
Marc Gravell
I am doing my best to convince the client that in every aspect there is no risk involved in the result. And that technically they end up with "cleaner" content.
barry
@barry - Good luck with that. :P
SirDemon
A: 

Post-process the file with a regex such as s| />|/>|g, which replaces every " />" with "/>".

Be aware that if you are generating XHTML, replacing <br /> with <br/> may break some downlevel browsers. <br /> is seen as an HTML tag with an unknown attribute "/", which is then ignored. <br/> is seen as the unknown HTML tag "br/".
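As a rough sketch of that post-processing step, here it is in Python (in .NET, Regex.Replace would be the equivalent call). The function name `strip_space_before_close` is an assumption made for this example. Note that the pattern is purely textual, so it would also rewrite a " />" that happens to appear in ordinary text or inside an attribute value.

```python
import re

# Matches the single space a serializer puts before "/>" in an empty-element tag.
SPACE_BEFORE_CLOSE = re.compile(r" />")


def strip_space_before_close(markup: str) -> str:
    """Replace every ' />' with '/>' in the given markup."""
    return SPACE_BEFORE_CLOSE.sub("/>", markup)


sample = '<p>Logo: <img src="filename.png" alt="description" /><br /></p>'
result = strip_space_before_close(sample)
print(result)
```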

Thanks. At the moment I do an automated tag-by-tag compare of the "processed" tags against the sources in the CMS and, if needed, "fix" the tags (using regular expressions). Since the targets are currently not strict XHTML and there are some minimum requirements for the browser to be used, I luckily don't have to worry about down-level compatibility.
barry