I need to have the following attribute value in my XML node:

CommandLine="copy $(TargetPath) ..\..\&#x0D;&#x0A;echo dummy > dummy.txt"

Actually, this is part of a .vcproj file generated by VS2008.
&#x0D;&#x0A; represents a line break, as there should be 2 separate commands.

I'm using Python 2.5 with minidom to parse XML, but unfortunately I don't know how to store sequences like &#x0D;&#x0A;. The best I can get is &amp;#x0D;.

How can I store exactly &#x0D;&#x0A;?

UPD: Strictly speaking, I have to store not a literal &, but the \r\n sequence in the form &#x0D;&#x0A;.

A: 

You should try storing the actual characters (ASCII 13 and ASCII 10) in the attribute value, instead of their already-escaped counterparts.


EDIT: It looks like minidom does not handle newlines in attribute values correctly.

A literal line break in an attribute value is allowed, but it is subject to attribute-value normalization when the document is parsed, at which point it is converted to a space.

I filed a bug in this regard: http://bugs.python.org/issue5752
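
The normalization described above can be checked with a short sketch. It assumes a current Python and the standard-library minidom parser; the element name `Tool` and the attribute values are made up for illustration.

```python
from xml.dom import minidom

# A literal line break inside the attribute value:
# the parser normalizes it to a single space.
literal = minidom.parseString('<Tool CommandLine="first\nsecond"/>')
literal_value = literal.documentElement.getAttribute('CommandLine')

# The same line break written as a character reference
# is kept as a real newline character after parsing.
escaped = minidom.parseString('<Tool CommandLine="first&#x0A;second"/>')
escaped_value = escaped.documentElement.getAttribute('CommandLine')

print(repr(literal_value))
print(repr(escaped_value))
```

This is why the line break has to be written as a character reference when the file is serialized: a literal newline would not survive the next parse.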

Tomalak
I need to have the escape sequence in the output; that is exactly what VS2008 does.
The DOM will take care of character escaping according to XML rules. Don't bother about escape sequences, just store the *data* you wish to store.
Tomalak
I understand that the DOM takes care of escaping. But how do I tell the DOM that I want to store the \r\n sequence in escaped form?
Why do you want to? :-) You could have a look at the configuration options for your DOM implementation (or if there is a more configurable one). Maybe there is a way to change the output behavior.
Tomalak
A: 

An ampersand is a special character in XML, and most XML parsers require valid XML in order to function. Let minidom escape the ampersand for you (really, it should already be escaped), and then, when you need to display the escaped value, unescape it.
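
A rough sketch of what that looks like in practice, assuming the standard-library minidom and `xml.sax.saxutils.unescape`; the element name `Tool` and the value are made up for illustration. Note that this stores the reference as plain text rather than as a real line break.

```python
from xml.dom import minidom
from xml.sax.saxutils import unescape

doc = minidom.parseString('<Tool/>')

# The "&" is ordinary data here, so minidom escapes it as &amp; on output.
doc.documentElement.setAttribute('CommandLine', 'first&#x0D;&#x0A;second')
serialized = doc.documentElement.toxml()
print(serialized)

# Reading the value back through the DOM returns the stored text unchanged.
reparsed = minidom.parseString(serialized)
round_trip = reparsed.documentElement.getAttribute('CommandLine')

# unescape() turns &amp; back into & in an already-serialized string.
unescaped = unescape('first&amp;#x0D;&amp;#x0A;second')
print(unescaped)
```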

apphacker
Right, so convert
apphacker
I'm saying after you get the value from the dom!
apphacker
A: 

I'm using Python 2.5 with minidom to parse XML - but unfortunately I don't know how to store sequences like &#x0D;&#x0A;

Well, you can't specify that you want hex escapes specifically, but according to the DOM LS standard, implementations should change \r\n in attribute values to character references automatically.

Unfortunately, minidom doesn't:

>>> from xml.dom import minidom
>>> document= minidom.parseString('<a/>')
>>> document.documentElement.setAttribute('a', 'a\r\nb')
>>> document.toxml()
u'<?xml version="1.0" ?><a a="a\r\nb"/>'

This is a bug in minidom. Try the same in another DOM (eg. pxdom):

>>> import pxdom
>>> document= pxdom.parseString('<a/>')
>>> document.documentElement.setAttribute('a', 'a\r\nb')
>>> document.pxdomContent
u'<?xml version="1.0" ?><a a="a&#13;&#10;b"/>'
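
If switching DOM implementations is not an option, one possible workaround is to escape the attribute value yourself when writing the file. The sketch below is only an illustration: `escape_attribute` is a hypothetical helper (not part of minidom), the element name `Tool` is made up, and it relies on `xml.sax.saxutils.escape` from the standard library.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def escape_attribute(value):
    """Hypothetical helper: escape an attribute value, writing CR and LF
    as hex character references."""
    return escape(value, {'"': '&quot;', '\r': '&#x0D;', '\n': '&#x0A;'})


command = 'copy $(TargetPath) ..\\..\\\r\necho dummy > dummy.txt'
element_text = '<Tool CommandLine="%s"/>' % escape_attribute(command)
print(element_text)

# Parsing the result gives back the original two-line command.
parsed = minidom.parseString(element_text)
parsed_value = parsed.documentElement.getAttribute('CommandLine')
```

`xml.sax.saxutils.quoteattr` does similar escaping but emits decimal references (`&#13;&#10;`), which any XML parser reads the same way.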
bobince
I've filed a bug report here: http://bugs.python.org/issue5752. Maybe they will do something about it.
Tomalak
Thanks, half a day spent on such a small thing :)