How can I encode or decode the escape sequence character '\x13' in Python into a character that is valid in RSS or XML?

My use case: I am getting data from arbitrary sources and building an RSS feed from it. The source data sometimes contains escape sequence characters, which break my RSS feed.

So how can I sanitize the input data to deal with these escape sequence characters?

+2  A: 

\x13 (ASCII 19, ‘DC3’) can't be escaped; it is invalid in XML 1.0, period. You can include one, encoded as &#19; or &#x13; in XML 1.1, but then you have to include the <?xml version="1.1"?> declaration and many tools won't like it.

I've no idea why that character would be included in your data, but the way forward is probably to completely remove control codes. For example:

import re
s = re.sub(r'[\x00-\x08\x0B-\x1F]', '', s)  # drops C0 control codes, keeping only tab and line feed
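If you'd rather keep everything XML 1.0 permits (including carriage return) and escape the markup characters in the same pass, something along these lines might work. This is only a sketch: the name sanitize_for_xml is made up for illustration, it assumes Python 3 and that the input is already a decoded str, and the character ranges are taken from the XML 1.0 "Char" production.

```python
import re
from xml.sax.saxutils import escape

# Anything NOT matched by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def sanitize_for_xml(text):
    """Hypothetical helper: drop characters XML 1.0 forbids, then escape &, < and >."""
    cleaned = _INVALID_XML_CHARS.sub('', text)
    return escape(cleaned)

print(sanitize_for_xml('a\x13b & <c>\n'))
```

If the feed is built with an XML library that does its own escaping, you would probably only want the stripping step, otherwise the text gets escaped twice.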

For some kinds of escape sequence (e.g. ANSI colour codes), stripping the control codes still leaves stray non-control characters behind (the ESC goes, but the printable part such as [31m stays), in which case you'd probably want a custom parser for that particular format.

bobince