
I have a Lua program that is consuming data from an external device. The device is returning malformed XML that looks like:

<element attribute1="value1" attribute2="value2" attribute3=" m "value3" " attribute4="value4" />

In particular, some of the fields are user-editable and could conceivably contain characters that should be escaped but aren't. Hopefully I can get the code generating these messages fixed, but until then I need a workaround that tries to 'do the right thing™'. The messages do seem to come in a fixed format, with the attributes always in the same order and always present (as far as I can tell), so I could use a very restrictive pattern match like:

-- Capture the four attribute values; relies on the fixed attribute order.
local v1, v2, v3, v4 = string.match(str, 'attribute1="(.*)" attribute2="(.*)" attribute3="(.*)" attribute4="(.*)"')

but this seems really icky, and it will of course break if they decide to change the format (without fixing the problem).
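A runnable version of that fixed-format workaround might look like the sketch below. It assumes, as the question does, that every tag is exactly one element tag with attribute1 through attribute4 in that order (the names are only the ones from the example, and rebuild_fixed and escape_quotes are hypothetical helpers). It captures the four values and rebuilds the tag with every double quote inside a value replaced by &quot;.

```lua
-- Sketch of the fixed-format workaround. Assumes the tag always has exactly
-- these four attributes in this order; the names come from the example above.
local FIXED = '^<element attribute1="(.*)" attribute2="(.*)" '
  .. 'attribute3="(.*)" attribute4="(.*)" />$'

-- Replace every double quote in an attribute value with &quot;.
-- The extra parentheses keep only the first return value of gsub.
local function escape_quotes(value)
  return (value:gsub('"', "&quot;"))
end

-- Returns the repaired tag, or nil if the input does not match the fixed format.
local function rebuild_fixed(tag)
  local a1, a2, a3, a4 = tag:match(FIXED)
  if not a1 then
    return nil
  end
  return string.format(
    '<element attribute1="%s" attribute2="%s" attribute3="%s" attribute4="%s" />',
    escape_quotes(a1), escape_quotes(a2), escape_quotes(a3), escape_quotes(a4))
end

print(rebuild_fixed('<element attribute1="value1" attribute2="value2" attribute3=" m "value3" " attribute4="value4" />'))
-- <element attribute1="value1" attribute2="value2" attribute3=" m &quot;value3&quot; " attribute4="value4" />
```

Because each (.*) is greedy, a value that itself contains text such as ' attribute4="' would be split in the wrong place, and an extra attribute appended by a future format change would be swallowed into the last value. That is exactly the brittleness described above.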

Any suggestions for alternative solutions? I am mainly concerned with finding the " characters that need to be turned into &quot;. I am not as worried about other XML entities.
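One alternative that does not depend on the attribute names is a small scanner that treats a double quote inside a value as the closing quote only when it is followed by whitespace plus another name=, or by an optional / and then >. Every other quote inside a value becomes &quot;. This is a heuristic sketch, not a parser: the function name escape_stray_quotes is made up, it assumes no whitespace between = and the opening quote, and a value that itself contains a quote followed by something like ' x=' or '/>' will still be cut short.

```lua
-- Heuristic sketch: escape stray double quotes inside the attribute values of
-- one start tag. Assumes no whitespace between '=' and the opening quote.
local function escape_stray_quotes(tag)
  local out = {}
  local in_value = false
  for i = 1, #tag do
    local c = tag:sub(i, i)
    if not in_value then
      out[#out + 1] = c
      -- A quote immediately after '=' opens an attribute value.
      if c == '"' and tag:sub(i - 1, i - 1) == "=" then
        in_value = true
      end
    elseif c == '"' then
      local rest = tag:sub(i + 1)
      -- Closing quote only if the next attribute or the end of the tag follows;
      -- otherwise the quote is part of the value and gets escaped.
      if rest:match("^%s+[%a_][%w_:%.%-]*=") or rest:match("^%s*/?>") then
        out[#out + 1] = c
        in_value = false
      else
        out[#out + 1] = "&quot;"
      end
    else
      out[#out + 1] = c
    end
  end
  return table.concat(out)
end

print(escape_stray_quotes('<element attribute1="value1" attribute2="value2" attribute3=" m "value3" " attribute4="value4" />'))
-- <element attribute1="value1" attribute2="value2" attribute3=" m &quot;value3&quot; " attribute4="value4" />
```

A tag that is already well formed passes through unchanged, so the scanner is safe to keep in place once the producer is fixed.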

A:

Unfortunately, if XML is malformed like that, you can't come up with something that will work in absolutely every case.

What I would do is, first, try to parse it as normal XML. If that fails, fall back to your regex method. That way, when the producer of this XML is fixed, your code will automatically begin to do the right thing.

Jason Creighton
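Lua has no XML parser in its standard library, so the sketch below stands in for "try to parse it as normal XML" with a strict check that a single tag consists only of name="value" pairs with no " or < inside any value; a real XML parser would be the better first step if one is available. The helper names are hypothetical, and the fallback is passed in as a function so that whatever pattern-based repair you already have can be plugged in.

```lua
-- Hypothetical stand-in for "parse it as normal XML": accept a single tag only if
-- it is <name attr="value" ...> or <name ... /> with no '"' or '<' inside a value.
local function is_well_formed_tag(tag)
  local rest = tag:match("^<[%a_][%w_:%.%-]*(.-)%s*/?>$")
  if not rest then
    return false
  end
  -- Consume one name="value" pair at a time until nothing is left.
  while rest ~= "" do
    rest = rest:match('^%s+[%a_][%w_:%.%-]*="[^"<]*"(.*)$')
    if not rest then
      return false
    end
  end
  return true
end

-- Use the tag as-is when it is well formed; otherwise fall back to `repair`
-- (for example, the fixed-format pattern from the question). The second return
-- value reports whether the fallback was used.
local function parse_or_repair(tag, repair)
  if is_well_formed_tag(tag) then
    return tag, false
  end
  return repair(tag), true
end
```

Because the strict check runs first, the fallback stops being used on its own as soon as the producer starts emitting properly escaped XML.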
I think for my situation I will be better off doing the regex first, then falling back if it fails (and emitting a warning to that effect). I am only forwarding the XML, not consuming it, so I just want to clean it up for the code that actually consumes it later.
Dolphin
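The order described in that comment (pattern first, then forward unchanged with a warning) can be sketched as below. The name forward is hypothetical, and repair is assumed to be any function that returns the fixed tag, or nil when the input does not match the expected format.

```lua
-- Sketch: try the pattern-based repair first; if it does not match, forward the
-- original text unchanged and write a warning to stderr.
local function forward(tag, repair)
  local fixed = repair(tag)
  if fixed then
    return fixed
  end
  io.stderr:write("warning: unexpected tag format, forwarding unchanged: ", tag, "\n")
  return tag
end
```

Passing the repair in as a function keeps the forwarding and warning logic independent of how the pattern itself is written.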