Suppose we have a table:
Key|Val|Flag
01 |AAA| Y
02 |BBB| N
...
wrapped in XML this way:
<Data>
<R><F>Key</F><F>Val</F><F>Flag</F></R>
<R><F>01</F><F>AAA</F><F>Y</F></R>
<R><F>02</F><F>BBB</F><F>N</F></R>
...
</Data>
There can be more columns and rows, obviously.
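For reference, the wrapping described above can be sketched roughly like this. The function name `wrap_table` is made up for illustration, and the sketch assumes every cell is already a string:

```python
from xml.sax.saxutils import escape

def wrap_table(rows):
    """Wrap a list of rows (each a list of strings) into the <Data>/<R>/<F> layout."""
    lines = ["<Data>"]
    for row in rows:
        # One <F> element per cell; escape() protects '&', '<' and '>' in values.
        fields = "".join("<F>{}</F>".format(escape(value)) for value in row)
        lines.append("<R>{}</R>".format(fields))
    lines.append("</Data>")
    return "\n".join(lines)

sample = wrap_table([
    ["Key", "Val", "Flag"],
    ["01", "AAA", "Y"],
    ["02", "BBB", "N"],
])
print(sample)
```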
Now I'd like to parse the XML back into a table using a single regex.
I can find all fields with '<F>([\w\d]*)</F>', but I need them to be grouped by rows somehow.
I thought about <R>(<F>([\w\d]*)</F>)*</R>, but the Python implementation finds nothing.
Can someone please help me compose the regex?
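To make the difficulty concrete, here is a small sketch of how Python's `re` module treats a repeated group, using a single row on one line as sample input (the variable names are just for illustration). When a group matches several times, only its last repetition stays accessible, so a single pattern of this shape cannot hand back all fields of a row:

```python
import re

row_xml = "<R><F>01</F><F>AAA</F><F>Y</F></R>"

# The flat pattern finds every field, but without any row grouping.
all_fields = re.findall(r'<F>([\w\d]*)</F>', row_xml)

# The nested pattern matches the whole row, yet each repeated group
# keeps only the text of its final repetition.
m = re.search(r'<R>(<F>([\w\d]*)</F>)*</R>', row_xml)
last_outer = m.group(1)   # text of the last <F>...</F> element only
last_inner = m.group(2)   # text inside the last <F> element only

print(all_fields)
print(last_outer, last_inner)
```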
UPDATE: Some context for the question.
I'm aware that there are plenty of XML parsing libraries, but unfortunately my environment is limited to the standard library. Anyway, thanks to everyone who warned me not to use regexes for XML parsing.
I needed a quick and dirty solution, so I decided to start with regexes and switch to a proper parser later.
So far I have the code:
...
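row_p = r'<R>(.*?)</R>'      # contents of one row
field_p = r'<F>(.*?)</F>'    # contents of one field
table = ''
for row in re.finditer(row_p, xml):
    # join the fields of this row with '|' and end the line
    table += '|'.join(re.findall(field_p, row.group(1))) + '\n'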
...
It works for small datasets (about 10'000 rows) but fails for tables larger than 500'000 rows.
Maybe I'll investigate why it fails, but my next step is to switch to a standard XML parser. ElementTree is the first candidate.
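A minimal sketch of what the ElementTree version might look like, using only the standard library. It assumes the `<Data>/<R>/<F>` layout shown earlier; the function name `xml_to_table` and the in-memory sample are made up for illustration. `iterparse` reads the input incrementally, which may help with large tables, though I haven't measured it:

```python
import io
import xml.etree.ElementTree as ET

def xml_to_table(source):
    """Yield one '|'-joined line per <R> element, reading the XML incrementally."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "R":
            # An empty <F></F> has text None, so fall back to ''.
            yield "|".join((f.text or "") for f in elem.findall("F"))
            # Drop this row's children; the emptied <R> stays attached to the root.
            elem.clear()

sample = io.BytesIO(
    b"<Data>"
    b"<R><F>Key</F><F>Val</F><F>Flag</F></R>"
    b"<R><F>01</F><F>AAA</F><F>Y</F></R>"
    b"</Data>"
)
table = "\n".join(xml_to_table(sample))
print(table)
```

`iterparse` also accepts a file name instead of a file object, so the same function could be pointed at a file on disk.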