ansaurus

Question

Answer 1

+2 A:

Mandatory links:

Use an XML parser. lxml is very good and even provides (among other XML-related thingies) XPath. If you have a fetish for one-liners, I'm sure there is an XPath one-liner to extract these elements ;)

delnan 2010-10-14 12:43:18

Thanks for the links. It's good to have weighty arguments for staying away from the dark side at my fingertips.

z4y4ts 2010-10-15 16:54:32

Answer 2

A:

~~If this question were tagged Perl, I could post a solution with code for you, but this is Python.~~

Anyway, I suggest you load the XML file and read it line by line. Loop over each line until the end of the file and find all the fields within that line. As far as I know, `re.findall` returns the matches in Python as a list. There you have it. This is just the main idea:

import re

with open('data.xml') as f:  # 'data.xml' is a placeholder file name
    for line in f:
        # collect the text of every <F>...</F> field on this line
        fields = re.findall(r'<F>(\w*)</F>', line)
        if fields:
            print('|'.join(fields))

DISCLAIMER: The above code is just a rough sketch.

Oh by the way, if possible, use an XML parser instead.

Ruel 2010-10-14 12:46:41

Answer 3

A:

import libxml2

txt = '\n<Data>\n  <R><F>Key</F><F>Val</F><F>Flag</F></R>\n  <R><F>01</F><F>AAA</F><F>Y</F></R>\n  <R><F>02</F><F>BBB</F><F>N</F></R>\n</Data>\n'

rows = []
# iterating over the parsed document visits its nodes depth-first
for elem in libxml2.parseDoc(txt):
    if elem.name == 'R':
        # a new <R> starts a new row
        curRow = []
        rows.append(curRow)
    elif elem.name == 'F':
        curRow.append(elem.get_content())

This gives:

rows = [['Key', 'Val', 'Flag'], ['01', 'AAA', 'Y'], ['02', 'BBB', 'N']]

eumiro 2010-10-14 12:56:42

Answer 4

A:

lxml is a Pythonic binding for the libxml2 and libxslt libraries. It is unique in that it combines the speed and feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API.

ecounysis 2010-10-15 16:55:37

lxml is great, but unfortunately my environment is limited to the standard library. Thanks anyway.

z4y4ts 2010-10-15 16:58:42
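
For a standard-library-only environment, something along these lines might work. It is a minimal sketch using `xml.etree.ElementTree`, assuming the data looks like the sample in Answer 3 (`<R>` rows containing `<F>` fields). The helper name `parse_rows` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the libxml2 answer (Answer 3).
txt = """
<Data>
  <R><F>Key</F><F>Val</F><F>Flag</F></R>
  <R><F>01</F><F>AAA</F><F>Y</F></R>
  <R><F>02</F><F>BBB</F><F>N</F></R>
</Data>
"""

def parse_rows(xml_text):
    """Return one list of <F> texts per <R> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # An empty <F/> has text None, so fall back to an empty string.
    return [[f.text or '' for f in r.findall('F')] for r in root.iter('R')]

rows = parse_rows(txt)
for row in rows:
    print('|'.join(row))
```

ElementTree only supports a small subset of XPath, but `iter` and `findall` are enough for a flat table like this one.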

regex to parse tables wrapped into xml