ansaurus

Question

Match database output (balanced parentheses, table & rows structure) and output as a list?

Answer 1

+3 A:

Parsing recursive structures with regex is a pain because you have to keep state.

Instead, use pyparsing or some other real parser.

Some folks like PLY because it follows the traditional Lex/Yacc architecture.
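
To illustrate what "keeping state" means here, a minimal hand-rolled sketch follows. It tracks bracket depth with an explicit stack, which is the part a single regex cannot do. The function name parse_brackets is made up for this example, and it assumes the input uses only "[" and "]" as delimiters; a real parser library such as pyparsing or PLY gives you far more (grammar rules, error reporting, type conversion).

```python
def parse_brackets(text):
    """Turn 'A[B[x]]' into nested lists, e.g. ['A', ['B', ['x']]]."""
    root = []
    stack = [root]   # the open-bracket state that must be tracked
    buf = []         # characters of the word currently being read

    def flush():
        # Store the pending word (if any) in the innermost open list.
        word = "".join(buf).strip()
        if word:
            stack[-1].append(word)
        buf.clear()

    for ch in text:
        if ch == "[":
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)      # go one level deeper
        elif ch == "]":
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            stack.pop()              # come back up one level
        else:
            buf.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return root


print(parse_brackets("Row[C_ID[Data:12345.0][Type:Double]]"))
# → ['Row', ['C_ID', ['Data:12345.0'], ['Type:Double']]]
```

The stack is the whole point: each "[" pushes a new list and each "]" pops one, so arbitrarily deep nesting is handled and unbalanced input is detected.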

nosklo 2009-08-24 21:56:58

Answer 2

A:

This excellent page lists many parsers available to Python programmers. Regexes are unsuitable for "balanced parentheses" matching, but any of the third-party packages reviewed on that page will serve you well.

Alex Martelli 2009-08-24 22:40:29

Answer 3

A:

This regex:

Row\[[\s]*C_ID\[[\W]*Data:([0-9.]*)[\S\W]*F_ID\[[\S\W]*Data:([0-9.]*)[\S\W]*NAME\[[\S\W]*Data:([\w ]*)[\S ]*

for the first row will match:

$1=12345.0 $2=17660 $3=Mike Jones

Then you can use something like this:

{'C_ID': $1, 'F_ID': $2, 'NAME': '$3'}

to produce:

{'C_ID': 12345.0, 'F_ID': 17660, 'NAME': 'Mike Jones'}

So you need to iterate through your input until it stops matching your rows... Does it make sense?

DmitryK 2009-08-24 23:57:05

btw, an alternative solution can be to convert the whole lot to XML and use XSLT to construct output you need.

DmitryK 2009-08-24 23:58:01

That will work... kind of. What if I wanted to execute that regex for each row so that it just matched C_ID as $1 and 12345.0 as $2, and then repeat for the next row (with $1 and $2 holding the variable name and value respectively)?

Crazy Serb 2009-08-25 06:06:18

Then you will need three different regexes, for C_ID, F_ID and NAME respectively. I think you will be better off parsing your input on a per-row basis.
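
A rough sketch of that per-row idea in Python, using one generic pattern where group 1 is the attribute name and group 2 is its Data value. The names ATTR_RE and rows_to_dicts are invented for this example, and it assumes the input looks like the Table[ Row[ NAME[Data:...][Sec:...][Type:...] ] ] sample in this thread, with no "]" inside a value and no nesting beyond that.

```python
import re

# Group 1: attribute name (e.g. C_ID); group 2: the text after "Data:".
# Assumes values never contain a closing bracket.
ATTR_RE = re.compile(r"(\w+)\[Data:([^\]]*)\]")


def rows_to_dicts(text):
    """Return one dict per row, mapping attribute name to its Data string."""
    rows = []
    # Everything before the first "Row[" is the table header, so skip it.
    for chunk in text.split("Row[")[1:]:
        rows.append(dict(ATTR_RE.findall(chunk)))
    return rows


sample = """
Table[
  Row[
    C_ID[Data:12345.0][Sec:12345.0][Type:Double]
    F_ID[Data:17660][Sec:17660][Type:Long]
    NAME[Data:Mike Jones][Sec:Mike Jones][Type:String]
  ]
  Row[
    C_ID[Data:2560.0][Sec:2560.0][Type:Double]
    NAME[Data:Casey Jones][Sec:Mike Jones][Type:String]
  ]
]"""

for row in rows_to_dicts(sample):
    print(row)
```

Every value comes back as a string (for example '12345.0'), and a row that lacks an attribute simply has no such key, so the second row here has no 'F_ID'. This only works because the nesting depth is fixed and known; it is not a general answer to balanced brackets.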

DmitryK 2009-08-25 06:23:43

Answer 4

A:

There really isn't a lot of unpredictable nesting going on here, so you could do this with regexes. But pyparsing is my tool of choice, so here is my solution:

from pyparsing import *

LBRACK,RBRACK,COLON = map(Suppress,"[]:")
ident = Word(alphas, alphanums+"_")
datatype = oneOf("Double Long String Boolean")

# define expressions for pieces of attribute definitions
data = LBRACK + "Data" + COLON + SkipTo(RBRACK)("contents") + RBRACK
sec = LBRACK + "Sec" + COLON + SkipTo(RBRACK)("contents") + RBRACK
type = LBRACK + "Type" + COLON + datatype("datatype") + RBRACK

# define entire attribute definition, giving each piece its own results name
attrDef = Group(ident("key") + data("data") + sec("sec") + type("type"))

# now a row is just a "Row[" and one or more attrDef's and "]"
rowDef = Group("Row" + LBRACK + Group(OneOrMore(attrDef))("attrs") + RBRACK)

# this method will process each row, and convert the key and data fields
# to addressable results names
def assignAttrs(tokens):
    ret = ParseResults(tokens.asList())
    for attr in tokens[0].attrs:
        # use datatype mapped to function to convert data at parse time
        value = {
            'Double' : float,
            'Long' : int,
            'String' : str,
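            # bool("false") is True, so compare the text instead
            # (assumes the data spells booleans as true/false)
            'Boolean' : lambda s: s.strip().lower() == 'true',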
            }[attr.type.datatype](attr.data.contents)
        ret[attr.key] = value
    # replace parse results created by pyparsing with our own named results
    tokens[0] = ret
rowDef.setParseAction(assignAttrs)

# a TABLE is just "Table[", one or more rows and "]"
tableDef = "Table" + LBRACK + OneOrMore(rowDef)("rows") + RBRACK

test = """
Table[    
  Row[
    C_ID[Data:12345.0][Sec:12345.0][Type:Double]
    F_ID[Data:17660][Sec:17660][Type:Long]
    NAME[Data:Mike Jones][Sec:Mike Jones][Type:String]
  ]    
  Row[
    C_ID[Data:2560.0][Sec:2560.0][Type:Double] 
    NAME[Data:Casey Jones][Sec:Mike Jones][Type:String]
  ]
]"""

# now parse table, and access each row and its defined attributes
results = tableDef.parseString(test)
for row in results.rows:
    print(row.dump())
    print(row.NAME, row.C_ID)
    print()

prints:

[[[['C_ID', 'Data', '12345.0', 'Sec', '12345.0', 'Type', 'Double'],...
- C_ID: 12345.0
- F_ID: 17660
- NAME: Mike Jones
Mike Jones 12345.0

[[[['C_ID', 'Data', '2560.0', 'Sec', '2560.0', 'Type', 'Double'], ...
- C_ID: 2560.0
- NAME: Casey Jones
Casey Jones 2560.0

The results names assigned in assignAttrs give you access to each of your attributes by name. To see if a name has been omitted, just test "if not row.F_ID:".

Paul McGuire 2009-09-07 16:57:32
