I'm working on a script for work to extract data from an old template engine schema:

[%price%]
{
$54.99
}
[%/price%]

[%model%]
{
WRT54G
}
[%/model%]

[%brand%]{
LINKSYS
}
[%/brand%]

Everything within the [% %] is the key, and everything in the { } is the value. Using Python and regex, I was able to get this far: (?<=\[%)(?P<key>\w*?)(?=\%\])

which returns ['price', 'model', 'brand']

I'm just having a problem getting it to match the data inside the braces as the value.

A: 

It looks like it'd be easier to do with re.Scanner (sadly undocumented) than with a single regular expression.
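As a rough illustration of what that could look like, here is a minimal sketch built on re.Scanner. Because the class is undocumented, its behaviour is taken from the standard-library source rather than from any official reference. The token names, the parse_template helper, and the SAMPLE string are all made up for this example, and multi-word values are simply re-joined with single spaces.

```python
import re

SAMPLE = """[%price%]
{
$54.99
}
[%/price%]

[%model%]
{
WRT54G
}
[%/model%]

[%brand%]{
LINKSYS
}
[%/brand%]
"""

# Each lexicon entry is (pattern, action). An action is called with
# (scanner, matched_text); an action of None drops the token entirely.
# Patterns deliberately contain no capturing groups.
scanner = re.Scanner([
    (r"\[%/\w+%\]", lambda s, t: ("CLOSE", t[3:-2])),
    (r"\[%\w+%\]", lambda s, t: ("OPEN", t[2:-2])),
    (r"\{", lambda s, t: ("LBRACE", t)),
    (r"\}", lambda s, t: ("RBRACE", t)),
    (r"\s+", None),
    (r"[^\s\[\]{}]+", lambda s, t: ("TEXT", t)),
])


def parse_template(text):
    """Hypothetical helper: fold the token stream into a dict."""
    tokens, remainder = scanner.scan(text)
    if remainder:
        raise ValueError("unrecognised input near %r" % remainder[:20])
    result, key, words, in_braces = {}, None, [], False
    for kind, value in tokens:
        if kind == "OPEN":
            key, words = value, []
        elif kind == "LBRACE":
            in_braces = True
        elif kind == "RBRACE":
            in_braces = False
        elif kind == "TEXT" and in_braces:
            words.append(value)
        elif kind == "CLOSE":
            if value != key:
                raise ValueError("closing tag %r does not match %r" % (value, key))
            result[key] = " ".join(words)
            key = None
    return result


print(parse_template(SAMPLE))
```

The scanner only does the tokenising; the loop afterwards is where mismatched tags get reported instead of being silently skipped.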

Devin Jeanpierre
+4  A: 

I agree with Devin that a single regex isn't the best solution. If there do happen to be any strange cases that aren't handled by your regex, there's a real risk that you won't find out.

I'd suggest using a finite state machine approach. Parse the file line by line, first looking for a price-model-brand block, then parse whatever is within the braces. Also, make sure to note if any blocks aren't opened or closed correctly as these are probably malformed.

You should be able to write something like this in Python in about 30-40 lines of code.
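A minimal sketch of that line-by-line state machine might look like the following. The state names, the parse_blocks function, and the error messages are invented for illustration, and it assumes that braces and closing tags sit on their own lines (with the opening brace optionally trailing the opening tag). Blank lines are ignored, including inside a value.

```python
import re

# Opening tag, optionally followed by "{" on the same line.
OPEN_TAG = re.compile(r"^\[%(\w+)%\]\s*(\{?)\s*$")
CLOSE_TAG = re.compile(r"^\[%/(\w+)%\]$")


def parse_blocks(lines):
    """Return (data, errors); errors is a list of (line_number, message)."""
    data, errors = {}, []
    state, key, value = "outside", None, []
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue
        if state == "outside":
            m = OPEN_TAG.match(line)
            if m:
                key, value = m.group(1), []
                # Skip the brace state if "{" was on the tag line.
                state = "in_value" if m.group(2) else "want_brace"
            else:
                errors.append((lineno, "expected an opening tag"))
        elif state == "want_brace":
            if line == "{":
                state = "in_value"
            else:
                errors.append((lineno, "expected '{'"))
                state = "outside"
        elif state == "in_value":
            if line == "}":
                state = "want_close"
            else:
                value.append(line)
        elif state == "want_close":
            m = CLOSE_TAG.match(line)
            if m and m.group(1) == key:
                data[key] = "\n".join(value)
            else:
                errors.append((lineno, f"expected [%/{key}%]"))
            state = "outside"
    if state != "outside":
        errors.append((None, f"unterminated block {key!r}"))
    return data, errors


sample = """[%price%]
{
$54.99
}
[%/price%]

[%brand%]{
LINKSYS
}
[%/brand%]
"""
print(parse_blocks(sample.splitlines()))
```

Returning the errors alongside the data, rather than raising on the first problem, makes it easier to list every malformed block in a file in one pass.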

Dana the Sane
There's even one of those cases in the "brand" example, which puts the opening brace on the same line as the tag, unlike the first two. A regex could work, but a state machine à la a SAX parser would work better.
Trey Stout
A: 

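Just for grins:

import re

# `test` is assumed to hold the template text from the question.
# Key: text between [% and %]; value: first line after the opening brace.
RE_kv = re.compile(r"\[%(.*)%\].*?\n?\s*\{\s*(.*)")
# Flags cannot be passed alongside a compiled pattern, so call its own method.
matches = RE_kv.findall(test)
for k, v in matches:
    print(k, v)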

output:

price $54.99
model WRT54G
brand LINKSYS

Note that I wrote just enough regex to get the matches to show up; it isn't even bounded at the end by the closing brace. Use at your own risk.
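For comparison, a bounded variant could anchor on both the closing brace and the matching closing tag. This is only a sketch under the assumption that a value never contains its own closing tag. The BLOCK pattern, the group names, and extract_pairs are hypothetical names for this example.

```python
import re

BLOCK = re.compile(
    r"\[%(?P<key>\w+)%\]\s*"        # opening tag, e.g. [%price%]
    r"\{\s*(?P<value>.*?)\s*\}\s*"  # braces around a lazily matched value
    r"\[%/(?P=key)%\]",             # closing tag must repeat the same key
    re.DOTALL,                      # let the value span several lines
)


def extract_pairs(text):
    """Return a list of (key, value) tuples for every well-formed block."""
    return [(m.group("key"), m.group("value")) for m in BLOCK.finditer(text)]


sample = """[%price%]
{
$54.99
}
[%/price%]

[%brand%]{
LINKSYS
}
[%/brand%]
"""
print(extract_pairs(sample))
```

Malformed blocks are still skipped without any warning here, which is the risk the other answers point out.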

Trey Stout