I'm trying to find all strings of the format {{rdex|001|001|Bulbasaur|2|Grass|Poison}} in a large text file, and then extract the substrings corresponding to the first 001 and to Bulbasaur, perhaps as a tuple.

I'm assuming regex with capturing groups can be used for both; could anybody tell me the appropriate regex to use in Python 3.1 as well as a possible code outline? I'm a regex noob.

Thanks!

+1  A: 
# S must hold exactly one template; raw string keeps the backslashes intact
re.match(r'^\{\{[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|[^|]+\|[^|]+\|[^|]+\}\}$', S).groups()
Ignacio Vazquez-Abrams
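Since the pattern above is anchored with ^ and $, it only works when the string is exactly one template. A minimal sketch of applying it line by line, assuming each template sits alone on its own line (the names ROW and extract_from_lines are made up for illustration):

```python
import re

# Same pattern as in the answer above, compiled once as a raw string.
ROW = re.compile(
    r'^\{\{[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|[^|]+\|[^|]+\|[^|]+\}\}$'
)

def extract_from_lines(lines):
    """Return a (number, name) tuple for every line that is exactly one template."""
    results = []
    for line in lines:
        # strip() drops the trailing newline so the $ anchor can match
        m = ROW.match(line.strip())
        if m:
            results.append(m.groups())
    return results
```

Lines that contain anything besides a single seven-field template are skipped, so this is probably too strict if templates are embedded in running text.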
+1  A: 
import re
text="""{{rdex|001|001|Bulbasaur|2|Grass|Poison}}"""
re.findall(r"\{\{[^|]+\|(\d+)\|\d+\|([^|]+)", text)
[('001', 'Bulbasaur')]
S.Mark
That is some fly regex right there. Might I ask where you learned it? Was it from a book, an internet tutorial, or a divine gift? Many thanks!
Beau Martínez
MSDN's regular expressions syntax page was my first introduction to regex: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx
S.Mark
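For the "large text file" part of the question, here is a rough outline built on the findall approach. It is only a sketch: the function names and the utf-8 encoding are assumptions, and the pattern is narrowed to rdex templates, which may or may not be what you want.

```python
import re

# Assumption: only {{rdex|...}} templates are of interest.
# [^|}]+ stops the name at the next pipe or closing brace.
RDEX = re.compile(r"\{\{rdex\|(\d+)\|\d+\|([^|}]+)")

def extract_from_text(text):
    """Return a list of (number, name) tuples for every rdex template in text."""
    return RDEX.findall(text)

def extract_from_file(path, encoding="utf-8"):
    """Read the whole file into memory and extract the tuples from it."""
    with open(path, encoding=encoding) as f:
        return extract_from_text(f.read())
```

Reading the file in one go should be fine unless it is too big for memory; in that case, looping over the file object and calling extract_from_text on each line would probably work as long as no template spans a line break.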
A: 
line="{{rdex|001|001|Bulbasaur|2|Grass|Poison}}"
s=line.find("{{")
e=line.find("}}")
if s != -1 and e != -1:
    sub=line[s+2:e].split("|")
    print(sub[1], sub[3])  # print is a function in Python 3

output

$ ./python.py
001 Bulbasaur
ghostdog74
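The find/split idea above handles one template per line. A possible extension that walks through a whole text and collects every match as a tuple, without regex, might look like this (extract_with_split is a hypothetical name, and filtering on "rdex" is an assumption about what should be kept):

```python
def extract_with_split(text):
    """Collect (number, name) tuples from every {{rdex|...}} template in text."""
    results = []
    pos = 0
    while True:
        start = text.find("{{", pos)
        if start == -1:
            break
        # Look for the closing braces only after the opening ones
        end = text.find("}}", start + 2)
        if end == -1:
            break
        fields = text[start + 2:end].split("|")
        # Keep only rdex templates that have enough fields
        if fields[0] == "rdex" and len(fields) > 3:
            results.append((fields[1], fields[3]))
        # Continue scanning after this template
        pos = end + 2
    return results
```

This does not handle templates nested inside other templates; it simply pairs each {{ with the next }}.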