views:

173

answers:

4

I need to parse a configuration file which looks like this (simplified):

<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>

My goal is to be able to change parameters specific to a particular link, but I'm having trouble getting substitution to work correctly. I have a regex that can isolate a parameter value on a specific link, where the value is contained in capture group 1:

link_id = r'id="1"'
parameter = 'mode'
link_regex = '<link [\w\W]+ %s>[\w\W]*[\w\W]*<%s>([\w\W]*)</%s>[\w\W]*</link>' \
% (link_id, parameter, parameter)

Thus,

print re.search(final_regex, f_read).group(1)

prints ipsec

The examples in the regex howto all seem to assume that one wants to use the capture group in the replacement, but what I need to do is replace the capture group itself (e.g. change the Link1 mode from ipsec to udp).

+1  A: 

not sure i'd do it that way, but the quickest way would be to shift the captures:

([\w\W][\w\W]<%s>)[\w\W]([\w\W])' and replace with group1 +mode+group2

Luke Schafer
Thanks, that gave me the hint I needed.
Rob Carr
+1  A: 

Supposing that your link_regex is correct, you can add parenthesis like this:

(<link [\w\W]+ %s>[\w\W]*[\w\W]*<%s>)([\w\W]*)(</%s>[\w\W]*</link>)

and then you could do:

p = re.compile(link_regex)
replacement = 'foo'
print p.sub(r'\g<1>' + replacement + r'\g<3>' , f_read)
erik
That is what I ended up doing.
Rob Carr
+6  A: 

I have to give you the obligatory: "don't use regular expressions to do this."

Check out how very easily awesome it is to do this with BeautifulSoup, for example:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> html = """
... <config>
... <links>
... <link name="Link1" id="1">
...  <encapsulation>
...   <mode>ipsec</mode>
...  </encapsulation>
... </link>
... <link name="Link2" id="2">
...  <encapsulation>
...   <mode>udp</mode>
...  </encapsulation>
... </link>
... </links>
... </config>
... """
>>> soup = BeautifulStoneSoup(html)
>>> soup.find('link', id=1)
<link name="Link1" id="1">
<encapsulation>
<mode>ipsec</mode>
</encapsulation>
</link>
>>> soup.find('link', id=1).mode.contents[0].replaceWith('whatever')
>>> soup.find('link', id=1)
<link name="Link1" id="1">
<encapsulation>
<mode>whatever</mode>
</encapsulation>
</link>

Looking at your regular expression I can't really tell if this is exactly what you wanted to do, but whatever it is you want to do, using a library like BeautifulSoup is much, much, better than trying to patch a regular expression together. I highly recommend going this route if possible.

Paolo Bergantino
or if it is standard xml, he can just use some xml lib like elementTree to modify it, I really don't see use of reg ex here.
Anurag Uniyal
+1  A: 

here is a approach using ElementTree

import xml.etree.cElementTree as ET

s = """<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
"""
configElement = ET.fromstring(s)

for modeElement in configElement.findall("*/*/*/mode"):
    modeElement.text = "udp"

print ET.tostring(configElement)

It will change all mode elements to udp, this is the output

<config>
<links>
<link id="1" name="Link1">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
<link id="2" name="Link2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
Anurag Uniyal