I'm trying to add anchors to all the h2 elements in my HTML, using Python. The code below adds the anchors, but I also need to fill in the name of each anchor.

Can the name be either the number of the match in the loop or a slugified version of the text between the h2 tags?

Here's the code so far:

import re

regex = '(?P<name><h2>.*?</h2>)'
text = re.sub(regex, r"<a name=''/>\g<name>", text)
A: 

Not sure if I understand correctly, but is placing the h2 text (e.g. an author name) as the name attribute sufficient? If so, you could use the following, as long as that text doesn't contain characters that are invalid in an attribute value:

regex = '(?P<name><h2>(.*?)</h2>)'
# Raw string so that \g<2> reaches re.sub as a group reference
print(re.sub(regex, r"<a name='\g<2>'/>\g<name>", text))

If you need a more advanced substitution, such as parsing the captured text or looking up some sort of related id, you could pass a replacement function instead of a string (see the re.sub documentation):

def name_substitution(matchobj):
    name = matchobj.group(2)  # the text between the h2 tags
    # do some processing on name here ...
    name = name.replace(' ', '_')
    # group(0) is the whole match, i.e. the complete <h2>...</h2> element
    return "<a name='%s'>%s</a>" % (name, matchobj.group(0))

print(re.sub(regex, name_substitution, text))
catchmeifyoutry
+1  A: 

You can take advantage of the fact that the second argument to re.sub can be a function to do pretty much anything you'd like. Here's an example that will slugify the text inside the <h2> element:

regex = '(?P<name><h2>(.*?)</h2>)' # Note the extra group inside the <h2>

def slugify(s):
    return s.replace(' ', '-') # bare-bones slugify

def anchorize(matchobj):
    return '<a name="%s"/>%s' % (slugify(matchobj.group(2)), matchobj.group(1))

text = re.sub(regex, anchorize, text)

(That slugify function could obviously use some work.)
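
As a rough idea of what that extra work might look like, here is a minimal sketch of a slightly more robust slugify. The rules it applies (strip accents, lowercase, collapse anything that is not a letter or digit into a single hyphen) are my own assumptions about what a reasonable slug is, not a standard; adjust them to taste.

```python
import re
import unicodedata

regex = '(?P<name><h2>(.*?)</h2>)'

def slugify(s):
    # Decompose accented characters, then drop whatever is not plain ASCII.
    s = unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')
    # Lowercase and collapse each run of non-alphanumerics into one hyphen.
    s = re.sub(r'[^a-z0-9]+', '-', s.lower())
    # Trim hyphens left over at either end.
    return s.strip('-')

def anchorize(matchobj):
    return '<a name="%s"/>%s' % (slugify(matchobj.group(2)), matchobj.group(1))

print(re.sub(regex, anchorize, '<h2>Hello, World!</h2>'))
# prints: <a name="hello-world"/><h2>Hello, World!</h2>
```

Note that two headings with the same text would still produce the same anchor name, so this sketch does not guarantee unique names.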

You could also implement a counter with a version of anchorize that used a global counter or, better yet, a class that kept track of its own counter and implemented the special __call__ method.
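
A minimal sketch of that class-based counter might look like the following. The class name CountingAnchorizer and the 'section-' prefix are made up for illustration; the only thing re.sub requires is that the object is callable and returns the replacement string.

```python
import re

regex = '(?P<name><h2>(.*?)</h2>)'

class CountingAnchorizer(object):
    """Callable replacement for re.sub that numbers each match."""

    def __init__(self, prefix='section-'):
        self.prefix = prefix  # hypothetical prefix, pick whatever suits you
        self.count = 0

    def __call__(self, matchobj):
        # re.sub calls this once per match, in document order.
        self.count += 1
        return '<a name="%s%d"/>%s' % (self.prefix, self.count, matchobj.group(1))

text = '<h2>Intro</h2><p>...</p><h2>Usage</h2>'
text = re.sub(regex, CountingAnchorizer(), text)
print(text)
# prints: <a name="section-1"/><h2>Intro</h2><p>...</p><a name="section-2"/><h2>Usage</h2>
```

Create a fresh instance for each document you process so the numbering starts at 1 again.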

Will McCutchen