ansaurus

Question

Adding anchors to h2 in text using python and regexp

Answer 1

A:

Not sure I understand correctly, but is it sufficient to use the author name as the name attribute? If so, you could use the following (as long as the author name contains no characters that are invalid in an attribute):

import re
regex = r'(?P<name><h2>(.*?)</h2>)'  # group "name" is the whole <h2> element, group 2 is its text
print(re.sub(regex, r"<a name='\g<2>'/>\g<name>", text))

If you need a more advanced substitution, such as parsing the author name or looking up some sort of related id, you can pass a replacement function instead (see the re.sub documentation):

def name_substitution(matchobj):
    name = matchobj.group(2)
    # do some processing on name here ...
    name = name.replace(' ', '_')
    return "<a name='%s'>%s</a>" % (name, matchobj.group(0))

print(re.sub(regex, name_substitution, text))
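
On the caveat about invalid characters in the attribute: one way to sidestep it is to escape the heading text before using it as the name. This is only a sketch, assuming Python 3.2+ (where html.escape is available); the names H2_RE, escaped_anchor and sample are made up for illustration.

```python
import html
import re

# Hypothetical pattern for this sketch: group 1 is the text inside the <h2>
H2_RE = re.compile(r'<h2>(.*?)</h2>')

def escaped_anchor(matchobj):
    # quote=True also escapes ' and ", so the value is safe inside a
    # quoted attribute
    name = html.escape(matchobj.group(1), quote=True)
    # group(0) is the whole matched <h2> element, left untouched
    return "<a name='%s'/>%s" % (name, matchobj.group(0))

sample = "<h2>O'Brien & Sons</h2>"
result = H2_RE.sub(escaped_anchor, sample)
print(result)
```

The heading itself is passed through unchanged; only the copy used for the name attribute is escaped.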

catchmeifyoutry 2010-05-25 15:37:41

Answer 2

+1 A:

You can take advantage of the fact that the second argument to re.sub can be a function to do pretty much anything you'd like. Here's an example that will slugify the text inside the <h2> element:

import re

regex = r'(?P<name><h2>(.*?)</h2>)' # Note the extra group inside the <h2>

def slugify(s):
    return s.replace(' ', '-') # bare-bones slugify

def anchorize(matchobj):
    return '<a name="%s"/>%s' % (slugify(matchobj.group(2)), matchobj.group(1))

text = re.sub(regex, anchorize, text)

(That slugify function could obviously use some work.)
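
For what it's worth, a slightly more robust slugify might look like the following. It is just one possible sketch, not a standard recipe: it assumes Python 3 and that ASCII-only, lowercase, hyphen-separated slugs are what you want.

```python
import re
import unicodedata

def slugify(s):
    # Decompose accented characters, then drop anything that is not ASCII
    s = unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')
    # Lowercase, then collapse each run of non-alphanumerics into one hyphen
    s = re.sub(r'[^a-z0-9]+', '-', s.lower())
    # Trim hyphens left over at either end
    return s.strip('-')

print(slugify('Hello,  World!'))
```

Note that two different headings can still produce the same slug, so you may want to combine this with a counter if the anchors must be unique.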

You could also implement a counter with a version of anchorize that used a global counter or, better yet, a class that kept track of its own counter and implemented the special __call__ method.
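
A minimal sketch of the class-based counter idea follows. The class name CountingAnchorizer, the prefix parameter and the sample text are all hypothetical, chosen only for illustration.

```python
import re

class CountingAnchorizer(object):
    """Replacement callable that numbers each anchor it emits."""

    def __init__(self, prefix='section'):
        self.prefix = prefix
        self.count = 0

    def __call__(self, matchobj):
        # re.sub calls the instance once per match, so the counter
        # advances once per <h2>
        self.count += 1
        return '<a name="%s-%d"/>%s' % (self.prefix, self.count, matchobj.group(0))

sample = '<h2>Intro</h2><p>...</p><h2>Usage</h2>'
anchorize = CountingAnchorizer()
result = re.sub(r'<h2>.*?</h2>', anchorize, sample)
print(result)
```

Because the count lives on the instance, creating a fresh CountingAnchorizer for each document restarts the numbering without touching any global state.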

Will McCutchen 2010-05-25 15:38:22
