I am using the Python ElementTree module to manipulate HTML. I want to emphasize certain words, and my current solution is:
    for e in tree.iter():  # iter() replaces getiterator(), which was removed in Python 3.9
        for attr in 'text', 'tail':
            words = (getattr(e, attr) or '').split()
            change = False
            for i, word in enumerate(words):
                word = clean_word.sub('', word)
                if word.lower() in glossary:
                    change = True
                    # wrap the matched word in literal markup (escaped on output)
                    words[i] = '<b>' + word + '</b>'
            if change:
                setattr(e, attr, ' '.join(words))
The above examines the text of each element and emphasizes the important words it finds. However, it does this by embedding HTML tags in the text attributes, which are escaped when rendering, so I need to counter with:
    html = etree.tostring(tree).replace('&gt;', '>').replace('&lt;', '<')
This makes me uncomfortable, so I want to do it properly. However, to embed a new Element I would need to shift the 'text' and 'tail' attributes around so that the emphasized text appears at the same position, and that would be really tricky when iterating as above.
Any advice on how to do this properly would be appreciated. I am sure there is something I have missed in the API!
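For reference, here is a minimal sketch of the text/tail shuffle described above, assuming the standard library's `xml.etree.ElementTree` and well-formed markup. The names `WORD`, `emphasize` and `emphasize_tree` are hypothetical, and words are matched with a plain `\w+` regex rather than the `clean_word` pattern from the question. The idea is to rebuild each element's children in order, so that every run of text ends up either in the parent's `text` or in the `tail` of the child that precedes it:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tokenizer: the capturing group makes re.split() return
# alternating non-word / word pieces, so odd indices are always words.
WORD = re.compile(r'(\w+)')

def emphasize(parent, glossary):
    """Wrap glossary words found directly inside `parent` in <b> elements."""
    old_children = list(parent)
    # Text runs owned by this element: its own text, then each child's tail.
    runs = [(None, parent.text)] + [(c, c.tail) for c in old_children]
    for child in old_children:
        parent.remove(child)
    parent.text = None
    last = None  # last child appended; None means we are still filling parent.text
    for child, text in runs:
        if child is not None:
            child.tail = None
            parent.append(child)
            last = child
        for i, piece in enumerate(WORD.split(text or '')):
            if not piece:
                continue
            if i % 2 == 1 and piece.lower() in glossary:
                last = ET.SubElement(parent, 'b')  # new element for the match
                last.text = piece
            elif last is None:
                parent.text = (parent.text or '') + piece
            else:
                last.tail = (last.tail or '') + piece

def emphasize_tree(root, glossary):
    # Snapshot the elements first so the newly created <b> nodes are not revisited;
    # existing <b> elements are skipped to avoid nesting <b> inside <b>.
    for element in list(root.iter()):
        if element.tag != 'b':
            emphasize(element, glossary)

root = ET.fromstring('<p>Use Python, not <i>Perl</i> or python.</p>')
emphasize_tree(root, {'python'})
print(ET.tostring(root, encoding='unicode'))
```

With the sample input this prints `<p>Use <b>Python</b>, not <i>Perl</i> or <b>python</b>.</p>`, with no string replacement on the serialized output. Working one parent at a time avoids needing a parent pointer, which `ElementTree` elements do not have.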