I've been working on my own Django-based blog (like everyone, I know) to sharpen up my Python, and I thought adding some syntax highlighting would be pretty great. I looked at some of the snippets out there and decided to combine a few and write my own syntax-highlighting template filter using Beautiful Soup and Pygments. It looks like this:

from django import template
from BeautifulSoup import BeautifulSoup
import pygments
import pygments.lexers as lexers
import pygments.formatters as formatters

register = template.Library()

@register.filter(name='pygmentize')
def pygmentize(value):
    try:
        formatter = formatters.HtmlFormatter(style='trac')
        tree = BeautifulSoup(value)
        # Highlight every <code> block; its class attribute names the lexer.
        for code in tree.findAll('code'):
            # Fall back to the plain-text lexer when no class is given
            # (indexing a missing attribute with code['class'] raises KeyError).
            lexer_name = code.get('class') or 'text'
            lexer = lexers.get_lexer_by_name(lexer_name)
            new_content = pygments.highlight(code.contents[0], lexer, formatter)
            # Inline the style rules for the .highlight CSS class.
            new_content += u"<style>%s</style>" % formatter.get_style_defs('.highlight')
            code.replaceWith("%s\n" % new_content)
        content = str(tree)
        return content
    except KeyError:
        return value

It looks for a code block like this, highlights it, and adds the relevant styles:

<code class="python">
    print "Hello World"
</code>
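The filter chooses a lexer from the class attribute of each code element and is meant to fall back to "text" when there is none. As far as I can tell, in Beautiful Soup 3 indexing a missing attribute (code['class']) raises KeyError rather than returning an empty value, so a block without a class sends the whole post through the except branch unhighlighted. The sketch below isolates just that lookup using only the standard library's html.parser; CodeLanguageFinder and code_languages are hypothetical names for illustration, not part of Django, Pygments or Beautiful Soup.

```python
from html.parser import HTMLParser


class CodeLanguageFinder(HTMLParser):
    """Record the language named by each <code> element's class attribute.

    Hypothetical helper: it only mirrors the lexer-name lookup in the filter,
    it does not highlight anything.
    """

    def __init__(self, default="text"):
        super().__init__()
        self.default = default
        self.languages = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            # attrs is a list of (name, value) pairs; a bare attribute has value None.
            language = dict(attrs).get("class")
            # Missing, empty or bare class attributes all fall back to the default.
            self.languages.append(language or self.default)


def code_languages(html_text):
    """Return one language name per <code> element, in document order."""
    finder = CodeLanguageFinder()
    finder.feed(html_text)
    finder.close()
    return finder.languages


post = '<code class="python">print "Hello World"</code><code>no class here</code>'
print(code_languages(post))  # ['python', 'text']
```

Using a .get-style lookup with a default keeps one unlabeled block from disabling highlighting for the entire post.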

This was all working fine until a block of code I included had some HTML in it. Now, I know all the HTML I need, so I write my blog posts directly in it, and when rendering the template I just mark the post body as safe:

{{ post.body|pygmentize|safe }}

This approach results in any HTML inside a code block being rendered as HTML (i.e., not showing up as text). I've been playing around with applying Django's escape function to the code my filter extracts from the body, but I can never quite get it right. I think my understanding of content escaping just isn't complete enough. I've also tried writing the escaped version in the post body (e.g. &lt;), but it just comes out as text.
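The last symptom looks like double escaping. Pygments' HtmlFormatter HTML-escapes the source text it is given, so if the text reaching it still contains &lt; (an assumption: that the parser passes entities through unconverted), the & gets escaped a second time and the browser shows the entity itself. A minimal stdlib sketch, with html.escape standing in for the formatter's own escaping:

```python
import html

# Assumed input: the escaped snippet exactly as written in the post body,
# reaching the highlighter with its entities still encoded.
already_escaped = "&lt;b&gt;bold&lt;/b&gt;"

# Escaping again (as an HTML formatter does) turns '&' into '&amp;', so the
# browser displays the literal text "&lt;b&gt;bold&lt;/b&gt;".
double_escaped = html.escape(already_escaped)
print(double_escaped)  # &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;

# Decoding the entities first means the single escape yields markup that
# the browser displays as the text <b>bold</b>.
escaped_once = html.escape(html.unescape(already_escaped))
print(escaped_once)  # &lt;b&gt;bold&lt;/b&gt;
```

The general rule the sketch illustrates: escape code text exactly once, at the point where it becomes HTML output.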

What is the best way to mark the html for display? Am I going about this all wrong?

Thanks.

A:

I've finally found some time to figure it out. When Beautiful Soup parses the content and a code block contains a tag, that tag becomes a separate child node in the code element's contents list, alongside the text before and after it. This line is the culprit:

new_content = pygments.highlight(code.contents[0], lexer, formatter)

The [0] keeps only the first child node and cuts off the rest of the code; it isn't being decoded incorrectly at all. Poor bug spotting on my part. That line needs to be replaced with:

new_content = pygments.highlight(code.decodeContents(), lexer, formatter)
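To see why indexing [0] loses code, here is a stdlib-only sketch that does not use Beautiful Soup. The hypothetical CodePieces parser records the text runs and tags inside the first code element as a flat list of separate pieces, roughly like a contents list (Beautiful Soup would keep the nested tag as one child node rather than flattening it). Taking only the first piece keeps the text before the nested tag; joining every piece recovers the whole block, which is what the decodeContents() call above is for. Note that html.parser decodes entities in text runs by default.

```python
from html.parser import HTMLParser


class CodePieces(HTMLParser):
    """Collect the pieces inside the first <code> element, in order.

    Hypothetical stand-in for a `contents`-style list: text runs and nested
    tags come back as separate pieces instead of one string.
    """

    def __init__(self):
        super().__init__()
        self.inside = False  # True while between <code> and </code>
        self.done = False    # stop after the first <code> element
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        if self.inside:
            # Keep nested tags exactly as they were written.
            self.pieces.append(self.get_starttag_text())
        elif tag == "code" and not self.done:
            self.inside = True

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept once, as written.
        if self.inside:
            self.pieces.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.inside:
            return
        if tag == "code":
            self.inside = False
            self.done = True
        else:
            self.pieces.append(f"</{tag}>")

    def handle_data(self, data):
        if self.inside:
            self.pieces.append(data)


def code_pieces(fragment):
    """Return the flattened pieces inside the first <code> element."""
    parser = CodePieces()
    parser.feed(fragment)
    parser.close()
    return parser.pieces


block = '<code class="html">x = "<b>hi</b>"</code>'
pieces = code_pieces(block)
first_only = pieces[0]   # 'x = "'  (everything from <b> onward is lost)
whole = "".join(pieces)  # 'x = "<b>hi</b>"'  (the full block)
```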

The lessons here: make sure you know what the problem actually is, and know how your libraries work.

pivotal