views:

752

answers:

4

I'm trying to implement a WikiLink template filter in Django that queries the database model to give different responses depending on Page existence, identical to Wikipedia's red links. The filter does not raise an Error but instead doesn't do anything to the input.

WikiLink is defined as: [[ThisIsAWikiLink | This is the alt text]]

Here's a working example that does not query the database:

from django import template
from django.template.defaultfilters import stringfilter
from sites.wiki.models import Page
import re

register = template.Library()

@register.filter
@stringfilter
def wikilink(value):
    return re.sub(r'\[\[ ?(.*?) ?\| ?(.*?) ?\]\]', r'<a href="/Sites/wiki/\1">\2</a>', value)
wikilink.is_safe = True

The input (value) is a multi-line string, containing HTML and many WikiLinks.

The expected output is substituting [[ThisIsAWikiLink | This is the alt text]] with

  • <a href="/Sites/wiki/ThisIsAWikiLink">This is the alt text</a>

    or if "ThisIsAWikiLink" doesn't exist in the database:

  • <a href="/Sites/wiki/ThisIsAWikiLink/edit" class="redlink">This is the alt text</a>

and returning value.

Here's the non-working code (edited in response to comments/answers):

from django import template
from django.template.defaultfilters import stringfilter
from sites.wiki.models import Page
import re

register = template.Library()

@register.filter
@stringfilter
def wikilink(value):
    m = re.match(r'\[\[ ?(.*?) ?\| ?(.*?) ?\]\]', value)

    if(m):
        page_alias = m.group(2)
        page_title = m.group(3)
        try:
            page = Page.objects.get(alias=page_alias)
            return re.sub(r'(\[\[)(.*)\|(.*)(\]\])', r'<a href="Sites\/wiki\/\2">\3</a>', value)
        except Page.DoesNotExist:
             return re.sub(r'(\[\[)(.*)\|(.*)(\]\])', r'<a href="Sites\/wiki\/\2\/edit" class="redlink">\3</a>', value)
    else:
        return value
wikilink.is_safe = True

What the code needs to do is:

  • extract all the WikiLinks in value
  • query the Page model to see if the page exists
  • substitute all the WikiLinks with normal links, styled dependent on each wikipage existence.
  • return the altered value

The updated question is: What regular expression (method) can return a python List of WikiLinks, which can be altered and used to substitute the original matches (after being altered).

Edit:

I'd like to do something like this:

def wikilink(value):
    regex = re.magic_method(r'\[\[ ?(.*?) ?\| ?(.*?) ?\]\]', value)

    foreach wikilink in regex:
         alias = wikilink.group(0)
         text = wikilink.group(1)

         if(alias exists in Page):
              regex.sub("<a href="+alias+">"+ text +"</a>")
         else:
              regex.sub("<a href="+alias+" class='redlink'>"+ text +"</a>")

    return value
+3  A: 

This is the type of problem that falls quickly to a small set of unit tests.

Pieces of the filter that can be tested in isolation (with a bit of code restructuring):

  • Determining whether or not value contains the pattern you're looking for
  • What string gets generated if there is a matching Page
  • What string gets generated is there isn't a matching Page

That would help you isolate where things are going wrong. You'll probably find that you'll need to rewire the regexps to account for optional spaces around the |.

Also, on first glance it looks like your filter is exploitable. You're claiming the result is safe, but you haven't filtered the alt text for nasties like script tags.

Dave W. Smith
You're absolutely right about the tests. How ever my code doesn't get that far, eg. it doesn't query the database - see current edit.
Hannson
But it does get at least as far as the first test I suggest above, which covers code that happens before you hit the database.
Dave W. Smith
True. I've double checked that the pattern is in value, proofed with the simpler example above. It's the same regex but doesn't query the db. My failure was in the assumption that value would not contain many WikiLinks, eg. that it would get one pattern at a time (don't ask :))
Hannson
I hit that very problem on my first Wiki implementation. When you expect maybe one of something, testing for 0, 1, and 2 can prevent surprises.
Dave W. Smith
+4  A: 

If your string contains other text in addition to the wiki-link, your filter won't work because you are using re.match instead of re.search. re.match matches at the beginning of the string. re.search matches anywhere in the string. See matching vs. searching.

Also, your regex uses the greedy *, so it won't work if one line contains multiple wiki-links. Use *? instead to make it non-greedy:

re.search(r'\[\[(.*?)\|(.*?)\]\]', value)

Edit:

As for tips on how to fix your code, I suggest that you use re.sub with a callback. The advantages are:

  • It works correctly if you have multiple wiki-links in the same line.
  • One pass over the string is enough. You don't need a pass to find wiki-links, and another one to do the replacement.

Here is a sketch of the implmentation:

import re

WIKILINK_RE = re.compile(r'\[\[(.*?)\|(.*?)\]\]')

def wikilink(value):
  def wikilink_sub_callback(match_obj):
    alias = match_obj.group(1).strip()
    text = match_obj.group(2).strip()
    if(alias exists in Page):
      class_attr = ''
    else:
      class_attr = ' class="redlink"'
    return '<a href="%s"%s>%s</a>' % (alias, class_attr, text)

  return WIKILINK_RE.sub(wikilink_sub_callback, value)
Ayman Hourieh
Any tips on how to do the pseudo code?
Hannson
Callback approach is neat!
Matthew Christensen
+1  A: 

Code:

import re

def page_exists(alias):
    if alias == 'ThisIsAWikiLink':
        return True

    return False

def wikilink(value):
    if value == None:
        return None

    for alias, text in re.findall('\[\[\s*(.*?)\s*\|\s*(.*?)\s*\]\]',value):
        if page_exists(alias):
            value = re.sub('\[\[\s*%s\s*\|\s*%s\s*\]\]' % (alias,text), '<a href="/Sites/wiki/%s">%s</a>' % (alias, text),value)            
        else:
            value = re.sub('\[\[\s*%s\s*\|\s*%s\s*\]\]' % (alias,text), '<a href="/Sites/wiki/%s/edit/" class="redtext">%s</a>' % (alias, text), value)

    return value

Sample results:

>>> import wikilink
>>> wikilink.wikilink(None)
>>> wikilink.wikilink('')
''
>>> wikilink.wikilink('Test')
'Test'
>>> wikilink.wikilink('[[ThisIsAWikiLink | This is the alt text]]')
'<a href="/Sites/wiki/ThisIsAWikiLink">This is the alt text</a>'
>>> wikilink.wikilink('[[ThisIsABadWikiLink | This is the alt text]]')
'<a href="/Sites/wiki/ThisIsABadWikiLink/edit/" class="redtext">This is the alt text</a>'
>>> wikilink.wikilink('[[ThisIsAWikiLink | This is the alt text]]\n[[ThisIsAWikiLink | This is another instance]]')
'<a href="/Sites/wiki/ThisIsAWikiLink">This is the alt text</a>\n<a href="/Sites/wiki/ThisIsAWikiLink">This is another instance</a>'
>>> wikilink.wikilink('[[ThisIsAWikiLink | This is the alt text]]\n[[ThisIsAWikiLink | This is another instance]]')

General comments:

  • findall is the magic re function you're looking for
  • Change *page_exists* to run whatever query you want
  • Vulnerable to HTML injection (as mentioned by Dave W. Smith above)
  • Having to recompile the regex on each iteration is inefficient
  • Querying the database each time is inefficient

I think you'd run into performance issues pretty quickly with this approach.

Matthew Christensen
I'm aware of the injection attacks, and the performance issues. They will be fixed later though (less of a problem for me than the regex:))
Hannson
A: 

This is the working code in case someone needs it:

from django import template
from django.template.defaultfilters import stringfilter
from sites.wiki.models import Page
import re

register = template.Library()

@register.filter
@stringfilter
def wikilink(value):
  WIKILINK_RE = re.compile(r'\[\[ ?(.*?) ?\| ?(.*?) ?\]\]')

  def wikilink_sub_callback(match_obj):
    alias = match_obj.group(1).strip()
    text = match_obj.group(2).strip()

    class_attr = ''
    try:
        Page.objects.get(alias=alias)
    except Page.DoesNotExist:
        class_attr = ' class="redlink"'
    return '<a href="%s"%s>%s</a>' % (alias, class_attr, text)

  return WIKILINK_RE.sub(wikilink_sub_callback, value)
wikilink.is_safe = True

Many thanks for all the answers!

Hannson