
I want to create a simple syntax that makes it easier to link to internal things, kind of like a wiki. I'm thinking the user could type something like:

Bah bah bah reference: [[a]]

And it would convert it into HTML:

Bah bah bah reference: <a href="..../id=a">title</a>

But I'm worried about injection and other security issues. What's a good syntax to pick (maybe [[]] or something else), and what's a safe regex for it?

+6  A: 

Don't reinvent the wheel: Text::Markdown.

Sinan Ünür
I'm aware of Markdown and the like. I'm actually thinking about an extra layer on top of it, rather than parsing everything myself, to make certain links easier to write: for example, linking to an internal site with a key, or linking to a list of tags with a shortcut.
Timmy
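One way to build that extra layer is a small pre-pass that rewrites each known [[key]] into ordinary Markdown link syntax and leaves everything else alone, so the Markdown renderer (for example Text::Markdown's markdown function) still does all the real parsing. The sketch below is only an illustration: the %shortcuts table, its link texts, and the expand_shortcuts name are hypothetical, and keys are restricted to word characters.

```perl
use strict;
use warnings;

# Hypothetical shortcut table: key => [link text, URL].
my %shortcuts = (
    so_perl => [ 'Perl questions', 'http://stackoverflow.com/questions/tagged/perl' ],
    so_ruby => [ 'Ruby questions', 'http://stackoverflow.com/questions/tagged/ruby' ],
);

# Rewrite each known [[key]] as a Markdown inline link, [text](url).
# Unknown keys are returned exactly as typed ($1 is the whole match).
sub expand_shortcuts {
    my ($text) = @_;
    $text =~ s{(\[\[(\w+)\]\])}{
        exists $shortcuts{$2}
            ? sprintf('[%s](%s)', @{ $shortcuts{$2} })
            : $1
    }ge;
    return $text;
}

print expand_shortcuts('See [[so_perl]] and [[no_such_key]].'), "\n";
```

Because the output is plain Markdown, whatever sanitizing you already apply to the rendered HTML still applies unchanged; the pre-pass itself only ever emits URLs and link text taken from the trusted table.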
+3  A: 

While Sinan's idea of using an existing syntax, such as Markdown, is a good one, the viability of this sort of approach depends on the particulars of your situation. You haven't told us enough to get very helpful responses. Are you starting from a blank slate or are you inheriting a large body of existing documents that would require a costly conversion process, not to mention staff retraining?

If the latter, sometimes it's best to tackle the problem by thinking in terms of short-term workarounds and long-term strategies. The short-term workaround, some sort of home-grown syntax exactly like the one proposed in your question, will be ugly and may cause hassles here and there, but it can get the job done. For example, our organization has a large pile of Word documents that ultimately provide content for HTML pages. Within those Word files, staff members use an approach like this to create links and make a few other declarations that our parsing code handles: ##some_link##. There are many ways for this to fail, but with the kind of content we produce, failures are rare. Partly for that reason, it's difficult to generate much enthusiasm for a long-term strategy of migrating the content to a more robust system. My expectation is that such a migration will occur, but it is being driven by larger considerations and not by the limitations of our kludgy ##foo## markup device.

Update:

Based on your additional comments, it sounds like you will have a list of links that you want your users to be able to add quickly, using short IDs. So, those links will be defined someplace, say a Perl hash. You can use this hash to guard against invalid entries. For example:

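use strict;
use warnings;

# Whitelist: the only short IDs users may write inside [[...]].
my %approved_links = (
    so_perl => 'http://stackoverflow.com/questions/tagged/perl',
    so_ruby => 'http://stackoverflow.com/questions/tagged/ruby',
);

# Matches [[id]] and captures the id (anything up to the first ']').
my $regex = qr/ \[\[ ([^\]]+) \]\] /x;

while (<DATA>){
    # Collect every id on this line that is not in the whitelist;
    # refuse the line if there is even one.
    die "Danger: $_\n" if map {
        exists $approved_links{$_} ? () : ($_)
    } /$regex/g;
    # All ids are approved, so both the URL and the link text come
    # from the whitelist rather than from arbitrary user input.
    s/$regex/<a href="$approved_links{$1}">$1<\/a>/g;
    print;
}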

__DATA__
Some text [not an id] [[so_perl]] more [[so_ruby]] more..
[[so_perl]] blah [[so_ruby]] blah
[[bad id!!]]
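Dying on the first unknown id is the simplest policy. An alternative sketch (not part of the answer above) is to HTML-escape the whole line first and only then expand whitelisted ids, so an unknown id such as [[bad id!!]] simply stays as harmless text. It assumes the input is plain text rather than HTML; render_line and %entity are illustrative names, and the hash is the same example data.

```perl
use strict;
use warnings;

my %approved_links = (
    so_perl => 'http://stackoverflow.com/questions/tagged/perl',
    so_ruby => 'http://stackoverflow.com/questions/tagged/ruby',
);

# Characters that must not reach HTML unescaped, and their entities.
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
              '"' => '&quot;', "'" => '&#39;');

sub render_line {
    my ($line) = @_;
    # Step 1: escape everything the user typed, so no raw markup survives.
    $line =~ s/([&<>"'])/$entity{$1}/g;
    # Step 2: expand only whitelisted ids; unknown ones stay as escaped text.
    $line =~ s{(\[\[([^\]]+)\]\])}{
        exists $approved_links{$2}
            ? sprintf('<a href="%s">%s</a>', $approved_links{$2}, $2)
            : $1
    }ge;
    return $line;
}

print render_line(q{[[so_perl]] blah [[bad id!!]] <b>}), "\n";
```

Escaping before expanding means the only unescaped markup in the result is the anchor tags the code itself generated.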
FM
It's mostly a text summary section, and I have a third-party, tree-based parser that validates and strips the HTML, but I want to allow some internal links, for example Wikipedia's [[Article Name]] syntax, which gets automatically translated into a link. I want to do this while making sure people can't put anything dangerous inside.
Timmy
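For free-form Wiki-style titles like [[Article Name]], where there is no fixed list of keys, one common approach is to make the capture itself strict: if the character class admits only letters, digits, spaces, underscores and hyphens, no quote, angle bracket or ampersand can ever reach the generated HTML. The sketch below assumes a hypothetical base URL (/wiki?title=) and a 64-character limit; neither comes from the question, and checking that the title actually exists is left to the application.

```perl
use strict;
use warnings;

# Hypothetical base URL for internal pages; adjust to the real site.
my $base = '/wiki?title=';

# Only [A-Za-z0-9 _-], 1 to 64 characters, may appear between [[ and ]].
my $title_re = qr/\[\[([A-Za-z0-9 _-]{1,64})\]\]/;

sub link_titles {
    my ($text) = @_;
    # Spaces become underscores in the URL ("Article Name" -> "Article_Name");
    # the visible link text keeps the title as typed.
    $text =~ s{$title_re}{
        my $title = $1;
        (my $slug = $title) =~ tr/ /_/;
        sprintf('<a href="%s%s">%s</a>', $base, $slug, $title)
    }ge;
    return $text;
}

print link_titles('See [[Article Name]] but not [[<script>]]'), "\n";
```

Anything that fails the character class, such as [[<script>]], is left untouched, so it still passes through the existing HTML sanitizer like any other user text.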
+1  A: 

Something like Textile or Creole? Why don't you use an existing implementation?

Pascal Thivent