
What is my best option for converting plain text links within a string into anchor tags?

For example, say I have `I went and searched on http://www.google.com/ today`. I would want to change that to `I went and searched on <a href="http://www.google.com/">http://www.google.com/</a> today`.

The method also needs to be safe from any kind of XSS attack, since the strings are user-generated. They will already be safe before parsing, so I just need to make sure that parsing the URLs does not introduce any new vulnerabilities.

A:

Since you say the strings will already be safe before parsing, a simple regular expression can do what you want. Use the following method:

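```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkConverter {  // hypothetical wrapper class so the members compile
    // scheme://host[/path]; named groups capture the scheme (Protocol) and host (Domain).
    private static readonly Regex urlRegex = new Regex(@"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*", RegexOptions.Compiled);
    // user@example.com or user@[1.2.3.4]; the trailing (\]?) consumes the closing bracket.
    private static readonly Regex emailRegex = new Regex(@"([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)", RegexOptions.Compiled);
    // Schemes that are never turned into links.
    private static readonly IEnumerable<string> disallowedProtocols = new[] { "javascript", "ftp" };

    public static string ConvertUrls(string s) {
        // Wrap e-mail addresses first; the mailto: markup has no "://", so the URL pass skips it.
        s = emailRegex.Replace(
                s,
                match => string.Format(CultureInfo.InvariantCulture, "<a href=\"mailto:{0}\" rel=\"nofollow\">{0}</a>", match.Value)
            );

        // Then wrap URLs, leaving any match with a disallowed scheme as plain text.
        s = urlRegex.Replace(
                s,
                match => {
                    var protocolGroup = match.Groups["Protocol"];
                    if (protocolGroup.Success && !disallowedProtocols.Contains(protocolGroup.Value, StringComparer.OrdinalIgnoreCase)) {
                        return string.Format(CultureInfo.InvariantCulture, "<a href=\"{0}\" rel=\"nofollow\">{0}</a>", match.Value);
                    } else {
                        return match.Value;
                    }
                }
            );

        return s;
    }
}
```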
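To see what `ConvertUrls` works with, the small top-level program below (assuming C# 9 or later for top-level statements) runs only the answer's URL pattern against the question's example sentence and prints the whole match and the two named groups. It is an illustration, not part of the original answer, and it omits `RegexOptions.Compiled` for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as urlRegex in the answer (RegexOptions.Compiled omitted).
var urlRegex = new Regex(@"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*");

Match match = urlRegex.Match("I went and searched on http://www.google.com/ today");

// The whole match is what ConvertUrls wraps in an <a> tag;
// the Protocol group is what it checks against disallowedProtocols.
Console.WriteLine(match.Value);                    // http://www.google.com/
Console.WriteLine(match.Groups["Protocol"].Value); // http
Console.WriteLine(match.Groups["Domain"].Value);   // www.google.com
```

The match ends at the trailing slash because the following space is not in the path character class, so the surrounding text is left untouched.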
Answered by mnero0429

SLaks: `javascript:alert('XSS')`

mnero0429: I changed my code a bit to disallow certain protocols like "ftp", but if the user just entered "javascript:alert('XSS')", my regular expression wouldn't pick it up, so you'd be safe from this.

SLaks: It should be possible to write malicious JavaScript that passes your regex (I'm too lazy to make an example), so you do need to disallow `javascript:`.
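Both points in this exchange can be checked directly. The following is a minimal sketch, not part of the original thread, assuming C# 9 or later for top-level statements. It reuses the answer's URL pattern to show that a bare `javascript:alert('XSS')` is indeed not matched, while a `javascript://` URL built only from characters the pattern accepts is matched in full. The answer's `disallowedProtocols` check already rejects that URL; `LinkAllowlisted` is a hypothetical alternative that links only `http` and `https` instead of listing dangerous schemes.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Same pattern as urlRegex in the answer (RegexOptions.Compiled omitted).
var urlRegex = new Regex(@"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*");

// mnero0429's point: a bare javascript: URL has no "//", so the pattern ignores it.
Console.WriteLine(urlRegex.IsMatch("javascript:alert('XSS')")); // False

// SLaks's point: this payload uses only characters the pattern accepts. Browsers
// percent-decode javascript: URLs, so it would run "//comment" (a JS line comment),
// a newline, and then alert(1).
const string payload = "javascript://comment%0Aalert%281%29";
Match m = urlRegex.Match(payload);
Console.WriteLine(m.Value == payload);         // True: the whole payload matches
Console.WriteLine(m.Groups["Protocol"].Value); // javascript

// Hypothetical allowlist variant: only http and https become links;
// every other scheme is left as plain text.
string LinkAllowlisted(string s) => urlRegex.Replace(s, match =>
{
    string protocol = match.Groups["Protocol"].Value;
    bool allowed = protocol.Equals("http", StringComparison.OrdinalIgnoreCase)
                || protocol.Equals("https", StringComparison.OrdinalIgnoreCase);
    return allowed
        ? string.Format(CultureInfo.InvariantCulture, "<a href=\"{0}\" rel=\"nofollow\">{0}</a>", match.Value)
        : match.Value;
});

Console.WriteLine(LinkAllowlisted(payload));
// javascript://comment%0Aalert%281%29
Console.WriteLine(LinkAllowlisted("see http://www.google.com/ today"));
// see <a href="http://www.google.com/" rel="nofollow">http://www.google.com/</a> today
```

The design choice here is that an allowlist fails closed: any scheme nobody thought to add to the list stays plain text, whereas a denylist like `disallowedProtocols` has to name every dangerous scheme explicitly.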