Hello,

Does anyone have any good C# code that will parse a string and "linkify" any URLs that may be in the string?

+14  A: 

It's a pretty simple task; you can achieve it with Regex and a ready-to-go regular expression from:

Something like:

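// html is the input string. Every pattern piece is a verbatim (@) string so the regex escapes compile.
// The ^ and $ anchors are dropped so URLs inside longer text match, and $0 inserts the whole match.
html = Regex.Replace(html, @"(http|https|ftp)\://[a-zA-Z0-9\-\.]+" +
                     @"\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?" +
                     @"([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*",
                     "<a href=\"$0\">$0</a>");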
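
A blanket replace like this also rewrites URLs that already sit inside an `<a>` element. One possible way around that, sketched here under the assumption that the input is already HTML and that existing anchors are not nested, is to let the pattern consume whole anchors first and only wrap the bare URLs. The class name `AnchorAwareLinkifier` and the simplified URL pattern are illustrative, not taken from the answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorAwareLinkifier
{
    // First branch swallows an existing <a>...</a> element as a whole.
    // Second branch (named group "url") matches a bare http/https/ftp URL.
    private static readonly Regex AnchorOrUrl = new Regex(
        @"<a\b[^>]*>.*?</a>|(?<url>\b(?:https?|ftp)://[^\s<>""]+)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string text)
    {
        return AnchorOrUrl.Replace(text, m =>
            m.Groups["url"].Success
                ? "<a href=\"" + m.Value + "\">" + m.Value + "</a>"
                : m.Value); // existing anchor: return it unchanged
    }
}
```

Because the regex engine scans left to right, a URL inside an existing anchor is consumed by the first branch and never reaches the second. The simplified URL pattern still picks up trailing punctuation such as a sentence-ending period, so it would need tightening for real text.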

You may also be interested not only in creating links but in shortening URLs. Here is a good article on this subject:

See also:

Koistya Navin
Hi. Great response. Most of the suggestions in your post (and links) seem to work but they all seem to break any existing links in the text being evaluated.
Vance Smith
@VSmith: you can try different regular expressions from regixlib.com and find which one works best for you.
Koistya Navin
@VSmith: Are you implying that you have a string like "hello <a href="http://www.a.com">there</a>, see: http://www.b.com" and you only want to linkify the second one?
Zhaph - Ben Duguid
hmm, that worked well. thus proving all the points we're making here ;)
Zhaph - Ben Duguid
Hi Zhaph, yes, that's definitely what I want to do. Stack Overflow seems to have great "linkifying" code, doesn't it? ;-)
Vance Smith
+4  A: 

It's not that easy, as you can read in this blog post by Jeff Atwood. It's especially hard to detect where a URL ends.

For example, is the trailing parenthesis part of the URL or not:

In the first case, the parentheses are part of the URL. In the second case they are not!
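
A rough sketch of one way to handle both cases: allow parentheses inside the URL, also capture an optional opening parenthesis directly in front of it, and then strip the outer pair in code when both are present. The character classes are the ones used in the HtmlHelper answer further down this thread. The class name `ParenAwareLinkifier` is made up for illustration, and the sketch assumes the text needs no further HTML encoding:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParenAwareLinkifier
{
    // Optional leading "(" followed by an http(s) URL; "(" and ")" are allowed inside the URL.
    private static readonly Regex UrlPattern = new Regex(
        @"\(?\bhttps?://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
        RegexOptions.IgnoreCase);

    public static string Linkify(string text)
    {
        return UrlPattern.Replace(text, m =>
        {
            string url = m.Value, prefix = "", suffix = "";
            if (url.StartsWith("(", StringComparison.Ordinal))
            {
                // The opening parenthesis wraps the URL, so it is never part of the link.
                prefix = "(";
                url = url.Substring(1);
                if (url.EndsWith(")", StringComparison.Ordinal))
                {
                    // Matching closing parenthesis: treat it as the wrapper, not as URL text.
                    suffix = ")";
                    url = url.Substring(0, url.Length - 1);
                }
            }
            return prefix + "<a href=\"" + url + "\">" + url + "</a>" + suffix;
        });
    }
}
```

This is a heuristic, not a parser. Text such as "(see http://example.com/page)" still ends up with the closing parenthesis inside the link, because the opening one does not sit directly in front of the URL.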

M4N
And as you can see from the linkified URLs in this answer, not everyone gets it right :)
Ray
Well in fact, I didn't want the two URLs to be linkified. But it seems this is not supported.
M4N
Jeff's regex seems to display badly in my browser, I believe it should be: "\(?\bhttp://[-A-Za-z0-9+
Zhaph - Ben Duguid
+1  A: 
// requires: using System; using System.Text.RegularExpressions;
protected string Linkify( string SearchText ) {
    // this will find links like:
    // http://www.mysite.com
    // as well as any links with other characters directly in front of them, like:
    // href="http://www.mysite.com"
    // you can then use your own logic to determine which links to linkify
    Regex regx = new Regex( @"\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b", RegexOptions.IgnoreCase );
    SearchText = SearchText.Replace( "&nbsp;", " " );

    // Rewrite each match where it was found. Calling SearchText.Replace( match.Value, ... ) per match
    // would wrap a URL that appears twice more than once, and would also hit copies inside href="...".
    return regx.Replace( SearchText, match =>
        match.Value.StartsWith( "http", StringComparison.OrdinalIgnoreCase )
            ? "<a href='" + match.Value + "'>" + match.Value + "</a>"
            : match.Value ); // starts with anything else: don't linkify, it may already be linked
}
Vance Smith
Cheers for posting that one :)
Zhaph - Ben Duguid
+2  A: 

Well, after a lot of research on this, and several attempts to fix cases where:

  1. people enter http://www.sitename.com and www.sitename.com in the same post
  2. parentheses are involved, as in (http://www.sitename.com) and http://msdn.microsoft.com/en-us/library/aa752574(vs.85).aspx
  3. URLs are long, like: http://www.amazon.com/gp/product/b000ads62g/ref=s9_simz_gw_s3_p74_t1?pf_rd_m=atvpdkikx0der&pf_rd_s=center-2&pf_rd_r=04eezfszazqzs8xfm9yd&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846

we are now using this HtmlHelper extension. I thought I would share it and get any comments:

    private static Regex regExHttpLinks = new Regex(@"(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\))|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\k<wrap>)|\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string Format(this HtmlHelper htmlHelper, string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        html = htmlHelper.Encode(html);
        html = html.Replace(Environment.NewLine, "<br />");

        // replace periods on numeric values that appear to be valid domain names
        var periodReplacement = "[[[replace:period]]]";
        html = Regex.Replace(html, @"(?<=\d)\.(?=\d)", periodReplacement);

        // create links for matches
        var linkMatches = regExHttpLinks.Matches(html);
        for (int i = 0; i < linkMatches.Count; i++)
        {
            var temp = linkMatches[i].ToString();

            if (!temp.Contains("://"))
            {
                temp = "http://" + temp;
            }

            html = html.Replace(linkMatches[i].ToString(), String.Format("<a href=\"{0}\" title=\"{0}\">{1}</a>", temp.Replace(".", periodReplacement).ToLower(), linkMatches[i].ToString().Replace(".", periodReplacement)));
        }

        // Clear out period replacement
        html = html.Replace(periodReplacement, ".");

        return html;
    }
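
For comparison, here is a minimal single-pass sketch of the same idea. It uses only the last alternative of the pattern above and lets `Regex.Replace` rewrite each match where it was found, so the period placeholder is not needed to stop one replacement from touching an earlier one. The class name `SinglePassLinkifier` is hypothetical, the lookbehind alternatives for wrapped URLs are left out, and the input is assumed to be HTML-encoded already:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePassLinkifier
{
    // Matches URLs that start with http://, https:// or www.
    private static readonly Regex UrlPattern = new Regex(
        @"\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
        RegexOptions.IgnoreCase);

    public static string Linkify(string encodedText)
    {
        return UrlPattern.Replace(encodedText, m =>
        {
            string text = m.Value;
            // Bare "www." matches need a scheme before they can be used as an href.
            string href = text.Contains("://") ? text : "http://" + text;
            return string.Format("<a href=\"{0}\" title=\"{0}\">{1}</a>", href, text);
        });
    }
}
```

Unlike the helper above, this sketch does not lower-case the href, since URL paths can be case-sensitive.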
josefresno