ansaurus

Question

Regex - find all links in a tweet

Answer 1

+1 A:

Try this one:

/\bhttps?:\/\/\S+\b/

Update:

To catch links beginning with "www." too (no "http://" prefix), you could try this:

/\b(?:https?:\/\/|www\.)\S+\b/
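
As a rough illustration in Ruby (the language the question appears to target), the second pattern could be applied with String#scan. The names LINK_RE and extract_links are made up for this sketch and are not part of the original answer.

```ruby
# Pattern from the update above: a link starts with http://, https:// or www.
LINK_RE = /\b(?:https?:\/\/|www\.)\S+\b/

# Hypothetical helper: returns every substring of the tweet that matches LINK_RE.
def extract_links(tweet)
  # The pattern has no capturing groups, so scan returns whole matches as strings.
  tweet.scan(LINK_RE)
end

# extract_links("I really like www.this-site.com.")
# => ["www.this-site.com"]
# extract_links("See http://example.com/a?b=1, ok?")
# => ["http://example.com/a?b=1"]
```

The trailing \b makes the match end on a word character, which drops sentence punctuation such as a final "." or ",". It also drops a trailing "/", and a bare domain with no prefix is still not matched.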

Asaph 2009-09-13 00:44:11

I think you can post links in tweets without the http(s) prefix, so this will fail on something like "I really like www.this-site.com."

Andrei Vajna II 2009-09-13 01:02:38

Hmm. Interesting. Good comment. I updated my answer to detect links starting with "www." too.

Asaph 2009-09-13 01:12:11

Ok, now how about "Wow, stackoverflow.com is great!"? :P

Andrei Vajna II 2009-09-13 01:31:14

Yes to Andrei's comment: if you are going to go so far as to worry about links that start without http://, you should just match the non-space characters in front of any TLD.

Nerdling 2009-09-13 01:41:18
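
The TLD-based idea from the comments might look something like the Ruby sketch below. The TLDS list is a deliberately short, hypothetical sample (a real list is much longer), and BARE_LINK_RE and extract_bare_links are names invented here.

```ruby
# Hypothetical, incomplete sample of top-level domains.
TLDS = %w[com org net edu gov io].freeze

# Optional scheme, one or more dot-terminated labels, a known TLD,
# then an optional path that must end on a word character.
BARE_LINK_RE = /\b(?:https?:\/\/)?(?:[a-z0-9-]+\.)+(?:#{TLDS.join("|")})\b(?:\/\S*\b)?/i

# Hypothetical helper: returns every link-like substring, with or without a scheme.
def extract_bare_links(tweet)
  tweet.scan(BARE_LINK_RE)
end

# extract_bare_links("Wow, stackoverflow.com is great!")
# => ["stackoverflow.com"]
```

A pattern this loose will also pick up things that are not meant as links, for example the domain part of an e-mail address, so treat it as a starting point rather than a finished matcher.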

Answer 2

+1 A:

Here's a code snippet from a site I wrote that parses a Twitter feed. It turns links, hashtags, and Twitter usernames into HTML anchors. So far it has worked fine. I know it's not Ruby (it's C#), but the regexes should be helpful.

// Requires: using System.Text.RegularExpressions;
// tweetStream, tweets, string1 and string2 are defined outside this snippet.
if (tweetStream[i] != null)
{
    var str = tweetStream[i].Text;

    // Wrap every http:// or https:// URL in an anchor tag.
    var re = new Regex(@"http(s)?:\/\/\S+");
    MatchCollection mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='" + m.Value + "' target='_blank'>" + m.Value + "</a>");
    }

    // Link each @username to its Twitter profile.
    re = new Regex(@"(@)(\w+)");
    mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='http://twitter.com/" + m.Value.Replace("@", string.Empty) + "' target='_blank'>" + m.Value + "</a>");
    }

    // Link each #hashtag to a Twitter search; the "#" is URL-encoded as %23.
    re = new Regex(@"(#)(\w+)");
    mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='http://twitter.com/#search?q=" + m.Value.Replace("#", "%23") + "' target='_blank'>" + m.Value + "</a>");
    }

    // Append the linkified tweet to the accumulated output.
    tweets += string1 + "<div>" + str + "</div>" + string2;
}

Chuck 2009-09-13 01:26:26
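
One thing to watch with the replace-in-a-loop approach above: String.Replace rewrites every occurrence of the matched text, so a URL that appears twice, or a "#fragment" inside a URL that has already been wrapped, can end up wrapped more than once. A possible Ruby variant, sketched here under the assumption that the same three kinds of token are wanted, walks the tweet once with a single combined pattern. TOKEN_RE, anchor and linkify are hypothetical names, and the HTML escaping is an addition that the original snippet does not have.

```ruby
require "cgi"

# One pattern, three alternatives. At any position the URL alternative is
# tried first, so a "#fragment" inside a URL is consumed as part of the URL.
TOKEN_RE = %r{(?<url>https?://\S+)|@(?<user>\w+)|\#(?<tag>\w+)}

# Builds an anchor in the same shape as the C# snippet, with escaped values.
def anchor(href, label)
  "<a href='#{CGI.escapeHTML(href)}' target='_blank'>#{CGI.escapeHTML(label)}</a>"
end

# Single left-to-right pass: copy plain text (escaped), replace each token once.
def linkify(text)
  out = String.new
  pos = 0
  text.scan(TOKEN_RE) do
    m = Regexp.last_match
    out << CGI.escapeHTML(text[pos...m.begin(0)])
    piece =
      if m[:url]
        anchor(m[:url], m[:url])
      elsif m[:user]
        anchor("http://twitter.com/#{m[:user]}", "@#{m[:user]}")
      else
        anchor("http://twitter.com/#search?q=%23#{m[:tag]}", "##{m[:tag]}")
      end
    out << piece
    pos = m.end(0)
  end
  out << CGI.escapeHTML(text[pos..-1])
end
```

Because each token is replaced where it is found and the output is never rescanned, text that has already been turned into an anchor cannot be matched a second time.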

Answer 3

+1 A:

Found this one here

^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$

Soldier.moth 2009-09-13 02:18:14

+1 for making me smile. :D

Andrei Vajna II 2009-09-13 02:27:37
