I found a link in a tweet that my current regex won't parse and I can't seem to figure out how to get it working (probably due to my ineptness with regex).

Here's the current code:

preg_match_all('@((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@',$description, $matches, PREG_SET_ORDER);

And the Tweet that won't parse:

Amazon: 14-day lending coming to Kindle "later this year". http://usat.me?128426

It's the usat.me link that's screwing things up. Any thoughts?

+1  A: 

You can test it here; it works for me, at least:

http://www.spaweditor.com/scripts/regex/

You can try this RegEx:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
infinity
Didn't know about that site, thanks. Unfortunately it's still returning http://usat.me instead of the full URL.
Noah
Did you try the RegEx I provided? It works fine for me.
infinity
Thanks, but the test tool is kicking this back: "Unknown modifier '\'"
Noah
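In PHP, "Unknown modifier" typically means the pattern reached a preg_* function without delimiters: the leading "(" is then taken as the opening delimiter, ")" as the closing one, and the rest of the expression is read as modifiers. A minimal sketch of the failure and the fix follows. The short pattern is a stand-in chosen for brevity (an assumption, not the expression from the answer), and "~" is just one delimiter that does not occur in either pattern.

```php
<?php
// The tweet from the question.
$tweet = 'Amazon: 14-day lending coming to Kindle "later this year". http://usat.me?128426';

// Stand-in pattern (hypothetical, much simpler than the one in the answer).
$bare = '(?i)\bhttps?://\S+';

// No delimiters: PHP treats "(" ... ")" as the delimiter pair, so everything
// after "(?i)" is parsed as modifiers and the call fails. The @ operator only
// silences the warning so the return value can be inspected.
$bad = @preg_match($bare, $tweet);
var_dump($bad); // bool(false)

// Wrapped in "~" delimiters, with the inline (?i) moved to a trailing i modifier.
$ok = preg_match('~\bhttps?://\S+~i', $tweet, $m);
var_dump($ok);   // int(1)
var_dump($m[0]); // the full URL, query string included
```

The same wrapping should apply to the longer expression above, since it contains no "~" either.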
Nice page, infinity. It's handy when you want to check something quickly, but you should really try Regex Coach (http://www.weitz.de/regex-coach/). It's just neat: it gives you a tree representation of your expression and a step-by-step execution, which can be pretty interesting when testing strings. It also provides a replacement engine for testing replacement strings like those used in mod_rewrite ;-)
ITroubs
A: 
((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?((/)?([-\w/_\.]*(\?\S+)?)?)*)

Try that; it should work. I changed the / to (/)?, so the slash after the host is now optional (it may appear 0 or 1 times) and a query string such as ?128426 can follow the domain directly.
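A quick way to check the change is to run both patterns against the tweet from the question. This is only a sketch: extractUrls is a hypothetical helper introduced here for illustration, and the "@" delimiters are carried over from the original code.

```php
<?php
// The tweet from the question.
$description = 'Amazon: 14-day lending coming to Kindle "later this year". http://usat.me?128426';

// Original pattern: the path/query group requires a leading slash.
$original = '@((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@';

// Revised pattern: the slash is optional, so "?128426" may follow the host directly.
$revised = '@((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?((/)?([-\w/_\.]*(\?\S+)?)?)*)@';

// Hypothetical helper: returns the full text of every match.
function extractUrls(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    // With PREG_SET_ORDER, index 0 of each entry holds the whole match.
    return array_column($matches, 0);
}

$before = extractUrls($original, $description);
$after  = extractUrls($revised, $description);

print_r($before); // one entry: http://usat.me
print_r($after);  // one entry: http://usat.me?128426
```

Because the query part is matched with \S+, anything up to the next whitespace is captured, including trailing punctuation if the URL ends a sentence.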

ITroubs
Looks great. Thanks so much.
Noah