
I am trying to find a way to match three different URL types:

  1. http://(www.)domain.com or www.domain.com
  2. http://(www.)domain.com/image.jpg/.png/.gif or www.domain.com/image.jpg/.png/.gif
  3. http://(www.)youtube.com/watch?v=Li1zXaEYol8 or www.youtube.com/watch?v=Li1zXaEYol8

Note: I do not want them matched if they are just domain.com without http:// or www.

The problem I'm facing is getting all three (or even two) to work together. I have a class that does auto-linking, so anything starting with http:// or www. gets linked. But if I put an image in, it also rewrites the URL inside the image's HTML, producing:

<img src="<a href="www.domain.com">domain.com</a>" />

which is rubbish :(

I would also like to scan for YouTube URLs and replace each one with embed code, so that the video shows instead of the URL. I got as far as extracting the ID from the URL, but couldn't get the replacement working.

Note that the input comes from a textarea (a comment field where people post comments), which is why it would be good to auto-link URLs and turn image and YouTube URLs into embedded images and videos.

The data is fetched with an SQL query and displayed by echoing $comments['message'], with a few str_replace() calls for simple formatting.

Any help would be appreciated.

Answer:

Here you go:

(?:http:\/\/www\.|http:\/\/|www\.)(?:youtube\.com\/watch\?v=(?:[\w-]+)|domain\.com\/image\.(?:jpg|png|gif)|domain\.com)

Or with delimiters:

~(?:http:\/\/www\.|http:\/\/|www\.)(?:youtube\.com\/watch\?v=(?:[\w-]+)|domain\.com\/image\.(?:jpg|png|gif)|domain\.com)~i
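To check a pattern like this against the three URL types from the question, here is a small PHP sketch. It escapes the dot before the image extension and allows hyphens in the video ID (YouTube IDs can contain `-`); the sample strings come from the question, and `domain.com` is still only the placeholder host, so you would substitute your real domain or a more general host pattern.

```php
<?php
// Sketch only: "domain.com" is the question's placeholder host, not a real rule.
$pattern = '~(?:http://www\.|http://|www\.)'
         . '(?:youtube\.com/watch\?v=[\w-]+'     // YouTube watch URL
         . '|domain\.com/image\.(?:jpg|png|gif)' // image on the site
         . '|domain\.com)~i';                    // bare site URL

$samples = [
    'http://www.domain.com',
    'www.domain.com/image.png',
    'http://youtube.com/watch?v=Li1zXaEYol8',
    'domain.com', // no http:// or www., so it should NOT match
];

$results = [];
foreach ($samples as $text) {
    // Store the matched substring, or null when the pattern does not match.
    $results[$text] = preg_match($pattern, $text, $m) ? $m[0] : null;
    printf("%-40s => %s\n", $text, $results[$text] ?? '(no match)');
}
```

Because the image alternative comes before the bare `domain\.com` alternative, an image URL is matched in full rather than stopping at the domain.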

You can test the regexes above on Rubular.

To find out whether a URL sits inside HTML you don't want to touch (for example an <img src> attribute), I suggest using a DOM parser.
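The DOM-parser idea can be sketched with PHP's built-in DOMDocument and DOMXPath. Attribute values such as an `<img src>` are not text nodes, so linking only text nodes (and skipping text already inside `<a>`) avoids the `<img src="<a href=...>">` output from the question. The function name `autolinkHtml`, the wrapper `div` and the URL pattern are hypothetical choices for illustration, not a definitive implementation.

```php
<?php
// Hypothetical helper: auto-link URLs in text nodes only.
function autolinkHtml(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy user HTML
    // The meta tag tells libxml the input is UTF-8; the div is a wrapper
    // we can find again and strip off when serializing.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="autolink-root">' . $html . '</div>'
    );
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="autolink-root"]')->item(0);

    // Collect text nodes that are not inside an <a> element first,
    // because the tree is modified in the loop below.
    $textNodes = [];
    foreach ($xpath->query('.//text()[not(ancestor::a)]', $root) as $node) {
        $textNodes[] = $node;
    }

    // Assumed URL pattern: http(s):// or www. up to whitespace or "<",
    // not ending in common trailing punctuation.
    $urlPattern = '~((?:https?://|www\.)[^\s<]*[^\s<.,;:!?)])~i';

    foreach ($textNodes as $node) {
        // With a capture group, odd indexes are URLs, even ones plain text.
        $parts = preg_split($urlPattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // no URL in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 0) {
                if ($part !== '') {
                    $fragment->appendChild($doc->createTextNode($part));
                }
                continue;
            }
            $href = stripos($part, 'www.') === 0 ? 'http://' . $part : $part;
            $a = $doc->createElement('a');
            $a->setAttribute('href', $href); // escaped on output
            $a->appendChild($doc->createTextNode($part));
            $fragment->appendChild($a);
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the wrapper's children.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo autolinkHtml('See www.domain.com and <img src="http://www.domain.com/image.jpg" /> here.');
```

The `src` attribute of the image is left untouched, while the bare `www.domain.com` in the surrounding text becomes a link.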

Also, check out this related question: How to mimic StackOverflow Auto-Link Behavior.


Regarding the YouTube replacement, you can do something like this:

echo preg_replace('~(?:http://)?(?:www\.)?youtube\.com/watch\?v=([\w-]+)~i', 'embed code $1', $comments['message']);
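A possible way to fill in the "embed code" part, and to avoid the nested-link problem at the same time, is to classify every URL in a single `preg_replace_callback` pass, so the generated `<img>`/`<iframe>`/`<a>` markup is never re-scanned by a later auto-link step. This is a sketch under stated assumptions: the function name `formatComment`, the URL pattern and the iframe size are hypothetical, and it relies on YouTube's `https://www.youtube.com/embed/ID` player URL.

```php
<?php
// Hypothetical helper: one pass over the comment text, one decision per URL.
function formatComment(string $text): string
{
    // Only URLs starting with http(s):// or www. are touched.
    $urlPattern = '~\b(?:https?://|www\.)[^\s<>"]*[^\s<>".,;:!?)]~i';

    return preg_replace_callback($urlPattern, function (array $m): string {
        $url  = $m[0];
        $href = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;
        $safe = htmlspecialchars($href, ENT_QUOTES); // safe inside an attribute

        // 1. YouTube watch URL -> embedded player.
        if (preg_match('~^(?:https?://)?(?:www\.)?youtube\.com/watch\?v=([\w-]+)~i', $url, $yt)) {
            return '<iframe width="560" height="315" src="https://www.youtube.com/embed/'
                 . $yt[1] . '" allowfullscreen></iframe>';
        }
        // 2. URL ending in .jpg/.jpeg/.png/.gif -> inline image.
        if (preg_match('~\.(?:jpe?g|png|gif)$~i', $url)) {
            return '<img src="' . $safe . '" alt="" />';
        }
        // 3. Anything else -> ordinary link.
        return '<a href="' . $safe . '">' . htmlspecialchars($url, ENT_QUOTES) . '</a>';
    }, $text);
}

echo formatComment('Look: www.domain.com/image.png and http://www.youtube.com/watch?v=Li1zXaEYol8 or www.domain.com');
```

Applied to `$comments['message']`, this would replace the separate image, YouTube and auto-link steps; text such as plain `domain.com` without `http://` or `www.` is left as it is.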
Alix Axel
This is quite straightforward. Anyway, +1 for compiling it correctly :-P
Boldewyn