Hi, I'm new to Regular Expressions.

I need to find only websites in some text, and I'm looking for a regular expression that matches strings like:

www.my.home, http://my.site.it

But this regular expression should not find strings like:

[email protected], or websites that are already inside an HTML tag, as in:

<a href="http://www.my.site.com/">
  <span style="font-style: normal;">www.mambo-test.org</span>
</a>

I tried with this one:

\b((https?://[^ ])|(www.[^ ]))

but it also finds the website in the href attribute and between the tags:

<a href="http://www.my.site.com/">
  <span style="font-style: normal;">www.mambo-test.org</span>
</a>

and I don't know how to exclude this case.
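The behaviour described above can be reproduced with a short sketch. Python's `re` module is an assumption here, since the question does not say which language or regex engine is in use; the pattern itself is copied unchanged from the question.

```python
import re

# The pattern from the question, unchanged. Note that [^ ] carries no
# quantifier, so each match ends one character after the prefix.
PATTERN = re.compile(r"\b((https?://[^ ])|(www.[^ ]))")

html = (
    '<a href="http://www.my.site.com/">\n'
    '  <span style="font-style: normal;">www.mambo-test.org</span>\n'
    '</a>'
)

# Collect the full text of every match (group 0), in document order.
matches = [m.group(0) for m in PATTERN.finditer(html)]
print(matches)
```

With this input the pattern matches once inside the href value and once inside the span text, which are exactly the two places that should be skipped. Nothing in the pattern knows about tags or attributes, so it cannot tell these apart from plain text.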

+1  A: 

Maybe this solves your problem.

npinti
That's great, but I also need to find strings without the protocol, like this one: www.my.site.org
Katie
I have just tested the regular expression. For this input: jodfhsdfhttp://www.my.site.org/fishdfsuidhf I got this output: www.my.site.org. What protocol are you talking about exactly?
npinti
+2  A: 

What you're trying to do is called parsing HTML code via regular expressions.

First of all, I can feel your pain.

Second, here is a detailed explanation of why you shouldn't do this.

Third, if your customers are inserting web links in a rich text editor and they sometimes do it properly and sometimes they don't, well... that's definitely a bad practice and such people should be educated. If they are too lazy to click on the "link" button of a rich text editor, their text will be treated as simple text and not as a link. They will soon understand.

Fourth, which rich text editor are you using? TinyMCE offers a whole set of features and plugins that allow you to pre- and post-process the text inserted by the user easily. That might be easier than trying to edit that text in PHP.

Fifth, if you still need to do this, you might want to have a look at this tutorial on how to parse HTML to find links.
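As a rough illustration of the parsing approach, here is a minimal sketch that lets an HTML parser separate tags from text and applies a regular expression only to text outside `<a>` elements. It uses Python's standard `html.parser` rather than PHP, purely as an assumption for the example; `URL_RE`, `BareLinkFinder` and `find_bare_links` are hypothetical names, and the pattern is deliberately simple rather than a complete URL matcher.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: a bare URL starting with http(s):// or www.
# The lookbehind rejects candidates preceded by a word character,
# '@', '.', '/' or '-', which skips the host part of e-mail addresses.
URL_RE = re.compile(r"(?<![\w@./-])(?:https?://|www\.)[^\s<>\"']+")


class BareLinkFinder(HTMLParser):
    """Collects URLs found in text nodes that are not inside an <a> element."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0  # greater than 0 while inside <a>...</a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Attribute values such as href never reach handle_data,
        # so only text content is scanned here.
        if self.anchor_depth > 0:
            return
        for match in URL_RE.finditer(data):
            # Drop trailing sentence punctuation picked up by the pattern.
            self.links.append(match.group(0).rstrip(".,;:!?)"))


def find_bare_links(html):
    """Return bare URLs in the text of `html`, ignoring existing links."""
    parser = BareLinkFinder()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    'Visit www.my.home or http://my.site.it, write to info@www.example.org. '
    '<a href="http://www.my.site.com/">'
    '<span style="font-style: normal;">www.mambo-test.org</span></a>'
)
print(find_bare_links(sample))
```

The design choice is to leave the question of "am I inside a tag or an existing link?" to the parser, so the regular expression only has to recognise a URL in plain text. The address `info@www.example.org` in the sample is made up for the example.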

Roberto Aloi