
I would like to do the following, preferably with PHP:

Remove an entire word if any part of it contains a specific string. The match should be case-insensitive and should work for every occurrence, e.g. in a large text.

Pseudo-code:

match = "www."
lots_of_random_text = "... hello and welcome to www.stackoverflow.com! blah blah"
result = magic_function(lots_of_random_text, "www.")

After that, result should equal "... hello and welcome to blah blah".

What is the most efficient way to do this?

A:

A regular expression seems well suited to this task. Start with the docs for preg_match and preg_replace, or read the main PCRE docs for a complete overview.
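For the general case in the question, here is a minimal sketch of magic_function (the name comes from the question's pseudo-code; the implementation is a hypothetical illustration). It assumes a "word" is any run of non-whitespace characters, so the trailing "!" in the example is removed along with the URL. preg_quote escapes the needle so "www." is matched literally, the i modifier makes the match case-insensitive, and the leading \s* also swallows the space before each removed word so the output has single spacing, as in the expected result.

```php
<?php
// Hypothetical implementation of the question's magic_function(): removes every
// whitespace-delimited word that contains $needle anywhere, ignoring case.
function magic_function(string $text, string $needle): string
{
    // preg_quote() escapes regex metacharacters (e.g. the "." in "www."),
    // so the needle is matched literally; '/' is the pattern delimiter.
    // \s* swallows the whitespace before the word, the two \S* runs cover the
    // rest of the word on either side of the needle, and i ignores case.
    $pattern = '/\s*\S*' . preg_quote($needle, '/') . '\S*/i';
    return preg_replace($pattern, '', $text);
}

$text = "... hello and welcome to www.stackoverflow.com! blah blah";
echo magic_function($text, "www."), "\n"; // prints "... hello and welcome to blah blah"
echo magic_function("Visit WWW.Example.COM today", "www."), "\n"; // prints "Visit today"
```

A single preg_replace call handles every occurrence in one pass over the text. A word at the very start of the text has no preceding whitespace, so removing it leaves a leading space.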

php> $text="hello and welcome to www.stackoverflow.com snout pickle and while you're here, check out a unicorn at www.unicornmagicfairywonderland.net!";
php> $cleaned_text=preg_replace('#www\.[\w\d]+\.(com|net|org)#','',$text);
php> echo $cleaned_text;    
hello and welcome to  snout pickle and while you're here, check out a unicorn at !

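The key part is the pattern '#www\.[\w\d]+\.(com|net|org)#'. The # characters are just delimiters; the pattern matches any string that starts with www., continues with one or more word characters (\w already includes digits, so the \d is redundant), and ends with .com, .net or .org. As written it is case-sensitive; since you asked for case-insensitive matching, add the i modifier after the closing delimiter: '#www\.[\w\d]+\.(com|net|org)#i'.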
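A short sketch of the same pattern with the i modifier added, showing that upper-case hosts are now removed while endings outside the (com|net|org) group are left alone. The sample text is made up for illustration.

```php
<?php
// The answer's pattern with the i modifier added after the closing "#"
// delimiter, so upper-case hosts such as WWW.Example.COM are matched too.
$pattern = '#www\.[\w\d]+\.(com|net|org)#i';

$text = "Try WWW.Example.COM or www.example.org, but not www.example.info.";
echo preg_replace($pattern, '', $text), "\n"; // prints "Try  or , but not www.example.info."
```

Because nothing follows the (com|net|org) group, www.example.community would be cut down to "munity"; putting \b after the group prevents that.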

If you're trying to remove any URL, the expression will be much more complex than this, so be warned that it is incomplete. For example, you would also want to match URLs that start with http://, that have no www. or a different subdomain, and that end in other domains such as .co.uk or .edu.
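To give a rough idea of how the pattern grows, here is a sketch that also accepts an optional http:// or https:// prefix, hosts with no www. or with other subdomains, multi-part endings such as .co.uk or .edu, and an optional path. It is a heuristic, not a complete URL grammar: anything that merely looks like a host name, such as notes.txt, is removed as well.

```php
<?php
// Rough heuristic (not a full URL grammar): optional http:// or https://,
// one or more dot-terminated labels (www., mail., bbc.co., ...), a final
// alphabetic label of two or more letters, and an optional /path.
$pattern = '#\b(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?#i';

$text = "Links: http://stackoverflow.com, news.bbc.co.uk and https://www.mit.edu/about here.";
echo preg_replace($pattern, '', $text), "\n"; // prints "Links: ,  and  here."
```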

Regular expressions are, in general, complex and hard to get right. You may find www.regular-expressions.info helpful.

Alex JL