$test = "<div><b><i>#uniquetag#</b></i></div> <div>Keep this</div>";

$test = preg_replace("/<div(.*)#uniquetag#(.*)<\/div>/i", "#uniquetag#", $test);

I want the result to be

$test = "#uniquetag# <div>Keep this</div>";

But it returns

$test = "#uniquetag#";

I think I know why: (.*) is greedy, so it extends the match to the end of the string. But I can't figure out the correct way to fix it.

Update:

Special thanks to ghostdog74. The old problem is solved, but I've run into a new one.

$test = "<div></div> <div><b><i>#uniquetag#</b></i></div> <div>Keep this</div>";

$test = preg_replace("/<div(.*)#uniquetag#(.*?)<\/div>/i", "#uniquetag#", $test);

Expected result is

$test = "<div></div> #uniquetag# <div>Keep this</div>";

But it turns out to be

$test = "#uniquetag# <div>Keep this</div>";

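Again, I believe that's because of the first (.*). Changing it to (.*?) doesn't help either, since the match still starts at the first <div. I need to think of a way to keep that group from running across an earlier <div>.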

+3  A: 

In the majority of cases I'd strongly recommend using an HTML parser (such as this one) to get these links. Using regular expressions to parse HTML is going to be problematic, since HTML isn't a regular language and you'll have no end of edge cases to consider.

See here for more info.

Brian Agnew
HTML parsers are not always the best choice when working with HTML code. If the code he is working on is always formatted exactly like this (for example, because he generates it somewhere beforehand), then a regular expression is a fine and quick way to make simple changes.
poke
I think that's a fair comment, and pragmatism is a great guiding principle. I've amended to reflect that proviso.
Brian Agnew
+3  A: 

Change the second (.*) to (.*?) so that it matches lazily and stops at the first </div> after the tag.
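
A minimal sketch of the difference, using the string from the question. The variable names $greedy and $lazy are only illustrative; the patterns are the ones from the question, with and without the added ?.

```php
<?php
$test = "<div><b><i>#uniquetag#</b></i></div> <div>Keep this</div>";

// Greedy: the second (.*) runs on to the LAST </div> in the string,
// so the "Keep this" div is swallowed as well.
$greedy = preg_replace("/<div(.*)#uniquetag#(.*)<\/div>/i", "#uniquetag#", $test);

// Lazy: (.*?) takes as little as possible, so the match ends
// at the FIRST </div> that follows #uniquetag#.
$lazy = preg_replace("/<div(.*)#uniquetag#(.*?)<\/div>/i", "#uniquetag#", $test);

echo $greedy, "\n"; // #uniquetag#
echo $lazy, "\n";   // #uniquetag# <div>Keep this</div>
```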

ghostdog74
It works, but I face another problem. If there is a div tag before this code, it will gobble up all the divs before it.
John Adawan
Show your example in your question, along with your expected output.
ghostdog74
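
For the follow-up case from the question's update (an earlier <div> being swallowed), one possible approach is to stop the text before #uniquetag# from crossing any other opening or closing div tag, using a negative lookahead. This is only a sketch, not something proposed in the thread. It assumes the markup sits on one line, that the wrapping div contains no nested divs after the tag, and that the input always has this shape. The variable names $pattern and $result are illustrative.

```php
<?php
$test = "<div></div> <div><b><i>#uniquetag#</b></i></div> <div>Keep this</div>";

// <div\b[^>]*>          the opening tag of the wrapping div
// (?:(?!</?div\b).)*?   any characters, but never stepping over <div or </div
// #uniquetag#           the marker itself
// .*?</div>             lazily up to the first closing </div>
// "~" is used as the delimiter so "/" needs no escaping.
$pattern = '~<div\b[^>]*>(?:(?!</?div\b).)*?#uniquetag#.*?</div>~i';

$result = preg_replace($pattern, "#uniquetag#", $test);

echo $result, "\n"; // <div></div> #uniquetag# <div>Keep this</div>
```

Making the first group lazy is not enough on its own, because the regex engine still prefers the leftmost possible starting point. The lookahead makes the attempt that starts at the empty <div></div> fail, so the match can only begin at the div that actually wraps the tag.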