Given the following string in PHP:

$html = "<div>
<p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p>
</div>";

I just want to either empty the class attribute or remove the element wherever the class contains "test2", so the result would be this:

<div>
<p><span class=''>text 1</span></p>
<p><span class=''>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class=''>text 4</span></p>
</div>

or, if you're removing the element:

<div>
<p>text 1</p>
<p>text 2</p>
<p><span class='test1'>text 3</span></p>
<p>text 4</p>
</div>

I'm happy to use a regular expression or something like PHP Simple HTML DOM Parser, but I have no clue how to use the parser. With regex, I know how to find the element, but not the specific attribute associated with it, especially when the attribute holds multiple class names as in my example above. Any ideas?

+1  A: 

You can use any DOM parser and iterate over every element. Check whether the element's class attribute contains the test2 class (with strpos()); if it does, set the class attribute to an empty string.
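
A minimal sketch of that DOM approach using PHP's bundled DOMDocument, in case it helps. The function name emptyClassContaining is made up for illustration, and I've compared whole class names instead of using strpos() so that a class like test22 is not caught by accident. Passing a node to saveHTML() assumes PHP 5.3.6 or later.

```php
<?php
// Hypothetical helper: empties the class attribute of every <span>
// whose class list contains $className as a whole token.
function emptyClassContaining($html, $className)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('span') as $span) {
        // Split on whitespace so "test22" is not mistaken for "test2"
        $classes = preg_split('/\s+/', trim($span->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            $span->setAttribute('class', '');
        }
    }

    // loadHTML() wraps a fragment in <html><body>; serialize only the body's children
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = "<div>
<p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p>
</div>";

echo emptyClassContaining($html, 'test2');
```

Note that the serializer may write attributes with double quotes even though the input used single quotes.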

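You can also use regular expressions, which is a much shorter way. Just find and replace (preg_replace()) using the following expression: #class=".*?test2.*?"#is (note that this pattern expects double-quoted attribute values, while the sample string uses single quotes).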

Crozin
I tried: preg_replace('#class=".*?test2.*?"#is', "", $html); but that did not work; did I do it wrong?
James Nine
It should be `$html = ....`. But use the solution proposed by Josh; it's better (I'd forgotten that we can search for the interesting elements so easily).
Crozin
yep, i did "$html =" in the beginning of course. i'll have a look at Josh's answer though.
James Nine
+4  A: 

Using the PHP Simple HTML DOM Parser:

Updated and tested! You can get the simple_html_dom.php include from the above link or here.

For both cases:

include('../simple_html_dom.php');

$html = str_get_html("<div><p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p></div>");

case 1:

// Empty the class attribute of every span whose class contains "test2"
foreach ($html->find('span[class*="test2"]') as $e) {
    $e->class = '';
}

echo $html;

case 2:

// Replace the parent's content with the span's plain text, dropping the span
foreach ($html->find('span[class*="test2"]') as $e) {
    $e->parent()->innertext = $e->plaintext;
}

echo $html;
Josh
Case 1 throws: "Warning: Attempt to assign property of non-object". Case 2 throws: "Parse error: syntax error, unexpected '[', expecting ')'". Am I doing something wrong? I started it with: $html = new simple_html_dom(); $html->load( ... the html string above ... );
James Nine
what version of php are you running?
Josh
PHP version 5.3.0
James Nine
Sorry, I've updated and tested the code and it is now working. I think this method makes it much easier to read what is going on.
Josh
Use DOMDocument
AntonioCS
Having used jQuery quite a lot, I like the similarly easy syntax that PHP Simple HTML DOM Parser uses. I'm not sure about the overhead it causes, but for small/medium sites I think it's really easy to use.
Josh
+2  A: 
$notest2 = preg_replace(
         "/class\s*=\s*'[^\']*test2[^\']*'/", 
         "class=''", 
         $src);
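
For reference, here is a runnable sketch of that replacement applied to the sample string from the question. The second pattern ($strictPattern) is my own variation, not part of the answer above: it assumes you also want to accept double-quoted attributes and to match test2 only as a whole class name, so that something like test22 is left alone.

```php
<?php
$src = "<div>
<p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p>
</div>";

// Pattern from the answer: single-quoted class attributes containing "test2"
$pattern = "/class\s*=\s*'[^\']*test2[^\']*'/";
$notest2 = preg_replace($pattern, "class=''", $src);

// Variation (assumption): either quote style, and "test2" must be a whole class name.
// \1 refers back to the opening quote so the match cannot run past the closing one.
$strictPattern = '/class\s*=\s*(["\'])(?:(?!\1).)*?(?<![\w-])test2(?![\w-])(?:(?!\1).)*\1/s';
$strict = preg_replace($strictPattern, "class=''", $src);

echo $notest2;
```

Both patterns still treat the markup as plain text, so they will also rewrite a matching class='...' that appears inside a comment or a script block.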

C.

symcbean
This actually works! Thanks!
James Nine
Don't use regex to parse html :( We have been over this a gazillion times!! Please look at http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
AntonioCS
Just use whatever solves your problem
kemp
@kemp: That's the wrong way to think when you are trying to do things. Do stuff the right way and you probably won't have any problems in the future; do it in whichever manner happens to work and it will come back to bite you in the butt.
AntonioCS
I just don't get this holy war: this **is not** parsing HTML, it's a simple text search and replace. The "right way" doesn't exist, it totally depends on the context.
kemp