Hi

I have a CMS with a WYSIWYG editor that produces pretty good XHTML. Given that, I think an HTML parser might be slight overkill for this small job.

I am intending to use regular expressions but so far have been unable to get mine to match what I'm after.

I'm using PHP5.

I need to match the content of the 3 block level elements the WYSIWYG editor is able to produce: p, ul & ol. I am using preg_match_all() currently.

Is anyone able to help me?

Thank you

A: 

I think I just figured it out

preg_match_all('/<(p|ul|ol)>(.*)<\/(p|ul|ol)>/iU', $content, $blockElements);
alex
Yeah, that /U flag for the preg_ functions is really neat!
PEZ
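
To see what the U modifier actually changes, here is a small sketch. The $content value is made-up sample data, not taken from the question, and the variable names are only for illustration.

```php
<?php
// Hypothetical sample input: two paragraphs on one line.
$content = '<p>one</p><p>two</p>';

// Default (greedy): .* runs on to the LAST closing tag, so both
// paragraphs end up in a single match.
preg_match_all('/<(p|ul|ol)>(.*)<\/(p|ul|ol)>/i', $content, $greedy);

// With U, quantifiers are lazy by default, so each block matches on its own.
preg_match_all('/<(p|ul|ol)>(.*)<\/(p|ul|ol)>/iU', $content, $ungreedy);

// Without U, writing .*? explicitly has the same effect.
preg_match_all('/<(p|ul|ol)>(.*?)<\/(p|ul|ol)>/i', $content, $lazy);

print_r($greedy[2]);   // one element: "one</p><p>two"
print_r($ungreedy[2]); // two elements: "one", "two"
```

One thing to watch: . does not match newlines unless the s modifier is added, so content that spans several lines would need /isU rather than /iU.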
+2  A: 

This should work as long as you don't have nested p/ul/ol tags:

preg_match_all("<(?:p|ul|ol)>(.*?)</(?:p|ul|ol)>", $string, $matches)

?: makes a group non-capturing, so whatever the parens match is not stored in $matches, and .*? is lazy, so it stops at the first closing tag instead of running on to a later one.
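
A quick sketch of the difference between capturing and non-capturing groups. The sample string is invented, and # is used as the delimiter here only so the / in the closing tag needs no escaping.

```php
<?php
// Hypothetical sample input.
$string = '<p>hey</p><ul><li>a</li></ul>';

// Capturing groups: the tag names take up slots 1 and 3,
// and the content ends up in slot 2.
preg_match_all('#<(p|ul|ol)>(.*?)</(p|ul|ol)>#', $string, $capturing);

// Non-capturing groups: only the content is captured, in slot 1.
preg_match_all('#<(?:p|ul|ol)>(.*?)</(?:p|ul|ol)>#', $string, $nonCapturing);

echo count($capturing), "\n";    // 4: whole match + 3 groups
echo count($nonCapturing), "\n"; // 2: whole match + 1 group
print_r($nonCapturing[1]);       // "hey" and "<li>a</li>"
```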

wulong
Ah I see... and not capturing the parentheses makes the matches cleaner, I assume. I'll implement! Thank you muchly
alex
I think you'll need to put / at the beginning and end of your regex, and escape the one in the closing tag (</)
alex
Put # at the beginning and end of the regex and you don't have to escape the /.
PEZ
I used to use the @ symbol to begin and end, but I've gone with the / now to be 'traditional'
alex
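
The delimiter options from these comments, side by side, as a rough sketch. The sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical sample input.
$string = '<p>hey</p><ol><li>a</li></ol>';

// "/" as the delimiter: the slash in the closing tag has to be escaped.
preg_match_all('/<(?:p|ul|ol)>(.*?)<\/(?:p|ul|ol)>/', $string, $slash);

// "#" or "@" as the delimiter: no escaping needed.
preg_match_all('#<(?:p|ul|ol)>(.*?)</(?:p|ul|ol)>#', $string, $hash);
preg_match_all('@<(?:p|ul|ol)>(.*?)</(?:p|ul|ol)>@', $string, $at);

// All three patterns produce identical match arrays.
var_dump($slash === $hash && $hash === $at);
```

Which one you pick is a matter of taste; # or @ just saves some backslashes when the pattern itself contains slashes.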
A: 

This will find the topmost of each tag as long as you don't nest p tags in p tags or ul in ul. But you can nest p in ul, for example. For complex HTML you are better off with DOM.

Example data:

$html = <<< EOF
<p>
 hey
</p>

<ul>
 <li>
  test 
 </li>
 <li>
  <p>
   df4r4 4f4
  </p>
 </li>
</ul>

<p>
 hoo
</p>

EOF;

Regex:

$regex = '#<(?P<tags>p|ul|ol)>(?P<values>.*?)</\1>#si';
preg_match_all($regex, $html, $output);

Sort by tags:

$out = array(); // tag name => list of matched contents
for ($i = 0, $t = count($output['tags']); $i < $t; $i++) {
    $out[$output['tags'][$i]][] = $output['values'][$i];
}

To keep only the tags and values arrays, drop the duplicate integer-keyed entries and the whole-match entry:

$output = array_intersect_key($output, array('tags' => 0, 'values' => 0));
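
For the nested case mentioned at the top of this answer (ul inside ul and so on), a DOM version might look roughly like this. It is only a sketch: the sample markup, the wrapper div with id "root" and the $blocks array are my own assumptions, and it needs PHP's DOM extension.

```php
<?php
// Hypothetical sample with a nested list, which the regex above cannot handle.
$html = '<ul><li>a<ul><li>b</li></ul></li></ul><p>hoo</p>';

$doc = new DOMDocument();
// Wrap the fragment in one known element so the top-level blocks are easy to find.
$doc->loadHTML('<div id="root">' . $html . '</div>');

$xpath = new DOMXPath($doc);
// Only p/ul/ol elements that are direct children of the wrapper, i.e. the topmost blocks.
$nodes = $xpath->query('//div[@id="root"]/*[self::p or self::ul or self::ol]');

$blocks = array();
foreach ($nodes as $node) {
    // saveXML($node) serializes the element together with its own tags.
    $blocks[] = array('tag' => $node->nodeName, 'html' => $doc->saveXML($node));
}

print_r($blocks);
```

Here the nested ul stays inside the outer ul's markup instead of cutting the match short.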
OIS