+2  Q: 

Regexp for html

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I have the following string:

$str = " 
<li>r</li>  
<li>a</li>  
<li>n</li>  
<li>d</li>  
...
<li>om</li>  
";

How do I get the HTML for the first n <li> tags?

Example: n = 3; result = "<li>r<...>n</li>";

I would like a regexp if possible.

+10  A: 

Like this.

$dom = new DOMDocument();
@$dom->loadHTML($str);
$x = new DOMXPath($dom); 

// We want the 4th node.
foreach ($x->query("//li[4]") as $node)
{
  echo $node->C14N();
}

Oh yeah, learn XPath; it will save you lots of trouble in the future.
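
Since the question asks for the first n items rather than the n-th one, here is a minimal sketch of how the same approach could be adapted with an XPath position() predicate. The function name firstListItems and the shortened sample string are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): return the markup of
// the first $n <li> elements found in an HTML fragment.
function firstListItems(string $html, int $n): string
{
    $dom = new DOMDocument();
    // Suppress parser warnings, as the answer above does.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $out = '';
    // position() <= $n keeps the first $n <li> children of each parent.
    foreach ($xpath->query("//li[position() <= $n]") as $node) {
        // saveHTML($node) serializes just this node, tags included.
        $out .= $dom->saveHTML($node);
    }
    return $out;
}

// Shortened sample input (assumed), without the whitespace between items.
$str = "<li>r</li><li>a</li><li>n</li><li>d</li><li>om</li>";
echo firstListItems($str, 3);
```

Note that the predicate is evaluated per parent element, so with nested or multiple lists it would return the first n items of each list.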

Byron Whitlock
I would always recommend SimpleXML over DOMDocument for such simple things, as DOMDocument needs a lot of additional objects (such as one for XPath) and makes selecting elements and their content way too complicated.
Kau-Boy
@Kau-Boy - interesting, care to post an example? Also, do realize that DOMDocument doesn't require you to run the HTML through tidy. That in itself is a major win for me.
Byron Whitlock
A: 

As I'm sure you are aware, it is not a good idea to use regular expressions to work through HTML unless you "tidy" it first.

A very viable solution in PHP would be to navigate the HTML structure using SimpleXML (http://php.net/manual/en/book.simplexml.php) or DOMDocument (http://php.net/manual/en/class.domdocument.php).

Joshua Burns
+4  A: 

@Byron's solution, but with SimpleXML:

$xml = simplexml_load_string("<ul>$str</ul>"); // wrapped in <ul>: XML needs a single root element

foreach ($xml->xpath("//li[4]") as $node) {
  echo $node[0]; // Casting the element to a string yields its text content
}

EDIT: Another reason I really like SimpleXML is how easy it is to debug the content of a node. You can just use print_r($xml) to print the object with its child nodes.
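
For the "first n items" case from the question, a rough SimpleXML sketch might look like the following. The function name firstListItemsXml and the wrapping <ul> element are assumptions added for illustration; the wrapper is there because simplexml_load_string() expects well-formed XML with a single root element, so this only works when the fragment itself is well-formed.

```php
<?php
// Hypothetical helper: markup of the first $n <li> elements via SimpleXML.
function firstListItemsXml(string $fragment, int $n): string
{
    // Wrap the fragment in an assumed <ul> so there is exactly one root.
    $xml = simplexml_load_string("<ul>$fragment</ul>");
    if ($xml === false) {
        return ''; // The fragment was not well-formed XML.
    }

    $out = '';
    foreach ($xml->xpath("//li[position() <= $n]") as $li) {
        // asXML() returns the element including its tags,
        // whereas a string cast would return only the text content.
        $out .= $li->asXML();
    }
    return $out;
}

// Shortened sample input (assumed).
echo firstListItemsXml("<li>r</li><li>a</li><li>n</li><li>d</li>", 3);
```

Real-world HTML is often not valid XML, in which case the DOMDocument route with loadHTML() is the more forgiving choice.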

Kau-Boy
Error on line three!!
hopeseekr
@kau-boy, Thanks +1
Byron Whitlock
Corrected, thanks for the hint!
Kau-Boy