Hi,

What is the best way to get HTML elements and their attribute values? For example:

<div id="abc" class="classs">
    <img src="pic1.png" alt="pico">
    <img src="pic2.png" alt="nano">
</div>

All I have is the div's id (abc). I want to get everything inside the div, such as:

The class of the div ("classs")
The src and alt of each image:
src="pic1.png", alt="pico"
src="pic2.png", alt="nano"

The result should be an array, an object, or something similar. Which approach would you recommend: XPath, regex, or an XML object?

A:

You might want to use the PHP Simple HTML DOM Parser.

codaddict
Excellent, works great! Thanks.
qxxx
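For comparison, here is a minimal sketch of the same extraction using PHP's bundled DOM extension instead of the third-party library suggested in this answer. It assumes the markup from the question and PHP 7.1 or later (for the nullable return type); extractDivData() is a hypothetical helper name. The result is a nested array holding the div's class and one src/alt pair per image.

```php
<?php
// Sketch using PHP's built-in DOM extension (not the Simple HTML DOM Parser).
// extractDivData() is a hypothetical helper name chosen for this example.
function extractDivData(string $html, string $id): ?array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Documents parsed with loadHTML() treat id attributes as IDs,
    // so getElementById() can find the div directly.
    $div = $doc->getElementById($id);
    if ($div === null) {
        return null; // no element with that id
    }

    $result = ['class' => $div->getAttribute('class'), 'images' => []];
    // getElementsByTagName() on an element searches all of its descendants.
    foreach ($div->getElementsByTagName('img') as $img) {
        $result['images'][] = [
            'src' => $img->getAttribute('src'),
            'alt' => $img->getAttribute('alt'),
        ];
    }
    return $result;
}

$html = <<<HTML
<div id="abc" class="classs">
    <img src="pic1.png" alt="pico">
    <img src="pic2.png" alt="nano">
</div>
HTML;

print_r(extractDivData($html, 'abc'));
```

getElementsByTagName() is enough for this flat example; for more selective queries, DOMXPath::query() accepts a context node and does the same job with XPath expressions.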
A: 

Use this function:

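// Returns the serialized children of $DOMnode (its "inner HTML") by
// serializing the node and stripping its outer start and end tags.
public function innerHTML($DOMnode) {
    // An empty element serializes as <tag/>, which the regex would not match.
    if (!$DOMnode->hasChildNodes()) {
        return '';
    }
    return preg_replace(
        '/^<(\w+)\b.*?>(.*)<\/\1>$/s', // group 1: tag name, group 2: content
        '$2',
        $DOMnode->ownerDocument->saveXML($DOMnode)
    );
}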
stillstanding
IA IA Cthulhu Fhtagn!!! http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
Gordon
If you had studied the code more closely, you'd have noticed that it doesn't parse the entire HTML page, only the contents of the DOM node!
stillstanding
I did study it, and I find it horrible to convert the DOMNode to a string just so you can run a regex on it.
Gordon
I see no reason why using strings would be less efficient than iterating over nodes and using appendXML and document fragments.
stillstanding
Because it's like switching from a scalpel to a spoon during surgery. If you are already using the right toolset (DOM), why abandon it halfway through for one that has no concept of nodes and attributes?
Gordon
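The DOM-only alternative argued for above can be sketched as follows: instead of serializing the parent and cutting off its tags with a regex, serialize each child node separately and concatenate the results. This assumes PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts a node argument. The free function innerHTML() is hypothetical (not the class method from the answer), and it uses saveHTML() rather than the answer's saveXML(), so void elements such as <img> are written without a self-closing slash.

```php
<?php
// Sketch of a regex-free inner HTML: serialize each child node on its own.
// innerHTML() is a hypothetical free function, not the answer's class method.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="abc" class="classs"><img src="pic1.png" alt="pico"></div>');
echo innerHTML($doc->getElementById('abc'));
```

An element with no children simply yields an empty string here, with no special case needed.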