ansaurus

Question

PHP HTML DomDocument getElementById problems

Answer 1

+2 A:

Well, you should check whether $dom->loadHTML($html) returns true (success), and I would try

 var_dump($belement->nodeValue);

for output to get a clue what might be wrong.
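A minimal sketch of that check, assuming the sample HTML from the question. Collecting the parser messages with libxml_use_internal_errors() is my addition, not something the original post used. getElementById() may still return NULL at this point, so the last line guards against that.

```php
<?php
// Sample markup from the question; replace with the real input.
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

// Collect libxml parse messages instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);   // true on success, false on failure

foreach (libxml_get_errors() as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}
libxml_clear_errors();

var_dump($loaded);

// getElementById() returns NULL when no matching ID is registered.
$belement = $dom->getElementById('bid');
var_dump($belement === null ? null : $belement->nodeValue);
```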

EDIT: http://www.php-editors.com/php_manual/function.domdocument-get-element-by-id.html - it seems that DomDocument uses XPath internally.

Example:

// PHP 5 DOM equivalent of the PHP 4 xpath_new_context()/xpath_eval_expression() calls
$xpath = new DOMXPath($dom);
var_dump($xpath->query("//*[@id = 'YOURIDGOESHERE']")->item(0));

MartyIX 2010-08-02 21:48:25

Original post modified to reflect these outputs. Thanks,

Xepoch 2010-08-02 21:50:48

Answer 2

+3 A:

The Manual explains why:

For this function to work, you will need either to set some ID attributes with DOMElement->setIdAttribute() or a DTD which defines an attribute to be of type ID. In the latter case, you will need to validate your document with DOMDocument->validate() or DOMDocument->validateOnParse before using this function.

By all means, go for valid HTML & provide a DTD.
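If a DTD is not an option, the setIdAttribute() route from the manual excerpt could look roughly like this. It is only a sketch: it assumes the sample HTML from the question and uses an XPath query to find every element that carries an id attribute, which is my own choice for the lookup step.

```php
<?php
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Mark every "id" attribute as being of type ID so that
// getElementById() can find the element afterwards.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@id]') as $element) {
    $element->setIdAttribute('id', true);
}

$belement = $dom->getElementById('bid');
echo $belement->nodeValue, "\n";
```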

Quick fixes:

Call $dom->validate(); and put up with the errors (or fix them). Afterwards you can use $dom->getElementById(), regardless of the errors, for some reason.
Use XPath if you don't feel like validating: $x = new DOMXPath($dom); $el = $x->query("//*[@id='bid']")->item(0);
Come to think of it: if you just set validateOnParse to true before loading the HTML, it would also work ;P


$dom = new DOMDocument();
$html ='<html>
<body>Hello <b id="bid">World</b>.</body>
</html>';
$dom->validateOnParse = true; //<!-- this first
$dom->loadHTML($html);        //'cause 'load' == 'parse

$dom->preserveWhiteSpace = false;

$belement = $dom->getElementById("bid");
echo $belement->nodeValue;

Outputs 'World' here.
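Since getElementById() can come back NULL depending on the setup, a defensive variant is to try it first and fall back to XPath. The helper name findElementById() is hypothetical, and the quoting logic is a simplification of mine: it gives up on ids that contain both quote characters.

```php
<?php
// Hypothetical helper: getElementById() first, XPath as a fallback.
function findElementById(DOMDocument $dom, string $id): ?DOMElement
{
    $element = $dom->getElementById($id);
    if ($element !== null) {
        return $element;
    }

    // Build an XPath string literal; XPath 1.0 has no escape syntax,
    // so pick whichever quote character the id does not contain.
    if (strpos($id, "'") === false) {
        $literal = "'" . $id . "'";
    } elseif (strpos($id, '"') === false) {
        $literal = '"' . $id . '"';
    } else {
        return null; // simplification: ids with both quote types are not handled
    }

    $xpath = new DOMXPath($dom);
    $node = $xpath->query('//*[@id=' . $literal . ']')->item(0);

    return $node instanceof DOMElement ? $node : null;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body>Hello <b id="bid">World</b>.</body></html>');

$belement = findElementById($dom, 'bid');
echo $belement !== null ? $belement->nodeValue : 'not found', "\n";
```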

Wrikken 2010-08-02 21:49:18

I do have validateOnParse. setIdAttribute only would apply to set and then subsequent retrieve? Again though, the HTML will be web-provided so I'm at their mercy, but just trying an example. HTML5 doesn't even have a DTD, yes?

Xepoch 2010-08-02 21:54:36

"setIdAttribute only would apply to set and then subsequent retrieve?" -> Yes. HTML5 is not finished yet so it should not have a DTD yet.

MartyIX 2010-08-02 21:59:37

DTD would be `<!DOCTYPE HTML>`, but just calling `$dom->validate()` would also work. Put up with the errors or try to generate valid HTML (the latter is more difficult than it seems... :) )

Wrikken 2010-08-02 21:59:58

@Xepoch I've never managed to get `getElementById` working when using `DOM` with HTML. But you can substitute `getElementById` with an XPath like `//p[@id="foo"]`

Gordon 2010-08-02 22:00:20

@Wrikken doesn't work for me. I'm getting *Trying to get property of non-object* on the `echo` call with PHP 5.3.2 on Vista and libxml 20703

Gordon 2010-08-02 22:12:26

Hmm, here it does work, PHP 5.3.2, libxml 2.7.6 (I assume for Windows, 20703 would be 2.7.3), you could try ftp://ftp.zlatkovic.com/libxml/libxml2-2.7.6.win32.zip . Does calling `validate()` manually afterwards also give no results?

Wrikken 2010-08-02 22:21:53

... and if that doesn't work, have you tried using the example from http://www.php.net/manual/en/domimplementation.createdocument.php ?

Wrikken 2010-08-02 22:25:14

@Wrikken Doing `validate()` only gets me a couple of errors about the `html40/loose.dtd` and the same error as before. Using the explicit DTD declaration doesn't help either. I've tried on an XP machine with 5.3.0 and libxml 20626 and nothing as well. I guess this is either a Windows thing or a libxml thing. I'll try to update it. Upvoted nonetheless though.

Gordon 2010-08-03 07:30:34

@Gordon: OK, duly noted that this isn't cross-os/version behavior. Thankfully it works on my servers :) The XPath stays a failsafe fallback afaik.

Wrikken 2010-08-03 08:59:10

@Wrikken after upgrading PHP to 5.3.3 which comes bundled with libxml 2.7.7, getElementById is working.

Gordon 2010-08-04 12:25:01

OK, good news, nice to know life just got that little bit easier :)

Wrikken 2010-08-04 13:22:43
