views:

502

answers:

3

I'm putting some page content (which has been run through Tidy, but doesn't need to be if this is a source of problems) into DOMDocument using DOMDocument::loadHTML. It's coming up with various errors 'ID x already defined in Entity, line X'. Is there any way to make either DOMDocument (or Tidy) ignore or strip out duplicate element IDs, so it will actually create the DOMDocument?

Thanks. :)

+1  A: 

A quick search on the subject reveals this (incorrect) bug report:

http://bugs.php.net/bug.php?id=46136

The last reply states the following:

You're using HTML 4 rules to load an XHTML document. Either use the load() method to parse as XML or the libxml_use_internal_errors() function to ignore the warnings.

I can't be sure if you are encountering this problem for the same reasons, since you did not include a reference to the HTML page being loaded. In any case, using libxml_use_internal_errors() should at least suppress the error.

ID's in HTML documents are generally unique, so the best solution would still be validating your document, if at all possible.

Aron Rotteveel
A: 

By definition, IDs are unique. If they are not, you should use classes instead (nor names, where it applies).
I doubt you can force XML tools to ignore duplicate IDs, that will make them handle an invalid XML document.

PhiLho
A: 

Use Exceptions to treat duplicate IDs, and rename the second id. Or maybe, combine elements in sub-elements of same parent with the ID.

IDs are unique in an XML file (in the rootElement of XMLTree)