Hi!

I use the following to load an HTML document into a DOM:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);

and then I add some new content to an element in the HTML:

$element = $dom->getElementById('mybox');
$f = $dom->createDocumentFragment();
$f->appendXML('<div id="newbox">foo</div>');
$element->appendChild($f);

But if I now want to manipulate #newbox, I can't, because getElementById() doesn't find it. To make it work, I have to reload the document from the new HTML:

$html = $dom->saveHTML();
$dom->loadHTML($html);

This works fine, but having to do it between every DOM manipulation becomes expensive performance-wise.

Is there any better way to "refresh" the DOM so that it works with the newly added elements?

Thanks in advance! :)

A: 

On the save-and-load approach, you could also try Document.normalizeDocument. This should fix up the document as if it had been through a save-and-load cycle, without actually serialising it. One thing it should do is recalculate the isId-ness of attributes from the document type, which you'd hope loadHTML would have set to one of the HTML doctypes (these define id as an attribute of value type ID).

(There is also Element.setIdAttribute which can be used to declare one instance of an Attr to contain an ID, but that's no use to you since you'd have to get hold of it first.)
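
If the new markup arrives as a string, one possible way to get hold of the element is to grab the fragment's first child before appending it, then flag its id attribute. This is a minimal sketch, assuming the fragment's first child is the element of interest (leading whitespace in the string would make it a text node instead); the sample `$html` is a placeholder, not the asker's real markup:

```php
<?php
// Placeholder markup standing in for the question's $html (an assumption).
$html = '<html><body><div id="mybox"></div></body></html>';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);

$element = $dom->getElementById('mybox');
$f = $dom->createDocumentFragment();
$f->appendXML('<div id="newbox">foo</div>');

// Keep a reference to the new element before appendChild() empties the fragment.
$newbox = $f->firstChild;
$element->appendChild($f);

// Declare the id attribute to be of type ID so getElementById() can see it.
$newbox->setIdAttribute('id', true);

$found = $dom->getElementById('newbox');
```

Note that once `$newbox` is in hand, it can be manipulated directly; the `getElementById()` call only matters for code elsewhere that looks the element up by ID.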

I haven't tested this though and it wouldn't surprise me if PHP didn't implement this DOM Level 3 Core stuff properly. By my interpretation of the spec for isId, I reckon it should have picked up the id type definition already automatically. (My own DOM implementation certainly does.) But in that case your code would have worked. And I suppose appendXML is a non-standard method after all, so there's nothing to say it has to resolve type definitions like loadXML or loadHTML would.

So, maybe a workaround is a better plan. You might use a DOMXPath to select the element by @id attribute rather than real IDness as such. Of course this will be much slower than getElementById, but hopefully faster than normalizeDocument.
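
A minimal sketch of the XPath workaround, reusing the question's code. The sample `$html` and the `class` attribute set at the end are placeholders for illustration only:

```php
<?php
// Placeholder markup standing in for the question's $html (an assumption).
$html = '<html><body><div id="mybox"></div></body></html>';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);

$element = $dom->getElementById('mybox');
$f = $dom->createDocumentFragment();
$f->appendXML('<div id="newbox">foo</div>');
$element->appendChild($f);

// Match on the literal id attribute instead of relying on ID-typed lookup.
$xpath = new DOMXPath($dom);
$newbox = $xpath->query("//*[@id='newbox']")->item(0);

if ($newbox !== null) {
    // Example manipulation of the newly added element.
    $newbox->setAttribute('class', 'added');
}

echo $dom->saveHTML($element), "\n";
```

The `//*[@id='newbox']` expression scans the whole tree, so if the parent is already known it may be cheaper to pass it as the context node, e.g. `$xpath->query(".//*[@id='newbox']", $element)`.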

Or just lose the XML string-slinging and stick to the DOM methods, if you can; then it's trivial to keep a reference to a created element. (You can use helper functions to create the elements a bit more quickly if you find the DOM methods too wordy for the amount of content you're creating.)
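
As a rough sketch of the DOM-methods route: the `el()` helper below is hypothetical (it is not part of PHP's DOM extension), the sample `$html` is a placeholder, and the nullable type hint assumes PHP 7.1 or later:

```php
<?php
// Hypothetical helper: builds an element with attributes and optional text content.
function el(DOMDocument $dom, string $tag, array $attrs = [], ?string $text = null): DOMElement
{
    $node = $dom->createElement($tag);
    foreach ($attrs as $name => $value) {
        $node->setAttribute($name, $value);
    }
    if ($text !== null) {
        $node->appendChild($dom->createTextNode($text));
    }
    return $node;
}

// Placeholder markup standing in for the question's $html (an assumption).
$html = '<html><body><div id="mybox"></div></body></html>';
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);

$element = $dom->getElementById('mybox');
$newbox = el($dom, 'div', ['id' => 'newbox'], 'foo');
$element->appendChild($newbox);

// $newbox is still a live reference to the inserted node, so no lookup is needed.
$newbox->setAttribute('class', 'added');
```

Because the reference is kept from creation, neither `getElementById()` nor a reload is needed to keep working with the new element.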

bobince
Thanks for the reply, bobince. It seems normalizeDocument didn't do the trick. I agree that sticking to DOM methods when adding new content would be best, but in this case it has to accept strings, unless I build some recursive function to turn the string into proper DOM nodes, and I guess that might end up using just as many resources as the reloading does now.
Tommy
Yeah, shame... still, the XPath `//div[@id='newbox']` workaround should work OK. Assuming it works with a DocumentFragment as the context node, which it's supposed to but, again, I haven't tested on PHP... ;-)
bobince
A: 

The only thing I know of that handles this really well is Python's Beautiful Soup. The document is split up into a parse tree that you can add to or remove from at will. Maybe you could write a Python script to handle the HTML and then coordinate the two scripts through a database or a system call. Alternatively, server-side JavaScript might be worth investigating.

myk_raniu