I have an XML file with a large number of tags. It has more than 2000 tags here and there. I want to delete all of the htmlText tags and save the result as a new XML file. How can I do that in PHP? Here is the code that I am using:

$remove = $doc->getElementsByTagName('htmlText');
$doc->removeChild($remove);
A: 

DOMDocument::getElementsByTagName() returns a DOMNodeList, which is a collection of DOMNodes. Since DOMNode::removeChild() accepts only a single DOMNode as its argument, you'll have to iterate over the collection:

foreach ($doc->getElementsByTagName('htmlText') as $el) {
    $el->parentNode->removeChild($el);
}
Stefan Gehrig
There was a bug in the code... You can only remove a child in the context of its own parent node, but $doc is not the parent node of the found <htmlText> nodes. You have to go through the parentNode of each found node.
Stefan Gehrig
I want to delete all htmlText tags, whatever their parent is. The only thing I know is that the tag name is htmlText.
Jasim
The corrected code above does exactly this.
Stefan Gehrig
No, it only deletes some of them, not all...
Jasim
You should be more specific about which nodes are deleted and which aren't. Perhaps you can edit your question and add some more information. For example, check whether getElementsByTagName() finds all the tags you think it should find; that would tell you whether the issue lies in the lookup or in the removeChild() part.
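A minimal sketch of such a check, assuming a small inline sample document (the $xml string is hypothetical and stands in for the real file). It compares the number of elements the DOM lookup finds with a rough count of opening tags in the raw text:

```php
<?php
// Hypothetical sample document; substitute the contents of your own file.
$xml = '<root><a><htmlText>x</htmlText></a><htmlText>y</htmlText>'
     . '<b><c><htmlText>z</htmlText></c></b></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Number of elements the DOM lookup finds, at any depth.
$found = $doc->getElementsByTagName('htmlText')->length;

// Rough cross-check: count opening <htmlText ...> tags in the raw string.
$expected = preg_match_all('/<htmlText[\s>\/]/', $xml);

echo "found: $found, expected: $expected\n";
```

If the two numbers agree, the lookup is fine and the problem is in the removal loop.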
Stefan Gehrig
A: 

These two solutions should work:

$elements = $doc->getElementsByTagName('htmlText');
while ($elements->length > 0) {
    $elements->item(0)->parentNode->removeChild($elements->item(0));
}

Or loop backwards:

$elements = $doc->getElementsByTagName('htmlText');
for ($i = $elements->length-1; $i >= 0; $i--) {
    $elements->item($i)->parentNode->removeChild($elements->item($i));
}

Using foreach as suggested earlier, or looping from 0 up, won't work because the node list gets altered as you loop. You can test this with the following snippet:

$doc = new DOMDocument();
$doc->loadHTML('<p>first</p><p>second</p><p>third</p>');
foreach ($doc->getElementsByTagName('p') as $el) {
    $el->parentNode->removeChild($el);
}
echo $doc->saveHTML();

Here the node list contains 3 elements: 0=>first, 1=>second, 2=>third. If you run it, you'll see the second element is not removed: the first iteration removes the element at index 0 ('first'), leaving the node list with only 2 elements (0=>second, 1=>third). The next iteration removes the element at index 1 ('third'), and the loop ends. If you then save the document, you'll find the second element remains untouched, which is probably what you experienced when you said "it only deletes some of them" about the previous suggestion.
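A third option, sketched here as an assumption rather than something taken from the answers above, is to copy the live node list into a plain array first, so that removals cannot shift the items still waiting to be visited. The sample XML and the output path are placeholders; with a real file you would call $doc->load() with your own path instead of loadXML().

```php
<?php
// Hypothetical sample input; in practice use $doc->load('your-file.xml').
$xml = '<root><htmlText>a</htmlText>'
     . '<item><htmlText>b</htmlText><name>keep</name></item>'
     . '<htmlText>c</htmlText></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Snapshot the live DOMNodeList into a static array before removing anything.
$nodes = iterator_to_array($doc->getElementsByTagName('htmlText'));

foreach ($nodes as $node) {
    // Remove each node through its own parent, whatever that parent is.
    $node->parentNode->removeChild($node);
}

// A fresh lookup should now find nothing.
$remaining = $doc->getElementsByTagName('htmlText')->length;

// Save the cleaned document as a new file (placeholder path).
$outFile = sys_get_temp_dir() . '/cleaned.xml';
$doc->save($outFile);

echo $doc->saveXML();
```

Because the array is a snapshot, a plain foreach is safe here, and the final save() call covers the "save it as a new XML" part of the question.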

Keyvan