ansaurus

Question

XML : how to remove all nodes which have no attributes nor child elements

Answer 1

+1 A:

Something like this should do it:

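// Requires "using System.Linq;" for Cast/ToList.
// Snapshot the live node list so removals do not disturb the iteration.
var nodes = xmlDocument.GetElementsByTagName("Node1").Cast<XmlNode>().ToList();

foreach (XmlNode node in nodes)
{
    if (node.ChildNodes.Count == 0 && node.Attributes.Count == 0)
    {
        // RemoveAll() only clears a node's children and attributes;
        // to drop the node itself, remove it from its parent.
        node.ParentNode.RemoveChild(node);
    }
    else
    {
        foreach (XmlNode n in node.ChildNodes.Cast<XmlNode>().ToList())
        {
            // Attributes is null for non-element nodes, so check the node type first.
            if (n.NodeType == XmlNodeType.Element && n.InnerText == String.Empty && n.Attributes.Count == 0)
            {
                node.RemoveChild(n);
            }
        }
    }
}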

TheGeekYouNeed 2010-04-01 07:36:18

The node names I mentioned are just to explain what I want. They are not the real node names. I want to do something generic. I believe XPath will be useful here, but I don't know how to use XPath yet. I am reading about it :). Thanks for the reply though.

mishal153 2010-04-01 07:42:01

Answer 2

+2 A:

Using an XPath expression, it is possible to find all nodes that have neither attributes nor children. These can then be removed from the XML. As Sani points out, you might have to do this recursively, because node1_1 becomes empty once you remove its inner node.

var xmlDocument = new XmlDocument();
xmlDocument.LoadXml(
@"<Node1 attrib1=""abc"">
        <node1_1>
             <node1_1_1 />
        </node1_1>
    </Node1>
    ");

// select all nodes without attributes and without children
var nodes = xmlDocument.SelectNodes("//*[count(@*) = 0 and count(child::*) = 0]");

Console.WriteLine("Found {0} empty nodes", nodes.Count);

// now remove matched nodes from their parent
foreach(XmlNode node in nodes)
    node.ParentNode.RemoveChild(node);

Console.WriteLine(xmlDocument.OuterXml);
Console.ReadLine();

Thomas 2010-04-01 07:45:12
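The recursion mentioned in this answer can also be approximated by repeating the XPath query until it stops matching. The following is only a sketch under that assumption: `EmptyNodePruner` and `RemoveEmptyElements` are hypothetical names, not part of the original answer. It uses the same expression as above, so elements that contain only text are still treated as empty.

```csharp
using System;
using System.Xml;

public static class EmptyNodePruner
{
    // Hypothetical helper: repeats the XPath removal until no element matches.
    // Returns the number of elements removed.
    public static int RemoveEmptyElements(XmlDocument document)
    {
        const string emptyElements = "//*[count(@*) = 0 and count(child::*) = 0]";
        int removed = 0;

        while (true)
        {
            XmlNodeList matches = document.SelectNodes(emptyElements);
            if (matches.Count == 0)
                break;

            // Copy the matches into an array before changing the tree.
            var snapshot = new XmlNode[matches.Count];
            for (int i = 0; i < snapshot.Length; i++)
                snapshot[i] = matches[i];

            foreach (XmlNode node in snapshot)
            {
                // For the document element, ParentNode is the XmlDocument itself.
                node.ParentNode.RemoveChild(node);
                removed++;
            }
        }

        return removed;
    }

    public static void Main()
    {
        var document = new XmlDocument();
        document.LoadXml(@"<Node1 attrib1=""abc""><node1_1><node1_1_1 /></node1_1></Node1>");

        // First pass removes node1_1_1, second pass removes the now-empty node1_1.
        Console.WriteLine("Removed {0} elements", RemoveEmptyElements(document)); // prints "Removed 2 elements"
        Console.WriteLine(document.OuterXml);
    }
}
```

Each pass only removes elements that are empty at that moment, so the loop runs once per nesting level of empty elements plus one final pass that finds nothing.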

Thanks, this is working fine for me :)

mishal153 2010-04-01 07:53:15

Just want to add one more thing. I realized that I also need to cover the situation where a node looks like <node1> hello </node1>. Here the node has no children and no attributes, but it does have text, so I do not want it to be filtered out and removed. So the correct solution for me was: XmlNodeList list = document.SelectNodes("//*[count(@*) = 0 and count(child::*) = 0 and not(text())]");

mishal153 2010-04-01 10:09:46

You could simplify that XPath expression by using `node()` to cover both the `*` and `text()` tests, and a union `|` to count attributes and child nodes together: `//*[count(child::node() | @*) = 0]`

Mads Hansen 2010-04-01 13:33:35
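To see how the three expressions in this thread differ, it may help to run them against one small document. This is an illustrative sketch only: the sample XML and the `XPathComparison.CountMatches` helper are made up for the comparison. Note that `node()` also matches comments and processing instructions, so the last expression keeps an element whose only child is a comment.

```csharp
using System;
using System.Xml;

public static class XPathComparison
{
    // Hypothetical helper: loads the XML and counts the elements an expression selects.
    public static int CountMatches(string xml, string xpath)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);
        return document.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Made-up sample: an empty element, a text-only element,
        // a comment-only element, and a parent with an empty child.
        const string xml =
            @"<root a=""1""><empty /><text>hello</text><comment><!-- note --></comment><parent><child /></parent></root>";

        // No attributes and no child elements: empty, text, comment, child.
        Console.WriteLine(CountMatches(xml, "//*[count(@*) = 0 and count(child::*) = 0]")); // prints 4

        // Additionally requires no text: empty, comment, child.
        Console.WriteLine(CountMatches(xml, "//*[count(@*) = 0 and count(child::*) = 0 and not(text())]")); // prints 3

        // No attributes and no child nodes of any kind: empty, child.
        Console.WriteLine(CountMatches(xml, "//*[count(child::node() | @*) = 0]")); // prints 2
    }
}
```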

Answer 3

+1 A:

This should work if a parent node should also be removed once all of its child nodes have been removed:

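static void Main(string[] args)
{
  const string strXml = "<Node1 attrib1=\"abc\"><node1_1><node1_1_1 /></node1_1></Node1>";
  var doc = new XmlDocument();
  doc.LoadXml(strXml);

  RemoveEmptyNodes(doc);
}

// Returns true when the node is an element left with no child nodes and no attributes.
public static bool RemoveEmptyNodes(XmlNode node)
{
  // Copy the children first: removing a child while enumerating ChildNodes
  // can end the loop before the remaining siblings are visited.
  // (Cast/ToList require "using System.Linq;".)
  foreach (XmlNode child in node.ChildNodes.Cast<XmlNode>().ToList()) {
    if (RemoveEmptyNodes(child))
      node.RemoveChild(child);
  }

  // Attributes is null for non-element nodes (text, comments, the document itself),
  // so only elements are ever reported as removable.
  if (node.NodeType != XmlNodeType.Element) return false;

  return !node.HasChildNodes && node.Attributes.Count == 0;
}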

Sani Huttunen 2010-04-01 07:49:29

Thanks Sani, I have checked and this also works fine. Thanks.

mishal153 2010-04-01 08:05:15

Answer 4

A:

This stylesheet uses an identity transform plus an empty template that matches elements without child nodes or attributes, which prevents them from being copied to the output:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!--Identity transform copies all items by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--Empty template to match on elements without attributes or child nodes to prevent it from being copied to output -->
    <xsl:template match="*[not(child::node() | @*)]"/>

</xsl:stylesheet>

Mads Hansen 2010-04-01 13:40:00
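For anyone who wants to run this stylesheet from C#, one possible way is `XslCompiledTransform`. This is a minimal sketch, not part of the original answer: `StylesheetRunner` and `Transform` are hypothetical names, and the stylesheet is inlined as a string only to keep the example self-contained. Because templates are matched against the source document, a single run drops node1_1_1 but keeps node1_1, which only becomes empty in the output.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StylesheetRunner
{
    // Same two templates as the stylesheet above, inlined for the sketch.
    private const string Stylesheet =
@"<xsl:stylesheet xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"" version=""1.0"">
  <xsl:template match=""@* | node()"">
    <xsl:copy><xsl:apply-templates select=""@* | node()""/></xsl:copy>
  </xsl:template>
  <xsl:template match=""*[not(child::node() | @*)]""/>
</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet to an XML string and returns the result.
    public static string Transform(string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(inputReader, writer);
        }

        // The writer has been disposed (and flushed) by this point.
        return output.ToString();
    }

    public static void Main()
    {
        string result = Transform(@"<Node1 attrib1=""abc""><node1_1><node1_1_1 /></node1_1></Node1>");
        Console.WriteLine(result);
    }
}
```

To remove parents that become empty as well, the transform would have to be applied repeatedly, or the match pattern extended to look at descendants.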
