I have an XML file, for example:

<items>
  <item1>
    <piece>1300</piece>
    <itemc>665583</itemc>
  </item1>
  <item2>
    <piece>100</piece>
    <itemc>665584</itemc>
  </item2>
</items>

I am trying to write a C# application that gets the XPath of every innermost node, e.g.:

items/item1/piece
items/item1/itemc
items/item2/piece
items/item2/itemc

Is there a way to do this in C# or VB? Thanks in advance for any suggestions.

+2  A: 

It's untested and may need some work before it compiles, but is something like this what you want?

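using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

class Program
{
    static void Main()
    {
        XmlDocument xml = new XmlDocument();
        xml.Load("test.xml");

        var toReturn = new List<string>();
        // Start at the root element; ChildNodes[0] may be the XML declaration.
        GetPaths(string.Empty, xml.DocumentElement, toReturn);

        foreach (string path in toReturn)
        {
            Console.WriteLine(path);
        }
    }

    public static void GetPaths(string pathSoFar, XmlNode node, List<string> results)
    {
        string scopedPath = pathSoFar + node.Name;

        // Only child elements are path steps; text nodes such as "1300" are not.
        List<XmlElement> childElements = node.ChildNodes.OfType<XmlElement>().ToList();

        if (childElements.Count > 0)
        {
            foreach (XmlElement child in childElements)
            {
                GetPaths(scopedPath + "/", child, results);
            }
        }
        else
        {
            results.Add(scopedPath);
        }
    }
}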

For large XML documents, though, it might not be very memory-efficient, because XmlDocument loads the whole tree into memory.

runrunraygun
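If memory is the concern, one option (not part of the answer above) is to stream the document with XmlReader instead of loading it into an XmlDocument. This is a minimal sketch assuming only element names matter; `StreamingLeafPaths` and `Get` are hypothetical names. It keeps the names of the currently open elements plus the result list, rather than the whole tree.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: forward-only scan that reports every element
// without child elements as a slash-separated path of element names.
static class StreamingLeafPaths
{
    public static List<string> Get(TextReader input)
    {
        var results = new List<string>();
        var openNames = new List<string>();      // names of currently open elements
        var hasChildElement = new List<bool>();  // one flag per open (non-empty) element

        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // The enclosing element now has at least one child element.
                    if (hasChildElement.Count > 0)
                        hasChildElement[hasChildElement.Count - 1] = true;

                    openNames.Add(reader.Name);
                    if (reader.IsEmptyElement)
                    {
                        // <x/> produces no EndElement node, so report it immediately.
                        results.Add(string.Join("/", openNames));
                        openNames.RemoveAt(openNames.Count - 1);
                    }
                    else
                    {
                        hasChildElement.Add(false);
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    // An element that never saw a child element is a leaf.
                    if (!hasChildElement[hasChildElement.Count - 1])
                        results.Add(string.Join("/", openNames));

                    openNames.RemoveAt(openNames.Count - 1);
                    hasChildElement.RemoveAt(hasChildElement.Count - 1);
                }
            }
        }
        return results;
    }
}
```

Usage would be something like `StreamingLeafPaths.Get(File.OpenText("test.xml"))`, which for the question's file should yield `items/item1/piece`, `items/item1/itemc`, and so on. Leaves come out in document order because a leaf's end tag always precedes the start of the next element.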
+4  A: 

There you go:

using System;
using System.Xml;

class Program
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(@"C:\Test.xml");

        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            ProcessNode(node, doc.DocumentElement.Name);
        }
    }

    // static so that it can be called from the static Main method
    private static void ProcessNode(XmlNode node, string parentPath)
    {
        // A leaf has no children, or has only its own text content as a child.
        if (!node.HasChildNodes
            || ((node.ChildNodes.Count == 1) && (node.FirstChild is XmlText)))
        {
            Console.WriteLine(parentPath + "/" + node.Name);
        }
        else
        {
            foreach (XmlNode child in node.ChildNodes)
            {
                ProcessNode(child, parentPath + "/" + node.Name);
            }
        }
    }
}

The code above should generate the desired output for other XML files too, not just this one. Add checks wherever required. The key point is that the text node (the text inside an element) is ignored, so it never appears in the output.

A9S6
+1  A: 

Maybe not the fastest solution, but it allows arbitrary XPath expressions to be used as the selector, and to me it also expresses the intention of the code most clearly.

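using System;
using System.Collections;
using System.Xml.Linq;
using System.Xml.XPath; // provides the XPathEvaluate extension method

class Program
{
    static void Main(string[] args)
    {
        XDocument xml = XDocument.Load("test.xml");

        // XPathEvaluate returns object; for a node-set expression it is an IEnumerable.
        IEnumerable innerItems = (IEnumerable)xml.XPathEvaluate("//*[not(*)]");
        foreach (XElement innerItem in innerItems)
        {
            Console.WriteLine(GetPath(innerItem));
        }
    }

    public static string GetPath(XElement e)
    {
        // The root element has no parent element, so the recursion stops there.
        if (e.Parent == null)
        {
            return "/" + e.Name;
        }
        return GetPath(e.Parent) + "/" + e.Name;
    }
}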
Jeroen Huinink
This code is specific to this XML. I think the OP needs a solution that works generically across different XML files.
A9S6
You are correct. I added the XPath suggested by helios, which makes this generic, although my original claim that this expresses the intention clearly might no longer apply: you have to understand the XPath to see that it finds all innermost nodes.
Jeroen Huinink
+11  A: 
//*[not(*)]

is the XPath that finds all elements with no child elements, so you can do something like

doc.SelectNodes("//*[not(*)]")

but I'm not entirely sure about the .NET API, so check it.

Reference

// --> descendant-or-self (any depth, not only direct children)
*  --> any element name
[] --> predicate to evaluate
not(*) --> has no child elements
helios
+1: Great answer
A9S6
Thank you very much. I was convinced XPath had something for that, but I couldn't find the answer on w3schools, so Google came to the rescue.
helios
Thunder: once you have the nodes, you can build the path of each one by walking up its ancestors.
helios
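To make the suggestion above concrete, here is a minimal sketch with XmlDocument: `SelectNodes` evaluates helios's expression, and each path is built by walking `ParentNode` up to the root element. `LeafXPaths.Get` is a hypothetical helper name, and the sketch assumes plain element names without position predicates are enough.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: path of every element that has no child elements.
static class LeafXPaths
{
    public static List<string> Get(XmlDocument doc)
    {
        var paths = new List<string>();
        foreach (XmlNode leaf in doc.SelectNodes("//*[not(*)]"))
        {
            string path = "";
            // The document node is not an XmlElement, so the loop stops above the root element.
            for (XmlNode n = leaf; n is XmlElement; n = n.ParentNode)
            {
                path = "/" + n.Name + path;
            }
            paths.Add(path);
        }
        return paths;
    }
}
```

For the question's file this should produce `/items/item1/piece`, `/items/item1/itemc`, `/items/item2/piece` and `/items/item2/itemc`, in document order.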
+2  A: 

Just to expand slightly on helios's answer, you could qualify your XPath with [text()] to select only those nodes that have a text() node:

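// XDocument (assumes xdoc is a loaded XDocument; XPathSelectElements
// is an extension method in the System.Xml.XPath namespace)
foreach(XElement element in xdoc.XPathSelectElements("//*[not(*)][text()]"))
{
    Console.WriteLine(element.Value);
}

// XmlDocument (assumes doc is a loaded XmlDocument)
foreach(XmlText textNode in doc.SelectNodes("//*[not(*)]/text()"))
{
    Console.WriteLine(textNode.Value);
}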
Kevin Nixon
+3  A: 

Here is an XSLT solution that produces the XPath expressions for each of the innermost elements.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <xsl:apply-templates />
    </xsl:template>

    <!--Match on all elements that do not contain child elements -->
    <xsl:template match="//*[not(*)]">
        <!--look up the node tree and write out:
           - a slash
           - the name of the element
           - and a predicate filter for the position of the element at each step -->
        <xsl:for-each select="ancestor-or-self::*">
            <xsl:text>/</xsl:text>
            <xsl:value-of select="local-name()"/>
            <!--add a predicate filter to specify the position, in case there is more than one element with that name at that step -->
            <xsl:text>[</xsl:text>
            <xsl:value-of select="count(preceding-sibling::*[name()=name(current())])+1" />
            <xsl:text>]</xsl:text>
        </xsl:for-each>
        <!--Create a new line after every element -->
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>

    <!--override default template to prevent extra whitespace and carriage return from being copied into the output-->
    <xsl:template match="text()" />

</xsl:stylesheet>

I added predicate filters to specify the position of the element. That way, if you had more than one piece or itemc element at the same level, the XPath would select the correct one.

So, instead of:

items/item1/piece
items/item1/itemc
items/item2/piece
items/item2/itemc

it produces:

/items[1]/item1[1]/piece[1]
/items[1]/item1[1]/itemc[1]
/items[1]/item2[1]/piece[1]
/items[1]/item2[1]/itemc[1]
Mads Hansen
+1 for the predicates.
Robert Rossney
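The answer doesn't show how to run the stylesheet from C#. One way, assuming .NET's built-in `XslCompiledTransform` (namespace `System.Xml.Xsl`), is sketched below; `XsltRunner.Apply` is a hypothetical helper that takes both the stylesheet and the input document as strings.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: applies an XSLT 1.0 stylesheet to an XML string
// and returns the transformation output as text.
static class XsltRunner
{
    public static string Apply(string stylesheet, string xml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            // null: the stylesheet takes no parameters
            xslt.Transform(input, null, output);
        }
        return output.ToString();
    }
}
```

Because the stylesheet above has no `<xsl:output>` element, the XML output method applies by default; adding `<xsl:output method="text"/>` makes it explicit that the result is plain-text lines.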
+2  A: 

The code below finds all leaf elements in the document and, for each, outputs an XPath expression that will unambiguously navigate to the element from the document root, including a predicate at each node step to disambiguate between elements with the same name:

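using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // provides the XPathSelectElements extension method

class Program
{
    static void Main(string[] arguments)
    {
        XDocument d = XDocument.Load("xmlfile1.xml");

        foreach (XElement e in d.XPathSelectElements("//*[not(*)]"))
        {
            // For the leaf and each of its ancestors, emit name[position], where
            // position counts preceding siblings with the same local name.
            Console.WriteLine("/" + string.Join("/",
                e.XPathSelectElements("ancestor-or-self::*")
                    .Select(x => x.Name.LocalName
                        + "["
                        + (x.ElementsBeforeSelf()
                            .Count(y => y.Name.LocalName == x.Name.LocalName) + 1)
                        + "]")
                    .ToArray()));
        }

        Console.ReadKey();
    }
}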

For example, this input:

<foo>
  <bar>
    <fizz/>
    <baz>
      <bat/>
    </baz>
    <fizz/>
  </bar>
  <buzz></buzz>
</foo>

produces this output:

/foo[1]/bar[1]/fizz[1]
/foo[1]/bar[1]/baz[1]/bat[1]
/foo[1]/bar[1]/fizz[2]
/foo[1]/buzz[1]
Robert Rossney
There should be some way to find the index of a node without counting, for every node, how many preceding siblings it has...
helios
It'd be nice, but I don't think that you can get there in a lambda expression.
Robert Rossney
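On helios's point, one way to avoid recounting preceding siblings for every leaf is to compute positions top-down in a single recursive walk, keeping a per-parent counter of the names seen so far. This is a sketch, not Robert Rossney's code; `IndexedLeafPaths` is a hypothetical name, and like the answer above it compares local names only, so namespaced documents would need extra handling.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: positional paths such as /foo[1]/bar[1]/fizz[2]
// for every leaf element, computed in one depth-first pass.
static class IndexedLeafPaths
{
    public static List<string> Get(XDocument doc)
    {
        var results = new List<string>();
        // The root element is always the first (and only) element at its level.
        Walk(doc.Root, "/" + doc.Root.Name.LocalName + "[1]", results);
        return results;
    }

    static void Walk(XElement element, string path, List<string> results)
    {
        // How many children with each local name have been seen so far under this parent.
        var seen = new Dictionary<string, int>();
        bool hasChildren = false;

        foreach (XElement child in element.Elements())
        {
            hasChildren = true;
            string name = child.Name.LocalName;
            int position;
            seen.TryGetValue(name, out position); // 0 if the name is new
            position++;
            seen[name] = position;
            Walk(child, path + "/" + name + "[" + position + "]", results);
        }

        if (!hasChildren)
            results.Add(path);
    }
}
```

On the `<foo>` example above this gives the same four paths as the LINQ version, but each sibling's position is computed once, while its parent's children are enumerated.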