Could someone supply some code that would get the xpath of a System.Xml.XmlNode instance?
Thanks!
Could someone supply some code that would get the xpath of a System.Xml.XmlNode instance?
Thanks!
There's no such thing as "the" xpath of a node. For any given node there may well be many xpath expressions which will match it.
You can probably work up the tree to build up an expression which will match it, taking into account the index of particular elements etc, but it's not going to be terribly nice code.
Why do you need this? There may be a better solution.
Okay, I couldn't resist having a go at it. It'll only work for attributes and elements, but hey... what can you expect in 15 minutes :) Likewise there may very well be a cleaner way of doing it.
It is superfluous to include the index on every element (particularly the root one!) but it's easier than trying to work out whether there's any ambiguity otherwise.
using System;
using System.Text;
using System.Xml;
class Test
{
static void Main()
{
string xml = @"
<root>
<foo />
<foo>
<bar attr='value'/>
<bar other='va' />
</foo>
<foo><bar /></foo>
</root>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
XmlNode node = doc.SelectSingleNode("//@attr");
Console.WriteLine(FindXPath(node));
Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
}
static string FindXPath(XmlNode node)
{
StringBuilder builder = new StringBuilder();
while (node != null)
{
switch (node.NodeType)
{
case XmlNodeType.Attribute:
builder.Insert(0, "/@" + node.Name);
node = ((XmlAttribute) node).OwnerElement;
break;
case XmlNodeType.Element:
int index = FindElementIndex((XmlElement) node);
builder.Insert(0, "/" + node.Name + "[" + index + "]");
node = node.ParentNode;
break;
case XmlNodeType.Document:
return builder.ToString();
default:
throw new ArgumentException("Only elements and attributes are supported");
}
}
throw new ArgumentException("Node was not in a document");
}
static int FindElementIndex(XmlElement element)
{
XmlNode parentNode = element.ParentNode;
if (parentNode is XmlDocument)
{
return 1;
}
XmlElement parent = (XmlElement) parentNode;
int index = 1;
foreach (XmlNode candidate in parent.ChildNodes)
{
if (candidate is XmlElement && candidate.Name == element.Name)
{
if (candidate == element)
{
return index;
}
index++;
}
}
throw new ArgumentException("Couldn't find element within parent");
}
}
Jon's correct that there are any number of XPath expressions that will yield the same node in an an instance document. The simplest way to build an expression that unambiguously yields a specific node is a chain of node tests that use the node position in the predicate, e.g.:
/node()[0]/node()[2]/node()[6]/node()[1]/node()[2]
Obviously, this expression isn't using element names, but then if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (because attributes aren't nodes and don't have position; you can only find them by name), but it will find all other node types.
To build this expression, you need to write a method that returns a node's position in its parent's child nodes, because XmlNode
doesn't expose that as a property:
static int GetNodePosition(XmlNode child)
{
for (int i=0; i<child.ParentNode.ChildNodes.Count; i++)
{
if (child.ParentNode.ChildNodes[i] == child)
{
// tricksy XPath, not starting its positions at 0 like a normal language
return i + 1;
}
}
throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
}
(There's probably a more elegant way to do that using LINQ, since XmlNodeList
implements IEnumerable
, but I'm going with what I know here.)
Then you can write a recursive method like this:
static string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format(
"{0}/@{1}",
GetXPathToNode(((XmlAttribute)node).OwnerElement),
node.Name
);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
// the path to a node is the path to its parent, plus "/node()[n]", where
// n is its position among its siblings.
return String.Format(
"{0}/node()[{1}]",
GetXPathToNode(node.ParentNode),
GetNodePosition(node)
);
}
As you can see, I hacked in a way for it to find attributes as well.
Jon slipped in which his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure that the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm going to make is a pretty important one for anyone who works with XML to think about.
I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to using is structured this way. You can spot these developers because they're using the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)
This feels like it's a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node()
function in XPath) that are specifically designed to treat all node types generically.
There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements are, and that's GetElementsByTagName
. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"
This is even easier
''' <summary>
''' Gets the full XPath of a single node.
''' </summary>
''' <param name="node"></param>
''' <returns></returns>
''' <remarks></remarks>
Private Function GetXPath(ByVal node As Xml.XmlNode) As String
Dim temp As String
Dim sibling As Xml.XmlNode
Dim previousSiblings As Integer = 1
'I dont want to know that it was a generic document
If node.Name = "#document" Then Return ""
'Prime it
sibling = node.PreviousSibling
'Perculate up getting the count of all of this node's sibling before it.
While sibling IsNot Nothing
'Only count if the sibling has the same name as this node
If sibling.Name = node.Name Then
previousSiblings += 1
End If
sibling = sibling.PreviousSibling
End While
'Mark this node's index, if it has one
' Also mark the index to 1 or the default if it does have a sibling just no previous.
temp = node.Name + IIf(previousSiblings > 0 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString()
If node.ParentNode IsNot Nothing Then
Return GetXPath(node.ParentNode) + "/" + temp
End If
Return temp
End Function
My 10p worth is a hybrid of Robert and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.
private static string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format(
"{0}/@{1}",
GetXPathToNode(((XmlAttribute)node).OwnerElement),
node.Name
);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
//get the index
int iIndex = 1;
XmlNode xnIndex = node;
while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; }
// the path to a node is the path to its parent, plus "/node()[n]", where
// n is its position among its siblings.
return String.Format(
"{0}/node()[{1}]",
GetXPathToNode(node.ParentNode),
iIndex
);
}