What is the XPath (in C# API to XDocument.XPathSelectElements(xpath, nsman) if it matters) to query all MyNodes from this document?

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <MyNode xmlns="lcmp" attr="true">
    <subnode />
  </MyNode>
</configuration>
  • I tried /configuration/MyNode which is wrong because it ignores the namespace.
  • I tried /configuration/lcmp:MyNode which is wrong because lcmp is the URI, not the prefix.
  • I tried /configuration/{lcmp}MyNode which failed because Additional information: '/configuration/{lcmp}MyNode' has an invalid token.

EDIT: I can't use mgr.AddNamespace("df", "lcmp"); as some of the answerers have suggested. That requires that the XML parsing program know all the namespaces I plan to use ahead of time. Since this is meant to be applicable to any source file, I don't know which namespaces to manually add prefixes for. It seems like {my uri} is the XPath syntax, but Microsoft didn't bother implementing that... true?

+2  A: 

You need to use an XmlNamespaceManager as follows:

   XDocument doc = XDocument.Load(@"..\..\XMLFile1.xml");
   XmlNamespaceManager mgr = new XmlNamespaceManager(new NameTable());
   mgr.AddNamespace("df", "lcmp");
   foreach (XElement myNode in doc.XPathSelectElements("configuration/df:MyNode", mgr))
   {
       Console.WriteLine(myNode.Attribute("attr").Value);
   }
Martin Honnen
Yes, I think that would work, but I can't do that. Since the XML parsing code is agnostic to the actual XML file and any namespaces it uses, mgr.AddNamespace("df", "lcmp"); is an impossible line to write...
Scott Stafford
But your parsing code can't be agnostic to element names, right? The namespace is considered part of the name, so ignoring it is kind of poor design, but if you are sure there will be no namespace conflicts you can do something like "configuration/*[local-name() = 'MyNode']"
Oleg Tkachenko
Scott, please explain how your code is supposed to identify the element if the namespace URI is not known? What is your code looking for exactly, elements with local name "MyNode" in any namespace? Then use Oleg's suggestion. Otherwise explain in more detail what elements exactly you are looking for.
Martin Honnen
@Martin/Oleg: The XPath should specify the namespace, of course, like you say. But the XML I'm reading from doesn't alias/prefix the namespace. /configuration/lcmp:MyNode is incorrect because 'lcmp' in that XPath is a namespace prefix, not a namespace URI. /configuration/{lcmp}MyNode seems to be the proper syntax but C# doesn't seem to support the {} notation.
Scott Stafford
+1  A: 

Here's an example of how to make the namespace available to the XPath expression in the XPathSelectElements extension method:

using System;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Xml;
namespace XPathExpt
{
 class Program
 {
   static void Main(string[] args)
   {
     XElement cfg = XElement.Parse(
       @"<configuration>
          <MyNode xmlns=""lcmp"" attr=""true"">
            <subnode />
          </MyNode>
         </configuration>");
     XmlNameTable nameTable = new NameTable();
     var nsMgr = new XmlNamespaceManager(nameTable);
     // Tell the namespace manager about the namespace
     // of interest (lcmp), and give it a prefix (pfx) that we'll
     // use to refer to it in XPath expressions. 
     // Note that the prefix choice is pretty arbitrary at 
     // this point.
     nsMgr.AddNamespace("pfx", "lcmp");
     foreach (var el in cfg.XPathSelectElements("//pfx:MyNode", nsMgr))
     {
         Console.WriteLine("Found element named {0}", el.Name);
     }
   }
 }
}
Dan Blanchard
@Dan: Yes, I think that works but requires hardcoding any used namespaces.. whereas I can only control the XPath -- see my comment under @Martin Honnen's answer.
Scott Stafford
+2  A: 

XPath is (deliberately) not designed for the case where you want to use the same XPath expression for some unknown namespaces that only live in the XML document. You are expected to know the namespace ahead of time, declare the namespace to the XPath processor, and use the name in your expression. The answers by Martin and Dan show how to do this in C#.

The reason for this difficulty is best expressed in the XML namespaces spec:

We envision applications of Extensible Markup Language (XML) where a single XML document may contain elements and attributes (here referred to as a "markup vocabulary") that are defined for and used by multiple software modules. One motivation for this is modularity: if such a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use this markup rather than re-invent it.

Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the elements and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element name or attribute name.

These considerations require that document constructs should have names constructed so as to avoid clashes between names from different markup vocabularies. This specification describes a mechanism, XML namespaces, which accomplishes this by assigning expanded names to elements and attributes.

That is, namespaces are supposed to be used to make sure you know what your document is talking about: is that <head> element talking about the preamble to an XHTML document or somebody's head in an AnatomyML document? You are never "supposed" to be agnostic about the namespace and it's pretty much the first thing you ought to define in any XML vocabulary.

It should be possible to do what you want, but I don't think it can be done in a single XPath expression. First of all you need to rummage around in the document and extract all the namespaceURIs, then add these to the namespace manager and then run the actual XPath expression you want (and you need to know something about the distribution of namespaces in the document at this point, or you have a lot of expressions to run). I think you are probably best using something other than XPath (e.g. a DOM or SAX-like API) to find the namespaceURIs, but you could also explore the XPath namespace-axis (in XPath 1.0), use the namespace-uri-from-QName function (in XPath 2.0) or use expressions like Oleg's "configuration/*[local-name() = 'MyNode']". Anyway, I think your best bet is to try and avoid writing namespace agnostic XPath! Why do you not know your namespace ahead of time? How are you going to avoid matching things you don't intend to match?
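A minimal sketch of the "rummage around, then register" approach described above, assuming LINQ to XML. The class and helper names (NamespaceDiscovery, DeclaredUris, BuildManager) and the generated ns0, ns1, ... prefixes are made up for illustration. Note that you still have to know which URI you are after in order to pick the right generated prefix, so this only saves you from hardcoding the AddNamespace calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

static class NamespaceDiscovery
{
    // Hypothetical helper: collects every non-empty namespace URI declared
    // anywhere in the document, first occurrence first, without duplicates.
    public static List<string> DeclaredUris(XDocument doc)
    {
        return doc.Descendants()
                  .Attributes()
                  .Where(a => a.IsNamespaceDeclaration && a.Value.Length > 0)
                  .Select(a => a.Value)
                  .Distinct()
                  .ToList();
    }

    // Hypothetical helper: registers each discovered URI under a generated
    // prefix (ns0, ns1, ...). These prefixes exist only on the XPath side.
    public static XmlNamespaceManager BuildManager(XDocument doc)
    {
        var mgr = new XmlNamespaceManager(new NameTable());
        List<string> uris = DeclaredUris(doc);
        for (int i = 0; i < uris.Count; i++)
        {
            mgr.AddNamespace("ns" + i, uris[i]);
        }
        return mgr;
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            @"<configuration>
                <MyNode xmlns=""lcmp"" attr=""true"">
                  <subnode />
                </MyNode>
              </configuration>");

        XmlNamespaceManager mgr = BuildManager(doc);

        // Only one namespace is declared in this document, so "lcmp" is ns0.
        foreach (XElement el in doc.XPathSelectElements("/configuration/ns0:MyNode", mgr))
        {
            Console.WriteLine(el.Name);
        }
    }
}
```

With more than one namespace in the document you would need to look up which generated prefix belongs to which URI before building the expression, which is exactly the "you need to know something about the distribution of namespaces" caveat.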

Edit - you know the namespaceURI?

So it turns out that your question confused us all. Apparently you know the namespace URI, but you don't know the namespace prefix that's used in the XML document. Indeed, in this case no namespace prefix is used and the URI becomes the default namespace where it is defined. The key thing to know is that the chosen prefix (or lack of a prefix) is irrelevant to your XPath expression (and XML parsing in general). The prefix / xmlns attribute is just one way to associate a node with a namespace URI when the document is expressed as text. You may want to take a look at this answer, where I try and clarify namespace prefixes.

You should try to think of the XML document in the same way the parser thinks of it - each node has a namespace URI and a local name. The namespace prefix / inheritance rules just save typing the URI out lots of times. One way to write this down is in Clark notation: that is, you write {http://www.example.com/namespace/example}LocalNodeName, but this notation is usually just used for documentation - XPath knows nothing about this notation.

Instead, XPath uses its own namespace prefixes. Something like /ns1:root/ns2:node. But these are completely separate from, and have nothing to do with, any prefixes that may be used in the original XML document. Any XPath implementation will have a way to map its own prefixes to namespace URIs. For the C# implementation you use an XmlNamespaceManager, in Perl you provide a hash, xmllint takes command line arguments... So all you need to do is create some arbitrary prefix for the namespace URI you know, and use this prefix in the XPath expression. It doesn't matter what prefix you use; in XML you just care about the combination of the URI and the localName.

The other thing to remember (it's often a surprise) is that XPath doesn't do namespace inheritance. You need to add a prefix for every node that has a namespace, irrespective of whether the namespace comes from inheritance, an xmlns attribute, or a namespace prefix. Also, although you should always think in terms of URIs and localNames, there are also ways to access the prefix from an XML document. It's rare to have to use these.
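To make the last two points concrete, here is a small sketch using the document from the question. The prefix "p" is arbitrary and appears nowhere in the XML, and the PrefixDemo class and its Count helper are made up for this illustration. The subnode element inherits the lcmp namespace from MyNode, so it needs the prefix in the XPath too.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

static class PrefixDemo
{
    // Hypothetical helper: counts the elements an XPath selects in the
    // question's document, with "p" mapped to the namespace URI "lcmp".
    public static int Count(string xpath)
    {
        XDocument doc = XDocument.Parse(
            @"<configuration>
                <MyNode xmlns=""lcmp"" attr=""true"">
                  <subnode />
                </MyNode>
              </configuration>");

        var mgr = new XmlNamespaceManager(new NameTable());
        mgr.AddNamespace("p", "lcmp"); // any prefix works; only the URI matters

        return doc.XPathSelectElements(xpath, mgr).Count();
    }

    public static void Main()
    {
        // subnode is in "lcmp" by inheritance, so it must carry the prefix.
        Console.WriteLine(Count("/configuration/p:MyNode/p:subnode")); // 1

        // An unprefixed name test only matches elements in no namespace.
        Console.WriteLine(Count("/configuration/p:MyNode/subnode"));   // 0
    }
}
```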

Andrew Walker
@Andrew: I DO know the namespace ahead of time and can put it in the XPath. What I don't know is the namespace prefix, which is what is used when you say something like "/configuration/lcmp:MyNode". "/configuration/{lcmp}MyNode" seems to be the proper syntax to use the namespace URI instead of a prefix, but C# doesn't seem to support the {} notation. And I don't have a prefix.
Scott Stafford
Ah, I see. I'll write a new answer - basically you just need to know that the namespace prefix in your XML document has nothing in common with the namespace prefix in the XPath expression other than they both have to map to the same nsURI.
Andrew Walker
@Andrew: Very informative and verbose edit-writeup but I don't think it actually addresses my question, which is: what XPath finds that node? Also, are you saying that if the XML DID specify a prefix (which it doesn't) then the XPath query to find that couldn't use it?
Scott Stafford
Well, the answer is whatever XPath namespace prefix you choose. The prefix or lack of prefix declared in the XML document is not relevant to your problem at all. Only the declared namespace URI is. You choose the mapping between namespace URI and XPath prefix that you use in your XPath expression.
Andrew Walker
@Andrew: How do I specify a prefix to use in the XPath expression, without writing C# code and hardcoding the XmlNamespaceManager to know every possible URI?
Scott Stafford
+2  A: 

The configuration element is in the unnamed namespace, and the MyNode is bound to the lcmp namespace without a namespace prefix.

This XPATH statement will allow you to address the MyNode element without declaring the lcmp namespace or using a namespace prefix in your XPATH:

/configuration/*[namespace-uri()='lcmp' and local-name()='MyNode']

It matches any element that is a child of configuration and then uses a predicate filter with the namespace-uri() and local-name() functions to restrict it to the MyNode element.

If you don't know which namespace-uri's will be used for the elements, then you can make the XPATH more generic and just match on the local-name():

/configuration/*[local-name()='MyNode']

However, you run the risk of matching different elements in different vocabularies (bound to different namespace-uri's) that happen to use the same name.
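A rough C# sketch of both expressions, run against a document that deliberately contains a second MyNode in a different namespace. The "other" namespace URI, the PredicateDemo class and its Count helper are invented for the illustration. No XmlNamespaceManager is needed because neither expression uses a prefix.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class PredicateDemo
{
    // Hypothetical helper: counts matches in a document holding two MyNode
    // elements, one in "lcmp" and one in the made-up namespace "other".
    public static int Count(string xpath)
    {
        XDocument doc = XDocument.Parse(
            @"<configuration>
                <MyNode xmlns=""lcmp"" attr=""true"" />
                <MyNode xmlns=""other"" attr=""false"" />
              </configuration>");

        return doc.XPathSelectElements(xpath).Count();
    }

    public static void Main()
    {
        // Namespace URI and local name together: only the lcmp element.
        Console.WriteLine(
            Count("/configuration/*[namespace-uri()='lcmp' and local-name()='MyNode']")); // 1

        // Local name only: also picks up the MyNode from the other vocabulary.
        Console.WriteLine(
            Count("/configuration/*[local-name()='MyNode']")); // 2
    }
}
```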

Mads Hansen
@Mads: Ah, interesting, I didn't know about the "[namespace-uri()='lcmp'" syntax... that should work and if so (I'll try on Monday) I'll mark this as answer. Do you know if the "/configuration/{lcmp}MyNode" is actually correct and simply not supported by C#?
Scott Stafford
@Scott No, the syntax you were trying to use is not a valid XPATH statement and isn't supported in any implementation that I'm aware of. Although the element's name may expand to that QName, you can't address it that way in your XPATH statement.
Mads Hansen
@Mads: Worked like a charm, thanks a lot.
Scott Stafford
But if the namespace URI is known (and Scott now says it is) it's worth noting that this approach is unnecessarily brittle for the reason Mads states ("you run the risk of matching different elements in different vocabularies"). The fact that this works does not make it a good idea (unless you really don't know the URI).
Andrew Walker
@Andrew: I never changed my tune. The namespace URI is known, as you can see in the original question. The xmlns="lcmp" command is giving a namespace URI, not a prefix. And @Mads' suggestion is to use local-name() AND namespace-uri(), which is why his answer was correct. He does go on to say you have the option of not using namespace-uri(), but that is only an afterthought.
Scott Stafford