ansaurus

Question

Answer 1

+2 A:

You need to put the value of the node you're looking for in quotes:

".../address_component[type='administrative_area_level_1']/short_name"
                            ↑                           ↑
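To see the difference the quotes make, here is a minimal sketch. The inline XML is a hand-written, trimmed-down stand-in for the geocoding response, and the QuotedLiteralDemo class and SelectValue helper are names made up for this illustration. Without quotes, XPath treats administrative_area_level_1 as the name of a child element, finds none, and the predicate quietly matches nothing.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class QuotedLiteralDemo
{
    // Hand-written stand-in for the real response (assumption, not live data).
    const string SampleXml =
        "<GeocodeResponse><result><address_component>" +
        "<long_name>California</long_name><short_name>CA</short_name>" +
        "<type>administrative_area_level_1</type><type>political</type>" +
        "</address_component></result></GeocodeResponse>";

    // Returns the text of the first node matched by xpath, or null when nothing matches.
    public static string SelectValue(string xpath)
    {
        XPathNavigator nav =
            new XPathDocument(new StringReader(SampleXml)).CreateNavigator();
        XPathNavigator node = nav.SelectSingleNode(xpath);
        return node == null ? null : node.Value;
    }

    public static void Main()
    {
        const string prefix = "/GeocodeResponse/result/address_component";

        // Unquoted: <type> is compared with a (non-existent) child element.
        Console.WriteLine(SelectValue(
            prefix + "[type=administrative_area_level_1]/short_name") ?? "(no match)");

        // Quoted: <type> is compared with the string literal.
        Console.WriteLine(SelectValue(
            prefix + "[type='administrative_area_level_1']/short_name") ?? "(no match)");
    }
}
```

With this sample the first call prints "(no match)" and the second prints "CA". Neither expression raises an error, which is why the mistake is easy to miss.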

dtb 2010-08-18 19:17:37

Yeah. One of the hardest things about getting started with XPath is that it doesn't give errors for nonsense strings that could conceivably be element names but aren't meant to be. You get no warning of what you did wrong; you just get no nodes selected.

LarsH 2010-08-18 21:01:36

Answer 2

+3 A:

I'd definitely recommend using LINQ to XML instead of XPathNavigator. It makes XML querying a breeze, in my experience. In this case I'm not sure exactly what's wrong... but I'll come up with a LINQ to XML snippet instead.

using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

class Test
{
    public static void Main(string[] args)
    {
        using(WebClient webclient = new WebClient())
        {
            webclient.Proxy = null;
            string locationXml = webclient.DownloadString
                ("http://maps.google.com/maps/api/geocode/xml?address=1600"
                 + "+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
            XElement root = XElement.Parse(locationXml);

            XElement result = root.Element("result");
            Console.WriteLine(result.Elements("address_component")
                                    .Where(x => (string) x.Element("type") ==
                                           "administrative_area_level_1")
                                    .Select(x => x.Element("short_name").Value)
                                    .First());
            Console.WriteLine(result.Elements("address_component")
                                    .Where(x => (string) x.Element("type") ==
                                           "administrative_area_level_2")
                                    .Select(x => x.Element("long_name").Value)
                                    .First());
        }
    }
}

Now this is more code¹... but I personally find it easier to get right than XPath, because the compiler is helping me more.

EDIT: I feel it's worth going into a little more detail about why I generally prefer code like this over using XPath, even though it's clearly longer.

When you use XPath within a C# program, you have two different languages - but only one is in control (C#). XPath is relegated to the realm of strings: Visual Studio doesn't give an XPath expression any special handling; it doesn't understand that it's meant to be an XPath expression, so it can't help you. It's not that Visual Studio doesn't know about XPath; as Dimitre points out, it's perfectly capable of spotting errors if you're editing an XSLT file, just not a C# file.

This is the case whenever you have one language embedded within another and the tool is unaware of it. Common examples are:

SQL
Regular expressions
HTML
XPath

When code is presented as data within another language, the secondary language loses a lot of its tooling benefits.

While you can context switch all over the place, pulling out the XPath (or SQL, or regular expressions etc) into their own tooling (possibly within the same actual program, but in a separate file or window) I find this makes for harder-to-read code in the long run. If code were only ever written and never read afterwards, that might be okay - but you do need to be able to read code afterwards, and I personally believe the readability suffers when this happens.

The LINQ to XML version above only ever uses strings for pure data - the names of elements etc - and uses code (method calls) to represent actions such as "find elements with a given name" or "apply this filter". That's more idiomatic C# code, in my view.
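As a rough sketch of that split between data and actions, the repeated query could be pulled into a small helper. The AddressComponents class, the Find method and the inline sample XML are all hypothetical names and data invented for this illustration; they are not part of the code above or of any library.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class AddressComponents
{
    // Hand-written sample shaped like the geocoding response (assumption).
    const string SampleXml =
        "<GeocodeResponse><result>" +
        "<address_component><long_name>Santa Clara</long_name>" +
        "<short_name>Santa Clara</short_name>" +
        "<type>administrative_area_level_2</type><type>political</type>" +
        "</address_component>" +
        "<address_component><long_name>California</long_name>" +
        "<short_name>CA</short_name>" +
        "<type>administrative_area_level_1</type><type>political</type>" +
        "</address_component>" +
        "</result></GeocodeResponse>";

    // Value of fieldName in the first address_component that has a <type>
    // equal to componentType, or null when no component matches.
    public static string Find(XElement result, string componentType, string fieldName)
    {
        return result.Elements("address_component")
                     .Where(c => c.Elements("type").Any(t => t.Value == componentType))
                     .Select(c => (string) c.Element(fieldName))
                     .FirstOrDefault();
    }

    public static XElement LoadSampleResult()
    {
        return XElement.Parse(SampleXml).Element("result");
    }

    public static void Main()
    {
        XElement result = LoadSampleResult();
        Console.WriteLine(Find(result, "administrative_area_level_1", "short_name"));
        Console.WriteLine(Find(result, "administrative_area_level_2", "long_name"));
        Console.WriteLine(Find(result, "country", "short_name") ?? "(no match)");
    }
}
```

Two choices here differ from the snippet above: every type element is checked rather than only the first, and the cast to string plus FirstOrDefault returns null instead of throwing when nothing matches. Whether that is preferable depends on how the caller wants to handle a missing component.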

Obviously others don't share this viewpoint, but I thought it worth expanding on to show where I'm coming from.

Note that this isn't a hard and fast rule of course... in some cases XPath, regular expressions etc are the best solution. In this case, I'd prefer the LINQ to XML, that's all.

¹ Of course I could have kept each Console.WriteLine call on a single line, but I don't like posting code with horizontal scrollbars on SO. Note that writing the correct XPath version with the same indentation as the above and avoiding scrolling is still pretty nasty:

            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/" +
                "address_component[type='administrative_area_level_1']" +
                "/short_name").InnerXml);

In general, long lines work a lot better in Visual Studio than they do on Stack Overflow...

Jon Skeet 2010-08-18 19:18:25

dtb found my error. but I would love a example of linq to xml if you could provide one.

Scott Chamberlain 2010-08-18 19:19:13

@Jon Skeet: I don't think that it's a good practice to recommend LINQ when someone asks for a specific XPath expression. Plus I don't see how "compiler helping" (?!) makes this `Console.WriteLine(result.Elements("address_component").Where(x => (string) x.Element("type") == "administrative_area_level_2").Select(x => x.Element("long_name").Value).First());` better than `Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type='administrative_area_level_1']/short_name").InnerXml);`

Alejandro 2010-08-18 20:18:54

@Alejandro: the OP *specifically* asked whether there was a better way. Personally *I* find the LINQ way easier to use - why not suggest it as an alternative? Note that in this case the error was because of a string literal not being in quotes... precisely the kind of mistake you don't get when using LINQ to XML.

Jon Skeet 2010-08-18 20:27:21

@Dimitre: For someone whose primary tags are xml, xslt and xpath, I'm not surprised you'd prefer the XPath version. For someone who already knows LINQ but isn't as comfortable with the XPath, the LINQ version can be a lot easier to understand and extend. Don't assume everyone has your XPath knowledge/comfort.

Jon Skeet 2010-08-18 20:49:30

@Jon-Skeet: Have you heard about code complexity metrics? LOC?, McCabe Cyclomatic Complexity?

Dimitre Novatchev 2010-08-18 20:52:52

@Jon-Skeet: What an ugly code! Negative examples as this are also very instructive. I am not downvoting, because taking out 2 points from 208K is ridiculous. If the downvote was proportional to the accumulated rep (e.g. in this case 1000 points) -- yes, then I would downvote this answer.

Dimitre Novatchev 2010-08-18 20:53:52

@Dimitre: I don't see why that has a large cyclomatic complexity. Where do you see that coming in? I still think you're *massively* biased towards the XPath solution because you're so familiar with it. Undoubtedly I'm biased towards LINQ solutions because I'm familiar with that too... but what's wrong with showing options? Note the OP's comment: " I would love a example of linq to xml if you could provide one."

Jon Skeet 2010-08-18 21:10:57

@Dimitre: When you delete a comment and then re-add something very similar, it makes the whole conversation look weird - looking in chronological order, it looks like I've replied to you before you ever commented!

Jon Skeet 2010-08-18 21:12:08

@Jon-Skeet: This question was tagged "xpath" and *not* "linq". How would you react if I answered a "linq" question with an XSLT solution and xslt wasn't in the tags? This has nothing to do with our "biases" -- the fact is that *you* offered a non-related answer and xpath is a desired tag. The OP's comment that he'd be interested in a linq-to-xml example came *after* your answer.

Dimitre Novatchev 2010-08-18 21:44:57

@Dimitre: The question was also tagged C#. LINQ is part of C#, and the OP explicitly requested alternative approaches if people think they're better (which I still do, for those comfortable with LINQ). Yes, the comment came after my answer - but indicated that the OP was happy to see alternative suggestions. If you want to suggest an XSLT solution to a LINQ to XML question, go ahead.

Jon Skeet 2010-08-18 21:53:33

@Dimitre: You might also want to look at LarsH's comment to the accepted answer: LINQ to XML certainly won't catch *all* such errors, but by separating out instructions from data, it will definitely catch many of them.

Jon Skeet 2010-08-18 21:54:28

@Jon-Skeet: Maybe it would be useful to know that XPath (2.0 and further) is a typed language and not at all "data". XPath 2.1 is a fully fledged FP language with HOF, currying and dynamic creation of functions. I at least am not teaching you what LINQ is or isn't.

Dimitre Novatchev 2010-08-18 22:50:16

@Dimitre: You've missed my point. As far as expressing XPath in C# is concerned, it *is* all data - it's just a string. As far as the compiler is concerned, you could have anything you like in there - it's not going to help you. If you try putting the element name in the C# code without quotes, it will fail long before you'd get to run it.

Jon Skeet 2010-08-18 23:34:40

@Jon-Skeet: It is much more useful to the readers to advise them to always check their XPath expressions (not only the syntax!) in advance, before they add code in the hosting language that evaluates these expressions. A good advice would mention that excellent XSLT/XPath/XML IDEs exist that signal immediately any XPath syntax errors as you type -- including Visual Studio, which you might probably know...

Dimitre Novatchev 2010-08-19 20:15:05

@Dimitre: I would still rather *read* code which made the difference clearer. I'm not a fan of embedding SQL in code either, for much the same reason. I think it's perfectly reasonable for us each to have our own preferences here, but I do wish you'd express yours less agressively.

Jon Skeet 2010-08-19 20:33:47

@Jon-Skeet: It's no wonder you'll feel the response "agressive" whenever you provide an answer to an area that isn't in your best/primary expertise and at that give a less than perfect answer. :) This isn't agressive -- just a deserved feedback. I fully expect you to have a similar, corresponding feedback to me should I dare give C# advice :). You might be interested to see my answer to this same question.

Dimitre Novatchev 2010-08-19 20:42:19

@Dimitre: How should I not feel it's aggressive when you say you'd like to take away 1000 rep, you question whether I've heard of cyclomatic complexity, things like "I at least am not teaching you what LINQ is or isn't" and "What an ugly code!" etc. Maybe you deem all of those as polite, but I certainly don't.

Jon Skeet 2010-08-20 06:14:24

Answer 3

+1 A:

dtb's response is accurate. I wanted to add that you can use an XPath testing tool, such as the one linked below, to help find the correct XPath:

http://www.bit-101.com/xpath/

Ed Schwehm 2010-08-18 19:21:58

Answer 4

A:

using System.Linq;
using System.Net;
using System.Xml.Linq;

string url = @"http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false";
string value = "administrative_area_level_1";

using (WebClient client = new WebClient())
{
    string wcResult = client.DownloadString(url);

    XDocument xDoc = XDocument.Parse(wcResult);

    // Every address_component with at least one <type> whose text contains the value.
    var result = xDoc.Descendants("address_component")
                     .Where(p => p.Descendants("type")
                                  .Any(q => q.Value.Contains(value)));
}

The result is an enumeration of address_component elements that have at least one type node whose value contains the value you're searching for. For the query above, it holds a single XElement with the following data.

<address_component>
  <long_name>California</long_name>
  <short_name>CA</short_name>
  <type>administrative_area_level_1</type>
  <type>political</type>
</address_component>

I would really recommend spending a little time learning LINQ in general: it's very useful for manipulating and querying in-memory objects and for querying databases, and it tends to be easier than XPath when working with XML. My favorite site to reference is http://www.hookedonlinq.com/

EC182 2010-08-18 19:59:51

Answer 5

+1 A:

I would recommend just typing the XPath expression as part of an XSLT file in Visual Studio. You'll get error messages "as you type" -- this is an excellent XML/XSLT/XPath editor.

For example, I am typing:

<xsl:apply-templates select="@* | node() x"/>

and immediately get in the Error List window the following error:

Error   9   Expected end of the expression, found 'x'.  @* | node()  -->x<--   XSLTFile1.xslt  9   14  Miscellaneous Files

Only when the XPath expression does not raise any errors (I might also test that it selects the intended nodes, too), would I put this expression into my C# code.

This ensures that I will have no XPath -- syntax and semantic -- errors when I run the C# program.
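If opening an XSLT file is not convenient, a syntax check can also be sketched in C# itself with XPathExpression.Compile from System.Xml.XPath, which parses the expression without evaluating it. The XPathSyntaxCheck class and GetSyntaxError helper below are names invented for this illustration. This only covers syntax: an expression that is well formed but selects the wrong nodes, or none, still compiles.

```csharp
using System;
using System.Xml.XPath;

static class XPathSyntaxCheck
{
    // Returns null when the expression parses, otherwise the parser's message.
    public static string GetSyntaxError(string xpath)
    {
        try
        {
            XPathExpression.Compile(xpath);
            return null;
        }
        catch (XPathException ex)
        {
            return ex.Message;
        }
        catch (ArgumentException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        string[] candidates =
        {
            "@* | node() x",   // stray token: rejected by the parser
            "@* | node()",     // well formed
            // Well formed too, although the missing quotes change its meaning.
            "address_component[type=administrative_area_level_1]/short_name"
        };

        foreach (string xpath in candidates)
        {
            string error = GetSyntaxError(xpath);
            Console.WriteLine(xpath + " => " + (error ?? "compiles"));
        }
    }
}
```

The last candidate is the interesting one: it passes the syntax check, so testing that an expression selects the intended nodes remains a separate step.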

Dimitre Novatchev 2010-08-19 20:19:59

It's good that Visual Studio provides this feedback. It's bad that you have to create an XSLT file (including bits that won't be in the final XPath) in order to get any feedback. It would be nice if Visual Studio recognised methods which took XPath expressions and could apply some intelligence to those, but that's a bit of a fond dream, I suspect. This is *always* the problem I have when we embed one language within another, whether it's XPath, HTML, SQL, regular expressions, even simple formatting strings. That's why I *tend* to prefer solutions which stick to a single source language.

Jon Skeet 2010-08-20 06:16:24

@Jon-Skeet So everything is a hammer? :)

Dimitre Novatchev 2010-08-20 13:43:50


Trouble getting data out of a xml file