I am trying to parse some information out of the response from Google's geocoding API, but I am having a little trouble getting the data out of the XML efficiently. See link for example

All I really care about is getting the short_name from the address_component whose type is administrative_area_level_1, and the long_name from the one whose type is administrative_area_level_2. However, in my test program both XPath queries return no results.

public static void Main(string[] args)
{
    using(WebClient webclient = new WebClient())
    {
        webclient.Proxy = null;
        string locationXml = webclient.DownloadString("http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
        using(var reader = new StringReader(locationXml))
        {
            var doc = new XPathDocument(reader);
            var nav = doc.CreateNavigator();
            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type=administrative_area_level_1]/short_name").InnerXml);
            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type=administrative_area_level_2]/long_name").InnerXml);

        }
    }
}

Can anyone help me find what I am doing wrong, or recommend a better way?

+2  A: 

You need to put the value of the node you're looking for in quotes:

".../address_component[type='administrative_area_level_1']/short_name"
                            ↑                           ↑
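
To see the difference in isolation, here is a minimal sketch that runs both forms of the predicate against a small in-memory document. The sample XML is a hand-written stand-in shaped like one address_component of the response (an assumption, not the live payload), and the XPathQuotingDemo class and its Select helper are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class XPathQuotingDemo
{
    // Hand-written stand-in for the geocoding response (assumption, not the live payload).
    const string SampleXml = @"<GeocodeResponse>
  <result>
    <address_component>
      <long_name>California</long_name>
      <short_name>CA</short_name>
      <type>administrative_area_level_1</type>
      <type>political</type>
    </address_component>
  </result>
</GeocodeResponse>";

    // Hypothetical helper: returns the text of the first node the XPath selects, or null.
    public static string Select(string xpath)
    {
        using (var reader = new StringReader(SampleXml))
        {
            XPathNavigator nav = new XPathDocument(reader).CreateNavigator();
            XPathNavigator node = nav.SelectSingleNode(xpath);
            return node == null ? null : node.Value;
        }
    }

    public static void Main()
    {
        const string prefix = "/GeocodeResponse/result/address_component";

        // Unquoted: <type> is compared with a child element *named*
        // administrative_area_level_1, which does not exist, so nothing matches.
        Console.WriteLine(Select(prefix + "[type=administrative_area_level_1]/short_name") ?? "(no match)");

        // Quoted: <type> is compared with the string literal.
        Console.WriteLine(Select(prefix + "[type='administrative_area_level_1']/short_name")); // CA
    }
}
```

Note that SelectSingleNode returns null when nothing matches, so calling .InnerXml directly on the result, as in the question, would throw a NullReferenceException instead of printing an empty line.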
dtb
Yeah. One of the hardest things about getting started with XPath is that it doesn't give errors for nonsense strings that could conceivably be element names but aren't meant to be. You get no warning of what you did wrong; you just get no nodes selected.
LarsH
+3  A: 

I'd definitely recommend using LINQ to XML instead of XPathNavigator. It makes XML querying a breeze, in my experience. In this case I'm not sure exactly what's wrong... but I'll come up with a LINQ to XML snippet instead.

using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

class Test
{
    public static void Main(string[] args)
    {
        using(WebClient webclient = new WebClient())
        {
            webclient.Proxy = null;
            string locationXml = webclient.DownloadString
                ("http://maps.google.com/maps/api/geocode/xml?address=1600"
                 + "+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
            XElement root = XElement.Parse(locationXml);

            XElement result = root.Element("result");
            Console.WriteLine(result.Elements("address_component")
                                    .Where(x => (string) x.Element("type") ==
                                           "administrative_area_level_1")
                                    .Select(x => x.Element("short_name").Value)
                                    .First());
            Console.WriteLine(result.Elements("address_component")
                                    .Where(x => (string) x.Element("type") ==
                                           "administrative_area_level_2")
                                    .Select(x => x.Element("long_name").Value)
                                    .First());
        }
    }
}

Now this is more code¹... but I personally find it easier to get right than XPath, because the compiler is helping me more.

EDIT: I feel it's worth going into a little more detail about why I generally prefer code like this over using XPath, even though it's clearly longer.

When you use XPath within a C# program, you have two different languages - but only one is in control (C#). XPath is relegated to the realm of strings: Visual Studio doesn't give an XPath expression any special handling; it doesn't understand that it's meant to be an XPath expression, so it can't help you. It's not that Visual Studio doesn't know about XPath; as Dimitre points out, it's perfectly capable of spotting errors if you're editing an XSLT file, just not a C# file.

This is the case whenever you have one language embedded within another and the tool is unaware of it. Common examples are:

  • SQL
  • Regular expressions
  • HTML
  • XPath

When code is presented as data within another language, the secondary language loses a lot of its tooling benefits.

While you can context switch all over the place, pulling out the XPath (or SQL, or regular expressions, etc.) into its own tooling (possibly within the same actual program, but in a separate file or window), I find this makes for harder-to-read code in the long run. If code were only ever written and never read afterwards, that might be okay - but you do need to be able to read code afterwards, and I personally believe the readability suffers when this happens.

The LINQ to XML version above only ever uses strings for pure data - the names of elements etc - and uses code (method calls) to represent actions such as "find elements with a given name" or "apply this filter". That's more idiomatic C# code, in my view.
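
As a rough illustration of that split between data and actions, the two near-identical queries could be folded into one method whose parameters carry the strings. This is only a sketch: the AddressComponents class, the Find helper and the sample XML are made up for the example and are not part of the code above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class AddressComponents
{
    // Hand-written sample shaped like the geocoding response (assumption, not the live payload).
    public const string SampleXml = @"<GeocodeResponse>
  <result>
    <address_component>
      <long_name>California</long_name>
      <short_name>CA</short_name>
      <type>administrative_area_level_1</type>
      <type>political</type>
    </address_component>
  </result>
</GeocodeResponse>";

    // Hypothetical helper: value of childName from the first address_component
    // that has a <type> equal to the given type, or null if there is none.
    public static string Find(XElement result, string type, string childName)
    {
        return result.Elements("address_component")
                     .Where(c => c.Elements("type").Any(t => t.Value == type))
                     .Select(c => (string) c.Element(childName))
                     .FirstOrDefault();
    }

    public static void Main()
    {
        XElement result = XElement.Parse(SampleXml).Element("result");
        Console.WriteLine(Find(result, "administrative_area_level_1", "short_name")); // CA
        Console.WriteLine(Find(result, "administrative_area_level_2", "long_name") ?? "(not found)");
    }
}
```

Two deliberate differences from the snippet above: this checks every type child rather than only the first, and it uses FirstOrDefault with a cast to string, so a missing component yields null instead of an exception.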

Obviously others don't share this viewpoint, but I thought it worth expanding on to show where I'm coming from.

Note that this isn't a hard and fast rule of course... in some cases XPath, regular expressions etc are the best solution. In this case, I'd prefer the LINQ to XML, that's all.


¹ Of course I could have kept each Console.WriteLine call on a single line, but I don't like posting code with horizontal scrollbars on SO. Note that writing the correct XPath version with the same indentation as the above and avoiding scrolling is still pretty nasty:

            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/" +
                "address_component[type='administrative_area_level_1']" +
                "/short_name").InnerXml);

In general, long lines work a lot better in Visual Studio than they do on Stack Overflow...

Jon Skeet
dtb found my error, but I would love an example of LINQ to XML if you could provide one.
Scott Chamberlain
@Jon Skeet: I don't think that it's good practice to recommend LINQ when someone asks for a specific XPath expression. Plus I don't see how "compiler helping" (?!) makes this `Console.WriteLine(result.Elements("address_component").Where(x => (string) x.Element("type") == "administrative_area_level_2").Select(x => x.Element("long_name").Value).First());` better than `Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type='administrative_area_level_1']/short_name").InnerXml);`
Alejandro
@Alejandro: the OP *specifically* asked whether there was a better way. Personally *I* find the LINQ way easier to use - why not suggest it as an alternative? Note that in this case the error was because of a string literal not being in quotes... precisely the kind of mistake you don't get when using LINQ to XML.
Jon Skeet
@Dimitre: For someone whose primary tags are xml, xslt and xpath, I'm not surprised you'd prefer the XPath version. For someone who already knows LINQ but isn't as comfortable with the XPath, the LINQ version can be a lot easier to understand and extend. Don't assume everyone has your XPath knowledge/comfort.
Jon Skeet
@Jon-Skeet: Have you heard about code complexity metrics? LOC?, McCabe Cyclomatic Complexity?
Dimitre Novatchev
@Jon-Skeet: What an ugly code! Negative examples as this are also very instructive. I am not downvoting, because taking out 2 points from 208K is ridiculous. If the downvote was proportional to the accumulated rep (e.g. in this case 1000 points) -- yes, then I would downvote this answer.
Dimitre Novatchev
@Dimitre: I don't see why that has a large cyclomatic complexity. Where do you see that coming in? I still think you're *massively* biased towards the XPath solution because you're so familiar with it. Undoubtedly I'm biased towards LINQ solutions because I'm familiar with that too... but what's wrong with showing options? Note the OP's comment: " I would love a example of linq to xml if you could provide one."
Jon Skeet
@Dimitre: When you delete a comment and then re-add something very similar, it makes the whole conversation look weird - looking in chronological order, it looks like I've replied to you before you ever commented!
Jon Skeet
@Jon-Skeet: This question was tagged "xpath" and *not* "linq". How would you react if I answered a "linq" question with an XSLT solution and xslt wasn't in the tags? This has nothing to do with our "biases" -- the fact is that *you* offered a non-related answer and xpath is a desired tag. The OP's comment that he'd be interested in a linq-to-xml example came *after* your answer.
Dimitre Novatchev
@Dimitre: The question was also tagged C#. LINQ is part of C#, and the OP explicitly requested alternative approaches if people think they're better (which I still do, for those comfortable with LINQ). Yes, the comment came after my answer - but indicated that the OP was happy to see alternative suggestions. If you want to suggest an XSLT solution to a LINQ to XML question, go ahead.
Jon Skeet
@Dimitre: You might also want to look at LarsH's comment to the accepted answer: LINQ to XML certainly won't catch *all* such errors, but by separating out instructions from data, it will definitely catch many of them.
Jon Skeet
@Jon-Skeet: Maybe it would be useful to know that XPath (2.0 and further) is a typed language and not at all "data". XPath 2.1 is a fully fledged FP language with HOF, currying and dynamic creation of functions. I at least am not teaching you what LINQ is or isn't.
Dimitre Novatchev
@Dimitre: You've missed my point. As far as expressing XPath in C# is concerned, it *is* all data - it's just a string. As far as the compiler is concerned, you could have anything you like in there - it's not going to help you. If you try putting the element name in the C# code without quotes, it will fail long before you'd get to run it.
Jon Skeet
@Jon-Skeet: It is much more useful to the readers to advise them to always check their XPath expressions (not only the syntax!) in advance, before they add code in the hosting language that evaluates these expressions. Good advice would mention that excellent XSLT/XPath/XML IDEs exist that immediately signal any XPath syntax errors as you type -- including Visual Studio, as you probably know...
Dimitre Novatchev
@Dimitre: I would still rather *read* code which made the difference clearer. I'm not a fan of embedding SQL in code either, for much the same reason. I think it's perfectly reasonable for us each to have our own preferences here, but I do wish you'd express yours less aggressively.
Jon Skeet
@Jon-Skeet: It's no wonder you find the response "aggressive" when you answer in an area that isn't your primary expertise and, at that, give a less than perfect answer. :) This isn't aggressive -- just deserved feedback. I fully expect similar feedback from you should I dare give C# advice :). You might be interested to see my answer to this same question.
Dimitre Novatchev
@Dimitre: How should I not feel it's aggressive when you say you'd like to take away 1000 rep, you question whether I've heard of cyclomatic complexity, things like "I at least am not teaching you what LINQ is or isn't" and "What an ugly code!" etc. Maybe you deem all of those as polite, but I certainly don't.
Jon Skeet
+1  A: 

dtb's response is accurate. I wanted to add that you can use an XPath testing tool like the one linked below to help find the correct XPath:

http://www.bit-101.com/xpath/

Ed Schwehm
A: 
string url = @"http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false";
string value = "administrative_area_level_1";

using(WebClient client = new WebClient())
{
    string wcResult = client.DownloadString(url);

    XDocument xDoc = XDocument.Parse(wcResult);

    var result = xDoc.Descendants("address_component")
                    .Where(p=>p.Descendants("type")
                                .Any(q=>q.Value.Contains(value))
                    );

}

The result is an enumeration of "address_component" elements that have at least one "type" node whose value contains the string you're searching for. For the query above, it holds a single XElement with the following data.

<address_component>
  <long_name>California</long_name>
  <short_name>CA</short_name>
  <type>administrative_area_level_1</type>
  <type>political</type>
</address_component> 
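
The query above stops at the matching element, so here is a minimal sketch of the remaining step: reading short_name out of it. The DescendantsDemo class, the ShortNameFor helper and the sample XML (including the placeholder second component) are assumptions made for the example. It also compares with == rather than Contains, on the assumption that an exact match on the type is what's wanted.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class DescendantsDemo
{
    // Hand-written sample; the second component is a placeholder, not real API data.
    public const string SampleXml = @"<GeocodeResponse>
  <result>
    <address_component>
      <long_name>Example County</long_name>
      <short_name>Example County</short_name>
      <type>administrative_area_level_2</type>
      <type>political</type>
    </address_component>
    <address_component>
      <long_name>California</long_name>
      <short_name>CA</short_name>
      <type>administrative_area_level_1</type>
      <type>political</type>
    </address_component>
  </result>
</GeocodeResponse>";

    // Hypothetical helper: short_name of the first address_component that has
    // a <type> child exactly equal to value, or null when nothing matches.
    public static string ShortNameFor(string xml, string value)
    {
        XDocument xDoc = XDocument.Parse(xml);
        XElement match = xDoc.Descendants("address_component")
                             .FirstOrDefault(p => p.Elements("type").Any(q => q.Value == value));
        return match == null ? null : (string) match.Element("short_name");
    }

    public static void Main()
    {
        Console.WriteLine(ShortNameFor(SampleXml, "administrative_area_level_1")); // CA
    }
}
```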

I would really recommend spending a little time learning LINQ in general, because it's very useful for manipulating and querying in-memory objects and querying databases, and it tends to be easier than XPath when working with XML. My favorite site to reference is http://www.hookedonlinq.com/

EC182
+1  A: 

I would recommend just typing the XPath expression as part of an XSLT file in Visual Studio. You'll get error messages "as you type" -- this is an excellent XML/XSLT/XPath editor.

For example, I am typing:

<xsl:apply-templates select="@* | node() x"/>

and immediately get in the Error List window the following error:

Error   9   Expected end of the expression, found 'x'.  @* | node()  -->x<--

XSLTFile1.xslt  9   14  Miscellaneous Files

Only when the XPath expression does not raise any errors (I might also test that it selects the intended nodes, too), would I put this expression into my C# code.

This ensures that I will have no XPath -- syntax and semantic -- errors when I run the C# program.
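
For a rough check from the C# side as well, one option (a sketch, not a substitute for the editor workflow above) is to compile the expression with XPathExpression.Compile before using it; the XPathSyntaxCheck class and IsValid helper are made up for the example. This only catches syntax errors: the unquoted predicate from the question is syntactically valid XPath, so it passes and still has to be tested against real data.

```csharp
using System;
using System.Xml.XPath;

static class XPathSyntaxCheck
{
    // Hypothetical helper: true if the expression compiles as XPath 1.0.
    public static bool IsValid(string xpath)
    {
        try
        {
            XPathExpression.Compile(xpath);
            return true;
        }
        catch (XPathException)
        {
            return false;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // The malformed expression from the XSLT example above.
        Console.WriteLine(IsValid("@* | node() x"));

        // Syntactically fine, but it selects nothing useful: a semantic error, not a syntax error.
        Console.WriteLine(IsValid("/GeocodeResponse/result/address_component[type=administrative_area_level_1]/short_name"));
    }
}
```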

Dimitre Novatchev
It's good that Visual Studio provides this feedback. It's bad that you have to create an XSLT file (including bits that won't be in the final XPath) in order to get any feedback. It would be nice if Visual Studio recognised methods which took XPath expressions and could apply some intelligence to those, but that's a bit of a fond dream, I suspect. This is *always* the problem I have when we embed one language within another, whether it's XPath, HTML, SQL, regular expressions, even simple formatting strings. That's why I *tend* to prefer solutions which stick to a single source language.
Jon Skeet
@Jon-Skeet So everything is a hammer? :)
Dimitre Novatchev