Hello,

I am trying to pull data from a website using Objective-C. This is all very new to me, so I've done some research. What I know now is that I need to use XPath, and that there is a wrapper for it on the iPhone called hpple. I've got it up and running in my project.

I am confused about the way I retrieve information from the site. Apparently I am to use regular expressions in this line of code:

NSArray * a = [doc search:@"//a[@class='sponsor']"];

This is just an example. Is that stuff in the search:@"...." the regular expression? If so, I guess I can develop the hundreds of patterns that I will need for my program to parse the site (I need a lot of data), but is there a better way? I'm very lost in this. Any help is appreciated.

A: 

That is an XPath expression, not a regular expression. The W3C has an XPath reference here: http://www.w3.org/TR/xpath/. Basically you are searching for <a> elements with the class "sponsor".

Note that this is a good thing! Regular expressions are a poor fit for parsing HTML: the markup is nested, and a regular expression cannot reliably keep track of that nesting.
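To make the difference concrete, here is a minimal sketch in Python, chosen only because its standard library can run the query without hpple; the XPath string itself is the one from the question. The sample markup and the name `sponsor_links` are made up for illustration. ElementTree implements only a subset of XPath and wants a leading `.`, so the path is written `.//a[@class='sponsor']`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page. ElementTree needs well-formed markup,
# which real-world HTML often is not.
PAGE = """
<html>
  <body>
    <a class="sponsor" href="http://example.com/acme">Acme</a>
    <a class="nav" href="/home">Home</a>
    <div><a class="sponsor" href="http://example.com/globex">Globex</a></div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Same query as in the question, with the leading "." that ElementTree requires.
sponsor_links = root.findall(".//a[@class='sponsor']")

# Each match is an element; read its text and attributes directly.
for link in sponsor_links:
    print(link.text, link.get("href"))
```

One path expression selects every matching anchor regardless of how deeply it is nested, so you describe where the data lives in the document tree instead of writing a text pattern for each case.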

Matt
Thanks a lot Matt. I'll work through the tutorial.
JohnJ
+1  A: 

The parameter is an XPath, not a regular expression. Here's a breakdown:

  • All XPaths are interpreted relative to a context node. In this case, it's the root node.
  • // is an abbreviation meaning "all descendants"
  • a means "all child nodes with a node type of 'a'" (in HTML, that's anchors)
  • [...] contains a predicate, refining just which a to match
    • @ is an abbreviation for attribute nodes
    • @class means an attribute named "class"
    • @class='sponsor' means a class attribute equal to "sponsor". Note this will not match nodes with a class containing "sponsor", such as <a class="big sponsor" ...>; the class must be equal.

All together, we have "'a' nodes descending from the root that have class equal to 'sponsor'".
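The exact-match behaviour of the predicate can be checked with a small sketch. This uses Python's standard-library ElementTree rather than hpple, purely so it can be run as is; the markup and the helper `has_class` are hypothetical. ElementTree supports only a subset of XPath, so the "class contains this token" case is done in plain Python here. With a full XPath 1.0 engine the usual idiom is `contains(concat(' ', normalize-space(@class), ' '), ' sponsor ')`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup covering the cases discussed above.
SNIPPET = """
<div>
  <a class="sponsor">exact</a>
  <a class="big sponsor">two classes</a>
  <a class="sponsorship">different class</a>
  <p><a class="sponsor">nested</a></p>
</div>
"""

root = ET.fromstring(SNIPPET)

# Exact comparison: the whole attribute value must equal "sponsor".
exact = [a.text for a in root.findall(".//a[@class='sponsor']")]

def has_class(element, name):
    """Return True if `name` is one of the whitespace-separated class tokens."""
    return name in element.get("class", "").split()

# Token comparison: also accepts class="big sponsor", but not "sponsorship".
token = [a.text for a in root.iter("a") if has_class(a, "sponsor")]

print(exact)
print(token)
```

The nested anchor is found in both cases because `//` reaches descendants at any depth; only the token comparison picks up the anchor with two classes.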

outis
Very informative. By any chance, is there a program that I can use to easily find the proper XPath? Or is XPath easy enough to work with?
JohnJ
What do you mean by "proper XPath"? Do you mean find an XPath that will select given nodes? I've seen no app which does that. However, there are plenty of [XPath testers](http://www.google.com/search?q=xpath+test) that let you test whether or not a given XPath selects the nodes you want for a given document. If you use a Mac, try [AquaPath](http://ditchnet.org/aquapath/) to begin with. As for working with XPath, search for tutorials. Some aspects of them are easy to work with; they're a little like evolved file paths (I think it happens at level 30).
outis
+1 Good answer. A minor point: `a` means *child elements named `a`*.
Alejandro
@Alejandro: d'oh. In my rush to simplify things, I left out anything to do with axes.
outis