I am trying to use XPath to traverse the HTML of a newspaper's website (for the sake of practice). Right now I'd like to get the main article, its picture and the short description that accompanies it. But I'm not that skilled in XPath yet, and I can't get to the short description.

Within this code:

<div class="margenesPortlet">

<div class="fondoprincipal">
<div class="margenesPortlet">
<a href='notas/n1092329.htm' ><img id="LinkNotaA1_Foto" src="http://i.oem.com.mx/5cfaf266-bb93-436c-82bc-b60a78d21fb6.jpg" height="250" width="300" border="0" /></a>

<div class="piefoto_esto">Un tubo de 12 pulgadas al lado de la Vialidad Sacramento que provoc&#243; el corte del servicio durante toda la ma&#241;ana y hasta alrededor de las cuatro de la tarde. Foto: El Heraldo de Chihuahua</div>

<div class="cabezaprincesto"><a href='notas/n1092329.htm' class='cabezaprincesto'  >Sin agua 8 mil usuarios</a></div>
<div class="resumenesto"><a href='notas/n1092329.htm' class='resumenesto'  >La ruptura de una l&#237;nea en el tanque de rebombeo de agua Sacramento dej&#243; sin servicio a ocho mil usuarios, en once colonias del sur de la ciudad. </a></div>
</div>
</div>

</div>

I want to get the picture (with or without its caption) and then the title of the article. I can get these three things by using:

//div[@class='fondoprincipal'] <-- gives me the main image and caption

//a[@class='cabezaprincesto']/text() <-- gives me the article's title

But I can't get hold of the short description, which is the div with class="resumenesto". I haven't tried selecting anything by that id, because the same id is used over and over throughout the rest of the HTML, so it returns lots of extra items.
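For reference, the two expressions that already work can be approximated with Python's standard `xml.etree.ElementTree`, as a minimal sketch. Assumptions: ElementTree supports only a subset of XPath (no `text()`, so the element's `.text` attribute is read instead), and it needs well-formed XML; the trimmed copy of the snippet below happens to be well-formed, but a real page may not be.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (caption and summary shortened).
SNIPPET = """
<div class="margenesPortlet">
  <div class="fondoprincipal">
    <div class="margenesPortlet">
      <a href='notas/n1092329.htm'><img id="LinkNotaA1_Foto" src="http://i.oem.com.mx/5cfaf266-bb93-436c-82bc-b60a78d21fb6.jpg" height="250" width="300" border="0" /></a>
      <div class="piefoto_esto">Un tubo de 12 pulgadas al lado de la Vialidad Sacramento. Foto: El Heraldo de Chihuahua</div>
      <div class="cabezaprincesto"><a href='notas/n1092329.htm' class='cabezaprincesto'>Sin agua 8 mil usuarios</a></div>
      <div class="resumenesto"><a href='notas/n1092329.htm' class='resumenesto'>La ruptura de una l&#237;nea dej&#243; sin servicio a ocho mil usuarios.</a></div>
    </div>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)

# Roughly //div[@class='fondoprincipal']: the block holding the image and caption.
main = root.find(".//div[@class='fondoprincipal']")
image_src = main.find(".//img").get("src")
caption = main.find(".//div[@class='piefoto_esto']").text

# Roughly //a[@class='cabezaprincesto']/text(); .text stands in for text().
title = root.find(".//a[@class='cabezaprincesto']").text

print(image_src)
print(caption)
print(title)
```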

How can I get this particular one? And could any of you recommend a good way of getting these values into another webpage? I was thinking of having PHP write some HTML using those values, but I'm not really sure...


Edit

What I mean by "this particular one" is: how do I get the div with class="resumenesto" that resides within div class="fondoprincipal"?


Edit 2

Thank you, XPath traversal is a little clearer now. But as for my second question, could any of you recommend a good way of getting these values into another webpage? I was thinking of having PHP write some HTML using those values, but I'm not really sure...

A:

You refer to the "id" resumenesto, but in your code example the div you're talking about has a class of resumenesto.

Also, when you use an XPath like this:

//div[@class='resumenesto']

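What you get back is a node-set: every node matching that XPath. So if you want to refer to a single item in that list, you need to specify which one. Note that a position predicate binds to the last step, so //div[@class='resumenesto'][1] selects every resumenesto div that is the first such div within its own parent. To get the first match in the whole document, wrap the path in parentheses:

(//div[@class='resumenesto'])[1]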
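The "list of nodes" point can be seen with Python's standard `xml.etree.ElementTree`, as a minimal sketch: `findall` returns a Python list of every match, and indexing it with `[0]` gives the first match in document order, like `(//div[@class='resumenesto'])[1]`. The two-story page and its `otranota` class are hypothetical, made up to have two summaries in different parents.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two stories, each with its own "resumenesto" summary,
# sitting in different parent divs ("otranota" is an invented class name).
PAGE = """
<body>
  <div class="fondoprincipal">
    <div class="resumenesto"><a class="resumenesto">Main story summary</a></div>
  </div>
  <div class="otranota">
    <div class="resumenesto"><a class="resumenesto">Second story summary</a></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# Like //div[@class='resumenesto']: every match, returned as a list.
summaries = root.findall(".//div[@class='resumenesto']")
print(len(summaries))  # 2

# Like (//div[@class='resumenesto'])[1]: the first match in document order.
first = summaries[0]
print(first.find("a").text)  # Main story summary
```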

What do you mean by "this particular one"? The only way to make an XPath more specific is to give it context: for instance, "the div with class resumenesto that resides within some other div", or "the first of the divs with class resumenesto".

Read W3Schools' overview of XPath syntax for some more info.

Edit:

To get the div residing within "fondoprincipal":

//div[@class='fondoprincipal']//div[@class='resumenesto']

This tells XPath to find any div with class fondoprincipal anywhere in the document and, within that div, any descendant div with class resumenesto.
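A minimal sketch of the scoping effect, using Python's standard `xml.etree.ElementTree`, which accepts `//` in the middle of a path. The page below is hypothetical: it adds a "sidebar" resumenesto div outside fondoprincipal to show that the scoped query skips it.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one resumenesto div outside fondoprincipal, one inside.
PAGE = """
<body>
  <div class="resumenesto"><a class="resumenesto">Sidebar summary</a></div>
  <div class="fondoprincipal">
    <div class="margenesPortlet">
      <div class="resumenesto"><a class="resumenesto">Main story summary</a></div>
    </div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# Unscoped: every resumenesto div on the page.
all_summaries = root.findall(".//div[@class='resumenesto']")

# Scoped, like //div[@class='fondoprincipal']//div[@class='resumenesto']:
# only resumenesto divs somewhere beneath the fondoprincipal div.
main_summaries = root.findall(
    ".//div[@class='fondoprincipal']//div[@class='resumenesto']"
)

print(len(all_summaries), len(main_summaries))  # 2 1
print(main_summaries[0].find("a").text)  # Main story summary
```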

Rahul
Sorry, I wrote it incorrectly; I'll edit my code.
Luis Armando
Updated answer to reflect your changes.
Rahul
A: 

And to narrow your search, you can include the parent div too:

//div[@class='resumenesto']/a[@class='resumenesto']/text()
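The same child-of-div idea as a minimal sketch with Python's standard `xml.etree.ElementTree`. Assumption: ElementTree has no `text()`, so each matched `a` element's `.text` attribute is collected instead; numeric entities such as `&#237;` are decoded by the parser.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the summary markup from the question.
PAGE = """
<div class="fondoprincipal">
  <div class="resumenesto"><a href='notas/n1092329.htm' class='resumenesto'>La ruptura de una l&#237;nea dej&#243; sin servicio a ocho mil usuarios.</a></div>
</div>
"""

root = ET.fromstring(PAGE)

# Like //div[@class='resumenesto']/a[@class='resumenesto']/text():
# the a must be a direct child of the div; .text stands in for text().
texts = [
    a.text
    for a in root.findall(".//div[@class='resumenesto']/a[@class='resumenesto']")
]
print(texts)
```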
Martin Bring
A: 

To get to the text, you need:

//div[@class='fondoprincipal']//a[@class='resumenesto']

Note that you want to select the a (instead of the div, as Rahul suggested), since that is the element that contains the text.
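Why the `a` matters can be sketched with Python's standard `xml.etree.ElementTree` (a minimal sketch on a trimmed copy of the markup): the div's own `.text` is empty because its only content is the `a` child element, while the `a` carries the text. Collecting all descendant text of the div with `itertext()` also works, similar to taking the XPath string-value of the div.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup, wrapped in <body> so that
# ".//div[@class='fondoprincipal']" has an ancestor to search from.
PAGE = """
<body>
  <div class="fondoprincipal">
    <div class="resumenesto"><a href='notas/n1092329.htm' class='resumenesto'>Sin servicio a ocho mil usuarios.</a></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

div = root.find(".//div[@class='fondoprincipal']//div[@class='resumenesto']")
a = root.find(".//div[@class='fondoprincipal']//a[@class='resumenesto']")

# The div has no text of its own: everything is inside the <a> child.
print(div.text)  # None
# The text lives on the <a>.
print(a.text)  # Sin servicio a ocho mil usuarios.
# Gathering all descendant text of the div gives the same string here.
print("".join(div.itertext()))
```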

Regarding putting it on a page, you can do it in ASP.NET: use XElement to load the document and then the XPathSelectElement extension method to select the values (http://msdn.microsoft.com/en-us/library/bb156083.aspx).
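The same extract-then-render idea can be sketched in any language. The version below uses Python's standard library only to illustrate it; it is not the PHP or ASP.NET code discussed in the thread. The helpers `extract_main_story` and `render_teaser` and the `teaser` class are hypothetical. Two assumptions: ElementTree needs well-formed XML (a real page may require an HTML-tolerant parser; in PHP that would be `DOMDocument::loadHTML` plus `DOMXPath`), and every value is HTML-escaped before being written into the new page.

```python
import html
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup, wrapped in <body>.
SNIPPET = """
<body>
<div class="fondoprincipal">
  <div class="margenesPortlet">
    <a href='notas/n1092329.htm'><img id="LinkNotaA1_Foto" src="http://i.oem.com.mx/5cfaf266-bb93-436c-82bc-b60a78d21fb6.jpg" height="250" width="300" border="0" /></a>
    <div class="cabezaprincesto"><a href='notas/n1092329.htm' class='cabezaprincesto'>Sin agua 8 mil usuarios</a></div>
    <div class="resumenesto"><a href='notas/n1092329.htm' class='resumenesto'>La ruptura de una l&#237;nea dej&#243; sin servicio a ocho mil usuarios. </a></div>
  </div>
</div>
</body>
"""


def extract_main_story(markup):
    """Hypothetical helper: pull the main story's fields out of the markup."""
    root = ET.fromstring(markup)
    main = root.find(".//div[@class='fondoprincipal']")
    title_link = main.find(".//a[@class='cabezaprincesto']")
    summary_link = main.find(".//a[@class='resumenesto']")
    image = main.find(".//img")
    return {
        "title": title_link.text,
        "summary": summary_link.text.strip(),
        "image": image.get("src"),
        "link": title_link.get("href"),
    }


def render_teaser(story):
    """Hypothetical helper: build an HTML fragment, escaping every value."""
    return (
        '<div class="teaser">\n'
        f'  <img src="{html.escape(story["image"])}" alt="">\n'
        f'  <h2><a href="{html.escape(story["link"])}">{html.escape(story["title"])}</a></h2>\n'
        f'  <p>{html.escape(story["summary"])}</p>\n'
        "</div>"
    )


story = extract_main_story(SNIPPET)
fragment = render_teaser(story)
print(fragment)
```

The article link in the source markup is relative (`notas/n1092329.htm`), so on another site it would have to be resolved against the newspaper's own address first, for example with `urllib.parse.urljoin`.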

eglasius