I have the following kind of HTML. The content is grouped by <div id="foo"> and <div id="foo1"> elements, with <div style="padding…"> elements in between.

I'm trying to figure out how to craft an XPath expression that starts from the <div id="foo"> and returns the sibling <div>s with style="padding…".

Getting the <div id="foo"> is trivial. However, I can't just use following-sibling with a test on style="padding…", because that returns all the matching <div>s, including those that belong to later groups.

I need a way to return the matching <div>s only until I hit the sibling that matches id="foo1". I'm pretty sure there's a simple approach that I'm missing!

<div id="foo">stuff...</div>

<div style="padding:2px; ">stuff...</div>

<div id="foo1">stuff...</div>

<div id="foo">stuff...</div>

<div style="padding:2px; ">stuff...</div>
<div style="padding:2px; ">stuff...</div>
<div style="padding:2px; ">stuff...</div>

<div id="foo1">stuff...</div>
A: 

Give them a class name rather than using an inline style.

matpol
A: 

I don't think this is feasible using XPath queries. It would require you to remember the index of the selected div (not that hard), and then compare the index of each of its siblings both to that index and to the index of the first #foo1 div that follows it. If it's possible at all, it would be a very complex XPath query. XPath does not easily allow you to preserve multiple scopes to compare elements or attributes with.

You'd be better off first selecting the two delimiter divs and then matching the ones in between. This is much easier to do in code than in XPath.
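As a rough illustration of the "do it in code" route, here is a minimal Python sketch using only the standard library. It assumes the markup can be parsed as well-formed XML and that the divs share a single parent; the <body> wrapper, the helper name groups_between, and the letters a to d used as text (so the groups can be told apart) are all made up for this example and are not part of the question.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <body> root.
# The texts "a".."d" are assumptions added to make the groups visible.
SAMPLE = """<body>
<div id="foo">stuff...</div>
<div style="padding:2px; ">a</div>
<div id="foo1">stuff...</div>
<div id="foo">stuff...</div>
<div style="padding:2px; ">b</div>
<div style="padding:2px; ">c</div>
<div style="padding:2px; ">d</div>
<div id="foo1">stuff...</div>
</body>"""


def groups_between(parent, start_id="foo", end_id="foo1"):
    """Collect the styled divs found between each start/end delimiter pair."""
    groups = []
    current = None  # None means "not inside a foo..foo1 block"
    for child in parent:
        if child.tag != "div":
            continue
        div_id = child.get("id")
        if div_id == start_id:
            current = []  # a new block starts here
        elif div_id == end_id:
            if current is not None:
                groups.append(current)  # block closed by its delimiter
            current = None
        elif current is not None and "style" in child.attrib:
            current.append(child)
    return groups


root = ET.fromstring(SAMPLE)
texts = [[div.text for div in group] for group in groups_between(root)]
print(texts)  # [['a'], ['b', 'c', 'd']]
```

A single pass over the siblings is enough here because the delimiters never nest. Real-world HTML that is not well-formed would need an HTML parser instead of ElementTree.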

If you really need to do it in XPath, give the delimiter divs different ids (multiple elements with the same id are invalid anyway, so use distinct ids or class names), and then put matching ids or class names on the divs with padding. In other words, change the HTML to provide enough reference points instead of trying to solve it entirely in XPath.

Kamiel Wanrooij
+2  A: 

Is there some reason you can't take the simple approach of picking all of the divs that don't have id attributes?

div[not(@id)]

Or, perhaps, divs with a style attribute?

div[@style]

If, for some reason, that's not acceptable, you can go with something more like what you were thinking:

div[@style][following-sibling::div[@id='foo1']]

This gets all of the divs with a style attribute that come before a div with a particular id. Is that what you're asking for?

I imagine your actual input HTML is less trivial than the example you've provided, but all of these XPath expressions I've listed work with your example. If you could provide more specific detail about what your expected output is and what issues you've been facing then I can give you more help.
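For a quick check of what the simpler expressions return on the sample, here is a small Python sketch. It relies on the standard library's ElementTree, which supports only a limited XPath subset: div[@style] works as written, but not() does not, so the div[not(@id)] case is emulated with a list comprehension. The <body> wrapper and the texts a to d are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <body> root.
# The texts "a".."d" are assumptions added to make the results readable.
SAMPLE = """<body>
<div id="foo">stuff...</div>
<div style="padding:2px; ">a</div>
<div id="foo1">stuff...</div>
<div id="foo">stuff...</div>
<div style="padding:2px; ">b</div>
<div style="padding:2px; ">c</div>
<div style="padding:2px; ">d</div>
<div id="foo1">stuff...</div>
</body>"""

root = ET.fromstring(SAMPLE)

# div[@style]: child divs that carry a style attribute.
styled_texts = [div.text for div in root.findall("div[@style]")]

# div[not(@id)]: emulated in Python, since ElementTree has no not().
no_id_texts = [div.text for div in root.findall("div") if "id" not in div.attrib]

print(styled_texts)  # ['a', 'b', 'c', 'd']
print(no_id_texts)   # ['a', 'b', 'c', 'd']
```

Both selections return the padded divs from every group at once, so they fit when the grouping itself does not matter.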

Welbog
A: 

Your best long-term bet is to fix the HTML. Any other solution is fragile.

Computer Linguist
A: 

One not-so-nice-looking way of doing what you seem to intend is as follows (note that it assumes you really do have multiple <div>s with the same id!):

/*/div[@id='foo'][n]/following-sibling::div[@style='padding…']
[
  count(preceding-sibling::div[@id='foo']) 
  =
  count(/*/div[@id='foo'][n]/preceding-sibling::div[@id='foo']) + 1
]

The first line of the XPath expression takes any <div style="padding…"> that is a following sibling of the n'th <div id="foo"> (this is as far as you got on your own, selecting all of them).

It then counts the preceding-sibling <div id="foo"> elements for each of them, and matches only those that have the correct number, i.e. one more <div id="foo"> than the respective <div id="foo"> itself has. Vary the number n to select another set.
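To make the counting rule concrete, here is a small Python sketch that mirrors it with the standard library rather than evaluating the XPath itself (ElementTree has no following-sibling axis or count()). The n-th <div id="foo"> has n - 1 preceding <div id="foo"> siblings, so the right-hand side of the predicate is n, and a padded div qualifies when exactly n such divs precede it. The <body> wrapper, the helper name select_group, the texts a to d, and the prefix test on the style value are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <body> root.
# The texts "a".."d" are assumptions added to make the results readable.
SAMPLE = """<body>
<div id="foo">stuff...</div>
<div style="padding:2px; ">a</div>
<div id="foo1">stuff...</div>
<div id="foo">stuff...</div>
<div style="padding:2px; ">b</div>
<div style="padding:2px; ">c</div>
<div style="padding:2px; ">d</div>
<div id="foo1">stuff...</div>
</body>"""


def select_group(parent, n, start_id="foo"):
    """Return padded divs preceded by exactly n sibling divs with start_id."""
    selected = []
    foo_seen = 0  # plays the role of count(preceding-sibling::div[@id='foo'])
    for child in parent:
        if child.tag != "div":
            continue
        if child.get("id") == start_id:
            foo_seen += 1
        elif child.get("style", "").startswith("padding") and foo_seen == n:
            selected.append(child)
    return selected


root = ET.fromstring(SAMPLE)
first = [div.text for div in select_group(root, 1)]
second = [div.text for div in select_group(root, 2)]
print(first)   # ['a']
print(second)  # ['b', 'c', 'd']
```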

If your input does not, in fact, have multiple elements with the same id, it gets a lot simpler:

//div[@style='padding…'][preceding-sibling::div[@id][1]/@id = 'foo']

This selects those <div style="padding…"> whose nearest preceding sibling <div> with an id has an id value of 'foo'. As indicated, this assumes there is only one <div> with an id of 'foo', and that the other preceding <div>s do not have an id.

Tomalak