Think about it in "steps". Given that some x is the root of the subtree you're considering,
x.findAll(text='price')
is the list of all items in that subtree containing the text 'price'. The parents of those items will then, of course, be:
[t.parent for t in x.findAll(text='price')]
and if you only want to keep those whose "name" (tag) is 'th', then of course:
[t.parent for t in x.findAll(text='price') if t.parent.name=='th']
and you want the "next siblings" of those (but only if they're also 'th' tags), so:
[t.parent.nextSibling for t in x.findAll(text='price')
if t.parent.name=='th' and t.parent.nextSibling and t.parent.nextSibling.name=='th']
Here you see the problem with using a list comprehension: too much repetition, since we can't assign intermediate results to simple names. Let's therefore switch to a good old loop.
Edit: added tolerance for a string of text between the parent th and the "next sibling", as well as tolerance for the latter being a td instead, per OP's comment.
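for t in x.findAll(text='price'):
    p = t.parent
    # keep only matches whose parent tag is a th
    if p.name != 'th': continue
    ns = p.nextSibling
    # tolerate one string of text (e.g. whitespace) between the two cells
    if ns and not ns.name: ns = ns.nextSibling
    # accept the following cell only if it is a td or th
    if not ns or ns.name not in ('td', 'th'): continue
    print(ns.string)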
I've used ns.string, which gives the next sibling's contents if and only if they're just text (no further nested tags); of course you can instead analyze further at this point, depending on your application's needs!-). Similarly, I imagine you won't be doing just a print but something smarter, but I'm giving you the structure.
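To make that structure easier to try out, here is a minimal self-contained sketch that collects the values into a list instead of printing them. It rests on assumptions that are not in the question: it uses Beautiful Soup 4 (the bs4 package, where findAll and nextSibling are spelled find_all and next_sibling, and the text= argument is called string=), the built-in "html.parser", a made-up sample table, and a hypothetical helper name values_after_price_headers.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Made-up sample markup, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>price</th> <td>19.99</td></tr>
  <tr><th>price</th><th>5.00</th></tr>
  <tr><td>price</td><td>ignored</td></tr>
  <tr><th>name</th><td>widget</td></tr>
</table>
"""


def values_after_price_headers(root, label="price"):
    """Hypothetical helper: for each <th> whose text is `label`, collect
    the .string of the td/th cell that follows it."""
    found = []
    for t in root.find_all(string=label):
        p = t.parent
        if p.name != "th":
            continue
        ns = p.next_sibling
        # Tolerate one string of text (e.g. whitespace) between the cells.
        if isinstance(ns, NavigableString):
            ns = ns.next_sibling
        # Skip if there is no following tag, or it is not a td/th.
        if not isinstance(ns, Tag) or ns.name not in ("td", "th"):
            continue
        # .string is None when the cell holds nested tags, not just text.
        found.append(ns.string)
    return found


x = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(values_after_price_headers(x))
```

With this sample, the first two rows contribute a value each, while the third row is skipped because its 'price' sits in a td rather than a th.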
Speaking of structure, notice that I twice use if ...: continue. This reduces nesting compared to the alternative of inverting the if's condition and indenting all the following statements in the loop, and "flat is better than nested" is one of the koans in the Zen of Python (type import this at an interactive prompt to see them all and meditate;-).
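For contrast, here is roughly what the nested alternative looks like, with each condition inverted and the rest of the body indented under it. This is only a sketch under assumptions of mine: Beautiful Soup 4 (bs4) naming (find_all, next_sibling, string=), the built-in "html.parser", and a made-up two-row table; the name nested_results is just for illustration.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Made-up sample markup, for illustration only.
x = BeautifulSoup(
    "<table>"
    "<tr><th>price</th> <td>19.99</td></tr>"
    "<tr><td>price</td><td>n/a</td></tr>"
    "</table>",
    "html.parser",
)

nested_results = []
for t in x.find_all(string="price"):
    p = t.parent
    if p.name == "th":                       # inverted first test
        ns = p.next_sibling
        if isinstance(ns, NavigableString):  # skip one string of text
            ns = ns.next_sibling
        if isinstance(ns, Tag) and ns.name in ("td", "th"):  # inverted second test
            nested_results.append(ns.string)

print(nested_results)
```

It does the same job, but the statement that matters ends up two levels deeper, and each extra condition would push it further to the right.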