I need to get all table rows on a page that contain a specific string 'abc123123' in them.

The string is inside a TD, but I need the entire TR if it contains the 'abc123123' anywhere inside.

I tried this:

userrows = s.findAll('tr', contents = re.compile('abc123123'))

I'm not sure if contents is the right property.

My html looks something like:

<tr>
   <td>
   </td>
   <td><table>.... abc123123 </table><tr>
   ..
</tr>
<tr>
..
</tr>
..
..
+2  A: 

No, the extra keyword arguments beyond the specified ones (name, attrs, recursive, text, limit) all refer to attributes of the tag you're searching for.

You cannot search for name and text at the same time (if you specify text, BeautifulSoup ignores name), so you need separate calls, e.g.:

allrows = s.findAll('tr')
userrows = [t for t in allrows if t.findAll(text=re.compile('abc123123'))]

Here I'm using a list comprehension since I assume you want a list of the relevant tag objects, as findAll itself gives you.
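
To make the two-step filter concrete, here is a minimal, self-contained sketch. It assumes the newer bs4 package (where findAll is spelled find_all and the text argument is called string) and uses made-up sample HTML and names, so treat it as an illustration rather than the exact code from the question:

```python
import re
from bs4 import BeautifulSoup  # assumption: bs4 is installed

# Hypothetical sample markup; the real page will differ.
html = """
<table>
  <tr><td>alice</td><td>abc123123</td></tr>
  <tr><td>bob</td><td>zzz</td></tr>
  <tr><td>carol</td><td><span>id abc123123 here</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("abc123123")

# Step 1: collect every row. Step 2: keep rows with a matching text node
# anywhere beneath them (find searches all descendants by default).
allrows = soup.find_all("tr")
userrows = [tr for tr in allrows if tr.find(string=pattern)]

# First cell of each matching row, just to show which rows were kept.
names = [tr.td.get_text(strip=True) for tr in userrows]
print(names)
```

Because the search covers all descendants, a row that wraps a nested table will also match when the string appears in one of the inner rows.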

Alex Martelli
Or I could just do `if not t.findAll(..) continue`. Thanks, trying it now!
Blankman
@Alex OK, it doesn't work, because the text I'm looking for is actually inside a href tag... hmm
Blankman
@Blankman, there is no `href` tag in HTML; I guess you mean the href attribute of an `a` tag. In that case, in the second statement use `t.findAll('a', href=re.compile('abc123123'))`, of course.
Alex Martelli
Of course! Sheesh!
Blankman
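
For the href case discussed in the comments, the same row filter might look like the sketch below. It again assumes the bs4 package and invented sample HTML; only the pattern string comes from the question:

```python
import re
from bs4 import BeautifulSoup  # assumption: bs4 is installed

# Hypothetical sample markup with the string inside an href attribute.
html = """
<table>
  <tr><td><a href="/user/abc123123">profile</a></td></tr>
  <tr><td><a href="/user/other">profile</a></td></tr>
  <tr><td>no link</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("abc123123")

# Keep rows containing at least one <a> whose href matches the pattern.
link_rows = [tr for tr in soup.find_all("tr") if tr.find("a", href=pattern)]

# The matching href values, to show which rows survived the filter.
hrefs = [tr.a["href"] for tr in link_rows]
print(hrefs)
```

Here the keyword argument href is treated as an attribute filter, which is the behaviour the answer describes for extra keyword arguments.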