ansaurus

Question

Answer 1

+1 A:

This is a slight improvement, but I couldn't figure out how to get rid of the three parents.

x[0].parent.parent.parent.findAll('td')[1].string
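One way the chain of three parents might be avoided is to ask for the nearest enclosing row by name. The sketch below is not from the original answer: it assumes the bs4 package (where the method is spelled `find_parent`), and the markup is hypothetical, since the page's real HTML is not shown in this thread.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the label is nested three levels below its row
# (text -> <b> -> <td> -> <tr>), mirroring the three .parent hops.
html = """
<table>
  <tr><td><b>Owner Name(s)</b></td><td>JOHN DOE</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The text node holding the label.
label = soup.find(string="Owner Name(s)")

# Climb to the nearest enclosing <tr>, however deeply the label is nested.
row = label.find_parent("tr")

# Second cell of that row holds the value.
owner = row.find_all("td")[1].string
print(owner)
```

Because `find_parent("tr")` stops at the first `tr` ancestor, it should keep working if an extra wrapper tag is added around the label.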

Mark Byers 2009-11-30 00:08:25

Answer 2

+2 A:

(Edit: apparently the HTML the OP posted lies -- there is in fact no tbody tag to look for, even though he made a point of including it in that HTML. So, changing to use table instead of tbody.)

As there may be several table-rows you want (e.g., see the sibling URL to the one you give, with the last digit, 4, changed into a 5), I suggest a loop such as the following:

# locate the table containing a cell with the given text
owner = re.compile('Owner Name')
cell = soup.find(text=owner).parent
while cell.name != 'table': cell = cell.parent
# print all non-empty strings in the table (except for the given text)
for x in cell.findAll(text=lambda x: x.strip() and not owner.match(x)):
  print x

This is reasonably robust to minor changes in page structure: having located the cell of interest, it loops up through its parents until it has found the table tag, then over all navigable strings within that table that aren't empty (or just whitespace), excluding the owner header.
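For readers who want to try the loop end to end, here is a self-contained sketch of the same idea. It assumes the bs4 package on Python 3 (so `find_all` and `string=` replace `findAll` and `text=`), and the two-table markup and the `values` list are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: an unrelated table, then the table of interest.
html = """
<table><tr><td>Unrelated table</td></tr></table>
<table>
  <tr><td>Owner Name(s)</td><td>JOHN DOE</td></tr>
  <tr><td></td><td>JANE DOE</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
owner = re.compile("Owner Name")

# Locate the table containing a cell with the given text:
# start at the tag that holds the matching string, then climb.
cell = soup.find(string=owner).parent
while cell.name != "table":
    cell = cell.parent

# Collect all non-empty strings in that table, except the header itself.
values = [
    s.strip()
    for s in cell.find_all(
        string=lambda s: bool(s.strip()) and not owner.match(s)
    )
]
print(values)
```

Note that the `while` loop assumes the matching text really is inside a table; on a page where it is not, the climb would run past the document root and fail.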

Alex Martelli 2009-11-30 00:36:16

Thanks for the answer. I get an error: cell.name has no attribute name. I guess I can use a try, but I'm not really familiar with using try. Is there a better way to address this?

Vincent 2009-11-30 02:23:53

The URL you gave has no such error w/my code (that's why I have the `.parent` in the 2nd line of my code: to move up from the navigable string, to a tag, which _does_ have a name). What exact URL has such a problem with the code I posted in my answer?

Alex Martelli 2009-11-30 05:52:38

I just checked this URL, and there is no `<tbody>` tag. I think you'll just have to look for the "Owner Name(s)" table column header, and then read the values in all rows of that table.

Paul McGuire 2009-11-30 10:03:52

Like Paul said, there is no tbody; the URL I am using is the one posted. I guess the solution that would make the most sense to me is to be able to find a table based on some content, then select the item in the table I want. (soup(Find a table that has "owner name"))

Vincent 2009-11-30 14:55:45

@Vincent, so why do you show as "the relevant HTML" one **with** `tbody`? Ah well, just use `table` instead of `tbody` in the third line. Here, let me edit the answer to show that trivial change.

Alex Martelli 2009-11-30 15:47:06

@vincent, there -- edited and added comment to show how the first three lines do **exactly** "find a table based on some content", the next two emit the (**plural** of course!-) other item**s** (strings) in that table. Not sure what you mean by "select" (?) and by using the singular, any more than I have any idea about why you showed a tbody tag that just wasn't there -- ah well!-)

Alex Martelli 2009-11-30 15:53:40

Duh, clearly there is a tbody, sorry about that. It still doesn't work for me, but that might be my problem. I'd like to accept your answer, so I will try more. I am going to post an answer I got at the Beautiful Soup group; although I like that answer, I did not get it as promptly as you answered. Thanks again.

Vincent 2009-11-30 20:19:26

@vincent: Clearly there is *NOT* a tbody in the HTML obtained by reading that URL.

John Machin 2009-11-30 20:46:41

Not sure what the deal is with the tbody. I assure you I did not type the "relevant html" by hand; I copied and pasted it. Thanks for the time you spent on this, Alex Martelli.

Vincent 2009-12-10 23:25:08

Answer 3

+3 A:

This is Aaron DeVore's answer from the BeautifulSoup discussion group. It works well for me.

soup = BeautifulSoup(...)
label = soup.find(text="Owner Name(s)")

Needs Tag.string to get to the actual name string

name = label.findNext('td').string

If you're doing a bunch of them, you can even go for a list comprehension.

names = [unicode(label.findNext('td').string) for label in
         soup.findAll(text="Owner Name(s)")]
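As a rough illustration of the list comprehension above, the following sketch runs it against sample data. It assumes the bs4 package on Python 3, so `find_next`, `find_all` and `str` stand in for `findNext`, `findAll` and `unicode`; the markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables, each with a label cell and a value cell.
html = """
<table>
  <tr><td>Owner Name(s)</td><td>JOHN DOE</td></tr>
</table>
<table>
  <tr><td>Owner Name(s)</td><td>JANE ROE</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# For every label text node, take the next <td> in document order
# and read its single string.
names = [
    str(label.find_next("td").string)
    for label in soup.find_all(string="Owner Name(s)")
]
print(names)
```

One caveat: `.string` is `None` when the value cell holds more than one child node, so cells with mixed content would need different handling.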

Vincent 2009-11-30 20:23:16


Beautifulsoup get value in table
