ansaurus

Question

How do I stop Beautiful Soup from skipping rows while parsing?

Answer 1

+2 A:

I am still learning a lot myself, but I suggest you try lxml. This is a first stab at it; I think it will mostly get you there, though there may be some niceties I am not certain about.

Assuming this1 is a string containing the page's HTML:

from lxml.html import fromstring
this1_tree = fromstring(this1)
# Pair each cell with its position in the document: (index, element).
# Note: cssselect() may require the separate cssselect package on newer lxml versions.
all_cells = list(enumerate(this1_tree.cssselect('td')))

The only thing I am not totally certain about is whether you should test the key, the value, or text_content() of each cell to find out if it has the string you are seeking in the anchor reference or text. That is why I wanted a sample of your HTML. But one of those should work.

the_cell_before_numbers=[]
for cell in all_cells:
    if 'Item' in cell[1].text_content():
        the_cell_before_numbers.append(cell[0])

Now that you have the index of the cell before the numbers, you can get the value you need from the text content of the next cell:

# the_cell_before_numbers is a list of indexes; use the first match here
todays_price = all_cells[the_cell_before_numbers[0] + 1][1].text_content()

I am sure there is a prettier way, but I think this will get you there.
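One possibly tidier variant, as a sketch only: instead of tracking indexes, ask each matching cell for its next sibling with getnext(). The sample HTML and the helper name cells_after are made up for illustration, since the real page's markup may differ. Note that this only looks at the next cell in the same row, whereas the index approach above takes the next cell in document order.

```python
from lxml.html import fromstring

# Hypothetical sample; the real page's markup may be different.
this1 = """
<table>
  <tr><td><a href="/item/1">Item</a></td><td>12.50</td></tr>
  <tr><td>Other</td><td>99.00</td></tr>
</table>
"""

this1_tree = fromstring(this1)


def cells_after(tree, label):
    """Return the text of each <td> that directly follows a <td> containing label."""
    values = []
    for cell in tree.iter('td'):
        if label in cell.text_content():
            nxt = cell.getnext()  # next sibling element, or None at the end of a row
            if nxt is not None and nxt.tag == 'td':
                values.append(nxt.text_content().strip())
    return values


print(cells_after(this1_tree, 'Item'))
```

If the label and the number live in different rows on the real page, the index-based version is the safer choice.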

I tested using your html and I got what you were looking for.

PyNEwbie 2010-03-06 22:54:59

I updated the question with a sample of the HTML.

Pevo 2010-03-06 23:00:50

Sorry, I'm brand new to this and I'm not sure how to implement it. Where exactly do I put all of this?

Pevo 2010-03-06 23:21:11

Well, I am using lxml instead of BeautifulSoup, so you need to install lxml. You also need to go back to an earlier version of this question, as my answer was built using that description. But this code should get you there. It assumes that this1 is the HTML page you pulled in using urllib and that it is a string object.
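As a rough sketch of where this1 could come from, assuming Python 3 and the standard library's urllib.request: the helper name fetch_html, the default encoding, and the URL in the usage comment are placeholders, not taken from the question.

```python
from urllib.request import urlopen


def fetch_html(url, encoding="utf-8"):
    """Download a page and return it as a str (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()  # bytes as sent by the server
    # The encoding is an assumption; undecodable bytes are replaced.
    return raw.decode(encoding, errors="replace")


# Usage (placeholder URL):
# this1 = fetch_html("http://example.com/prices.html")
```

The resulting this1 string is what gets passed to fromstring() in the answer's code.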

PyNEwbie 2010-03-06 23:30:43

I see. Well, my problems are now of another nature: installing lxml gives me an annoying error. But I believe this will get me where I want eventually. Much thanks.

Pevo 2010-03-06 23:42:36

What error did you get?

PyNEwbie 2010-03-06 23:51:49

An error regarding Microsoft Visual Basic 9, about how it failed with exit status 2.

Pevo 2010-03-07 00:17:09

PyNEwbie 2010-03-07 00:41:08

I already have Microsoft Visual Basic 9 installed. There must be some problem with it, I'm assuming.

Pevo 2010-03-07 01:19:42
