ansaurus

Question

BeautifulSoup, Python and HTML automatic page truncating?

Answer 1

Try using lxml.html. It is a faster HTML parser and copes better with broken HTML than the latest BeautifulSoup. It works fine on your example page and parses the entire page.

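import lxml.html

# parse() accepts a URL and returns an ElementTree for the whole document
doc = lxml.html.parse('http://voinici.ceata.org/~sana/test.html')
# count every div element anywhere in the tree
print(len(doc.findall('.//div')))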

The code above prints 131, the number of div elements found on the page.
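If you want a rough sanity check that a parser has not dropped part of a page, one option is to compare its element count against a count of opening tags in the raw source. The sketch below is only an illustration: the helper name count_opening_tags and the sample string are made up for this example, and the regex is an approximation that also counts tags inside comments or scripts and would match hyphenated names such as div-x.

```python
import re


def count_opening_tags(html_text, tag):
    """Roughly count opening <tag ...> occurrences in raw HTML text.

    Hypothetical helper for illustration; it does not parse the HTML.
    """
    # "<" followed by the tag name and a word boundary, so "<div>" and
    # "<DIV class=...>" match but "</div>" and "<divider>" do not.
    pattern = re.compile(r"<%s\b" % re.escape(tag), re.IGNORECASE)
    return len(pattern.findall(html_text))


# Made-up broken snippet: two of the three divs are never closed.
sample = "<html><body><div>one<div>two</div><DIV class='x'>three</body>"
raw_count = count_opening_tags(sample, "div")
print(raw_count)
```

If the number a parser reports is far below the raw count for the same page, the parser has probably stopped early or discarded part of the markup.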

nosklo 2010-09-14 11:27:28

Thanks for your answer.

Laurențiu Dascălu 2010-09-14 17:58:04
