ansaurus

Question

What is this function doing in Python involving urllib2 and BeautifulSoup?

Answer 1

+3 A:

el.findAll(text=True) returns all the text contained within an element and its sub-elements. By text I mean everything not inside a tag: in <b>hello</b>, "hello" is the text, but <b> and </b> are not.

That function therefore joins together all the text found beneath the given element and strips whitespace from the front and back of the result.
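To make the join-then-strip behaviour concrete, here is a minimal sketch that uses only the standard library as a stand-in for BeautifulSoup. The original function is not shown in this thread; it presumably looks something like ''.join(el.findAll(text=True)).strip(). The names TextCollector and text_of below are hypothetical, and html.parser is a swapped-in technique that only mimics collecting text nodes the way findAll(text=True) does.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Stand-in for findAll(text=True): gathers every text node it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves never arrive here.
        self.chunks.append(data)


def text_of(html):
    # Hypothetical helper: join all text nodes with no delimiter,
    # then trim whitespace from both ends of the combined string.
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return ''.join(collector.chunks).strip()


print(text_of("<p>  <b>hello</b> <i>world</i>\n</p>"))  # prints: hello world
```

Note that only the outer whitespace is removed; the single space between "hello" and "world" is itself a text node and survives the join.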

Here's a link to the findAll documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html#arg-text

Eli Courtwright 2009-06-14 02:13:37

use backticks for HTML. :)

Paolo Bergantino 2009-06-14 02:18:33

Why does the text = line open with a '' that has nothing in it? What exactly do join and strip do? And why did this have to be defined as a function before it was applied to the data? Thanks.

Alex 2009-06-14 02:26:23

''.join means join each item with an empty string (so there's no delimiter).

Jacob 2009-06-14 02:29:11

What is its purpose, then, if ''.join('hello world') == 'hello world'?

Alex 2009-06-14 02:36:23

The purpose is to join *sequences* of strings into a single string: so ''.join(["hel", "low", "ord"]) gives "helloworld", for example.
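A short sketch of the two string methods discussed in these comments may help. It uses only built-in str methods; the variable names are illustrative. Joining a plain string appears to do nothing because a string is itself a sequence of one-character strings, so they are simply glued back together.

```python
parts = ["hel", "low", "ord"]

# The string before .join is the delimiter placed between the items.
joined = ''.join(parts)    # "helloword" pieces glued with nothing between them
spaced = ' '.join(parts)   # same pieces, separated by single spaces

# A str is a sequence of characters, so joining it with '' rebuilds it unchanged.
same = ''.join('hello world')

# strip() removes leading and trailing whitespace, never whitespace in the middle.
trimmed = '  hello world \n'.strip()

print(joined)
print(spaced)
print(same)
print(trimmed)
```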

Alex Martelli 2009-06-14 02:40:28
