Hey guys, does BeautifulSoup strip CSS and JavaScript content? After using

content3 = ''.join(BeautifulSoup(content).findAll(text=True))

I still have them lingering around.

A: 

What exactly do you want to strip, all script and style elements? It should be something like:

''.join(BeautifulSoup(content).findAll(text=lambda text: 
text.parent.name != "script" and 
text.parent.name != "style"))
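
For comparison, the same idea (keep a text node unless its enclosing element is script or style) can be sketched with only Python's standard-library html.parser. This is not BeautifulSoup's API; the names TextOnly and visible_text and the sample markup are made up for illustration, and the sketch assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIPPED = ("script", "style")

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def visible_text(html):
    """Return the concatenated text of html without script/style content."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Hypothetical sample input, not taken from the question.
sample = (
    "<html><head><style>p { color: red; }</style>"
    "<script>alert('hi');</script></head>"
    "<body><p>Hello, world</p></body></html>"
)
print(visible_text(sample))
```

Like the lambda filter above, this only drops the contents of script and style elements; comments and other non-visible markup would need their own handling.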
Matthew Flaschen
That's right. A regex replace could probably do that, but I was wondering whether BeautifulSoup handles that. Or could the "simple version of webstemmer" do that too?
goh
Thanks, Matthew!
goh