I want to use BeautifulSoup to grab strictly the visible text on a webpage. My test case is this page: http://www.nytimes.com/2009/12/21/us/21storm.html. I mainly want the body text (the article) and maybe a few tab names here and there. However, when I tried this suggestion, http://stackoverflow.com/questions/1752662/beautifulsoup-easy-way-to-to-obtain-html-free-contents, it returned lots of tags and HTML comments that I don't need. I can't figure out which arguments to pass to findAll (http://www.crummy.com/software/BeautifulSoup/documentation.html#arg-limit) to get what I need.
So, how should I find all visible text, excluding scripts, comments, CSS, and other junk?
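To make the goal concrete, here is a rough sketch of the kind of filtering I have in mind. It assumes the newer `bs4` package (BeautifulSoup 4) and its `find_all(string=True)` call rather than the BeautifulSoup 3 `findAll` linked above. The names `INVISIBLE_PARENTS`, `is_visible`, and `visible_text` are made up for illustration, and the list of tags to skip is a guess, not a complete rule.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Assumption: text inside these parents is not rendered in the page body.
# "[document]" is the name bs4 gives the top-level object, which catches
# things like the doctype declaration.
INVISIBLE_PARENTS = {"style", "script", "head", "title", "meta", "[document]"}


def is_visible(node):
    """Return True if a text node is probably rendered by a browser."""
    if isinstance(node, Comment):
        return False  # HTML comments are never displayed
    return node.parent.name not in INVISIBLE_PARENTS


def visible_text(html):
    """Collect the visible text nodes of an HTML string into one string."""
    soup = BeautifulSoup(html, "html.parser")
    texts = soup.find_all(string=True)  # every text node, comments included
    chunks = (t.strip() for t in texts if is_visible(t))
    return " ".join(c for c in chunks if c)  # drop whitespace-only nodes
```

Calling `visible_text(page_source)` on the downloaded HTML would return one space-joined string. This only filters by parent tag and comment type, so text hidden through CSS (for example `display: none`) would still come through.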