Any good books or academic papers on web scraping or web spiders?
If you're looking for a general survey, I would recommend this article.
For something spicier, although still at survey level, a nice recent article from "Data & Knowledge Engineering" is here; it specifically addresses the issues of "focused web crawlers" as opposed to classic/generic ones.
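To make the generic vs. focused distinction concrete, here is a toy sketch in Python. It is not the algorithm from the cited article. The in-memory `WEB` graph, the keyword-overlap `relevance()` function, and the rule that a child link inherits its parent page's score are all assumptions chosen to keep the example offline and self-contained. The only real difference between the two crawlers is the frontier: a FIFO queue (breadth-first) versus a priority queue ordered by estimated topical relevance (best-first).

```python
import heapq
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and parse the links out of the HTML.
WEB = {
    "a": ("home page with links", ["b", "c"]),
    "b": ("web crawler design and crawling strategies", ["d"]),
    "c": ("cooking recipes and pasta", ["e"]),
    "d": ("focused crawler for topic specific crawling", []),
    "e": ("more pasta recipes", []),
}

def relevance(text, topic):
    """Toy score: fraction of topic keywords that occur in the page text."""
    words = set(text.lower().split())
    return sum(1 for t in topic if t in words) / len(topic)

def generic_crawl(seed, limit):
    """Classic breadth-first crawl: visit pages in the order they were discovered."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)
        for link in WEB[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def focused_crawl(seed, topic, limit):
    """Best-first crawl: always expand the frontier URL with the highest priority."""
    seen, frontier, order = {seed}, [(-1.0, seed)], []
    while frontier and len(order) < limit:
        _, url = heapq.heappop(frontier)
        order.append(url)
        text, links = WEB[url]
        score = relevance(text, topic)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a child's priority is its parent's relevance
                # (negated, because heapq is a min-heap).
                heapq.heappush(frontier, (-score, link))
    return order

if __name__ == "__main__":
    print(generic_crawl("a", 3))
    print(focused_crawl("a", ["crawler", "crawling"], 3))
```

With a budget of three pages, the breadth-first crawler wastes a fetch on the off-topic page `c`, while the focused crawler follows the crawler-related page `b` down to `d`. Real focused crawlers replace the keyword count with a trained classifier or link-context analysis, but the frontier-ordering idea is the same.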
Is this the kind of reference you're looking for, or are you looking for foundational papers (i.e., ones that are probably a bit out of date by now but are widely quoted because, in their time, they introduced significant breakthroughs or innovations)?
As to extracting web page contents, you may find http://portal.acm.org/citation.cfm?id=775152.775182 and http://portal.acm.org/citation.cfm?id=775047.775134 helpful.
DOM-based content extraction of HTML documents: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.57.9196&rep=rep1&type=pdf
Discovering informative content blocks from Web documents: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.7769&rep=rep1&type=pdf
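As a rough illustration of what block-level content extraction involves, here is a minimal sketch using Python's standard `html.parser`. It applies one simple heuristic, link density (drop blocks made up mostly of link text, such as navigation bars and "related links" boxes). This is not a reproduction of either paper's method; the `BLOCK_TAGS` set and the 0.5 threshold are arbitrary assumptions.

```python
from html.parser import HTMLParser

# Assumption: tags treated as block boundaries; tune for real pages.
BLOCK_TAGS = {"p", "div", "li", "td", "ul", "table", "section",
              "article", "header", "footer", "nav"}
SKIP_TAGS = {"script", "style"}

class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts the link text in each block."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (block text, number of link characters)
        self._text = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        # Normalize whitespace and store the block if it has any text.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._in_link += 1
        elif tag in SKIP_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in SKIP_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:          # ignore script/style contents
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())

    def close(self):
        super().close()
        self._flush()           # emit whatever block is still open

def extract_content(html, max_link_density=0.5):
    """Return the text of blocks whose link-text ratio is at most max_link_density."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return [text for text, links in parser.blocks
            if links / len(text) <= max_link_density]

page = """
<html><body>
<div><a href="/">Home</a> | <a href="/news">News</a> | <a href="/about">About</a></div>
<p>Web crawlers download pages by following hyperlinks from a set of seed URLs.</p>
<script>var x = 1;</script>
<div>Related: <a href="/x">Crawler A</a> <a href="/y">Crawler B</a></div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_content(page))
```

On the sample page, the navigation bar and the "Related" box are dropped because most of their characters are link text, and only the paragraph survives. The papers above go further (DOM tree filtering, block importance learned across pages of a site), so treat this as a starting point rather than a faithful implementation.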
http://wwwconference.org/www2008/papers/fp865.html - the best paper at WWW 2008.