Any good books or academic papers on web scraping or web spiders?
If you're looking for a general survey, I would recommend this article.
For something spicier, though still at survey level, a nice recent article from "Data & Knowledge Engineering" is here; it specifically addresses "focused web crawlers" as opposed to classic/generic ones.
Is this the kind of reference you're looking for, or are you looking for foundational papers (i.e. ones that are probably a bit out of date by now but are widely quoted because, in their time, they provided significant breakthroughs or innovation)?
As for extracting web page contents, you may find the following papers helpful:
DOM-based content extraction of HTML documents
Discovering informative content blocks from Web documents - the best paper at WWW 2008.