I'm searching for a good book that discusses the problems of developing a web crawler in Java. Not one of the typical "search engine optimization" books.

What books can you recommend?

A: 

Maybe Lucene in Action

Thiyagaraj
Thanks, I've got that one, and a few others about information retrieval, but I'm looking for one that focuses mainly on the crawling part.
Chris
+4  A: 

I don't know of any particular book about it, but it covers a very wide range of topics that you might want to take a look at:

Daff
+2  A: 

I've referenced the O'Reilly book Spidering Hacks on several occasions when I've needed to do something along these lines :)

warren
+1  A: 

First, a good grounding in HTTP and its surroundings (URI, URL, cookies, etc.) would be useful, so:

HTTP: The Definitive Guide

HTTP Developer's Handbook

And two books that seem spot on:

HTTP Programming Recipes for Java Bots

Programming Spiders, Bots, and Aggregators in Java

elhoim