I'm looking for a good book that discusses the problems of developing a web crawler in Java, not the typical "search engine optimization" books.
Which books can you recommend?
I don't know of any one book specifically about this, but the subject spans a wide range of topics that you might want to look at:
I've referenced the O'Reilly book Spidering Hacks on several occasions when I've needed to do something along these lines :)
First, a good grounding in HTTP and its surroundings (URIs, URLs, cookies, etc.) would be useful, so:
And two books that seem spot on: