Hey, I have a question about PHP-based web crawlers: can one run the way a Java thread-based crawler does? I ask because in Java a thread can be executed again and again, and I don't think PHP has anything like a thread function. Could you tell me which web crawler would be more useful: a PHP-based one or a Java-based one?

+1  A: 

Instead of writing your own, use one of the following. By the way, Java-based web crawlers are preferred. My favorite is Nutch.

Java based: Nutch, Heritrix, JSpider, JoBo (simple crawler)

PHP based: PHPCrawl

Ankit Jain
@Ankit: Which is better, Java-based or PHP-based?
Java-based! Use Nutch; it comes with Lucene.
Ankit Jain
@Ankit: What is Lucene used for?
Nutch only does the web crawling (following links and downloading pages). Lucene is an indexing engine that builds an `inverted index` of the documents. Don't worry about Lucene; Nutch takes care of it. (Vote up if it works for you :P)
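To make the term `inverted index` concrete, here is a minimal Java sketch of the idea: a map from each term to the ids of the documents that contain it. It is only an illustration and not how Lucene is actually implemented; the class name `InvertedIndexSketch`, its methods, and the sample documents are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, not part of Lucene or Nutch.
public class InvertedIndexSketch {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text on non-alphanumeric characters and record every term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of all documents containing the term, in ascending order.
    public List<Integer> search(String term) {
        Set<Integer> hits = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Nutch crawls the web");
        index.add(2, "Lucene indexes the crawled pages");
        System.out.println(index.search("the"));    // prints [1, 2]
        System.out.println(index.search("lucene")); // prints [2]
    }
}
```

A lookup goes straight from a word to the matching documents, which is why search engines index pages this way instead of scanning every page per query.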
Ankit Jain
@Ankit: I don't have enough points to vote up :(
A: 

In general, you will need to jump through more hoops to run long-running tasks in PHP, as it is much more of a request/response-based setup: a script typically starts when a request arrives and ends once the response has been sent.
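For contrast, here is a rough Java sketch of the long-running, thread-based style the question refers to: a fixed pool of worker threads that is reused for one URL after another inside a single process. It is an assumption-laden illustration, not code from Nutch or any other crawler; `CrawlerPoolSketch`, `crawl`, and `fetch` are hypothetical names, and `fetch` fakes the download so that the example runs without network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a thread-pool crawler loop.
public class CrawlerPoolSketch {
    // Stand-in for an HTTP download; a real crawler would fetch the page here.
    static String fetch(String url) {
        return "<html>content of " + url + "</html>";
    }

    // Fetch every distinct seed URL using a fixed number of reusable worker threads.
    static Map<String, String> crawl(List<String> seeds, int workers) throws Exception {
        Map<String, String> pages = new ConcurrentHashMap<>();
        Set<String> visited = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String url : seeds) {
                // add() returns false for a URL already seen, so duplicates are skipped
                if (visited.add(url)) {
                    pending.add(pool.submit(() -> { pages.put(url, fetch(url)); }));
                }
            }
            for (Future<?> job : pending) {
                job.get(); // wait for the job; rethrows any failure from the worker
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return pages;
    }

    public static void main(String[] args) throws Exception {
        List<String> seeds = Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/a");
        System.out.println(crawl(seeds, 2).size()); // prints 2
    }
}
```

The same two worker threads handle every submitted URL, and the process can keep running for as long as there is work to do.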

Tassos Bassoukos
@Tassos: I don't understand.