Hey, I have a question about PHP-based web crawlers: can one run the way a Java thread-based crawler does? I'm asking because in Java a thread can be executed again and again, and I don't think PHP has anything like a thread function. Can you tell me which web crawler would be more useful: a PHP-based one or a Java-based one?
+1
A:
Instead of writing your own, use one of the following. By the way, Java-based web crawlers are generally preferred. My favorite is Nutch.
Java based: Nutch, Heritrix, JSpider, JoBo (simple crawler)
PHP based: PHPCrawl
Ankit Jain
2010-07-27 07:58:15
@Ankit: Which is better, Java-based or PHP-based?
2010-07-27 07:59:19
Java-based! Use Nutch; it comes with Lucene.
Ankit Jain
2010-07-27 08:02:43
@Ankit: What is Lucene used for?
2010-07-27 08:04:50
Nutch only does the web crawling (following links and downloading pages). Lucene is an indexing engine that builds an `inverted index` of the documents. Don't worry about Lucene; Nutch takes care of it. (Vote up if it works for you :P)
Ankit Jain
2010-07-27 08:10:53
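For anyone wondering what an inverted index looks like, here is a rough sketch in plain Java. It is not Lucene's API or its actual implementation; the class `InvertedIndexSketch` and its methods are made up for illustration. It only shows the idea of mapping each term to the documents that contain it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a toy inverted index, not how Lucene is implemented.
public class InvertedIndexSketch {

    // term -> sorted set of ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Splits the text into lowercase terms and records docId under each term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of the documents containing the term (empty if none).
    public Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "PHP web crawler");
        index.add(2, "Java web crawler");
        System.out.println(index.lookup("crawler")); // [1, 2]
        System.out.println(index.lookup("java"));    // [2]
    }
}
```

A search for a word then becomes a single map lookup instead of a scan over every downloaded page.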
@Ankit: I don't have enough points to vote up :(
2010-07-27 08:12:09
A:
In general, you will need to jump through more hoops to run long-running tasks in PHP, as it's much more of a request/response-based setup.
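For comparison, here is a minimal sketch of the Java side: a long-lived process that reuses a fixed pool of worker threads for many URLs. The class name `CrawlPoolSketch` and the `fetch` helper are hypothetical stand-ins (no real HTTP request is made), so treat it as an illustration of the pattern rather than a working crawler.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CrawlPoolSketch {

    // Hypothetical stand-in for downloading a page.
    // A real crawler would issue an HTTP request and parse out links here.
    static String fetch(String url) {
        return "contents of " + url;
    }

    // Submits every URL to a fixed pool of worker threads and returns
    // the set of distinct URLs that were processed.
    public static Set<String> crawl(List<String> urls, int workers) throws InterruptedException {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String url : urls) {
            pool.submit(() -> {
                if (visited.add(url)) { // add() returns false for URLs already seen
                    fetch(url);
                }
            });
        }
        pool.shutdown();                              // accept no new tasks
        pool.awaitTermination(30, TimeUnit.SECONDS);  // wait for queued tasks to finish
        return visited;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> seeds = List.of(
                "http://example.com/a",
                "http://example.com/b",
                "http://example.com/a"); // duplicate on purpose
        Set<String> visited = crawl(seeds, 2);
        System.out.println("Visited " + visited.size() + " distinct URLs");
    }
}
```

The same two worker threads handle all submitted tasks, and the process can keep running for as long as there is work, which is the part that takes extra effort in a typical request/response PHP setup.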
Tassos Bassoukos
2010-07-27 07:58:59