Hi,

I am trying to configure Nutch for multi-threaded crawling.

However, I am facing an issue: I am not able to run a crawl with multiple threads. I have modified nutch-site.xml to use 25 threads, but I still see only 1 thread running.

<property>
  <name>fetcher.threads.fetch</name>
  <value>25</value>
  <description>The number of FetcherThreads the fetcher should use.
    This also determines the maximum number of requests that are
    made at once (each FetcherThread handles one connection).</description>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>25</value>
  <description>This number is the maximum number of threads that
    should be allowed to access a host at one time.</description>
</property>

I always see activeThreads=25, spinWaiting=24, fetchQueues.totalSize=<some value>.

What does this mean? Can you please explain what the issue is and how I can solve it?

I would highly appreciate your help.

Thanks, Sumit

A:

I think your issue is related to a known bug with the new Nutch fetcher. See NUTCH-721.

You can try using OldFetcher (if you have Nutch 1.0) to see if that solves your problem.

-- Ken

Hi Ken, thanks for your answer. The issue was with host per ip, which was not set properly; after I set it to 25, it is working properly now. I really like your Bixo crawler. I am a fan of it and I am using it in lots of my pet projects :)
Sumit Ghosh
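
For anyone else puzzled by activeThreads=25, spinWaiting=24: the toy model below is one way to picture it. It is not Nutch code. It assumes the fetcher keeps one queue per host and caps the number of in-flight requests per queue, so threads that cannot take a URL without breaking the cap are counted as spin-waiting. The function name `simulate_round` and its parameters are made up for illustration.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


def simulate_round(urls, num_threads, max_threads_per_host):
    """Toy model of one scheduling round (hypothetical, not Nutch's implementation).

    Returns (fetching, spin_waiting): how many threads obtained a URL and
    how many found nothing they were allowed to fetch.
    """
    # Assumption: one queue per host name.
    queues = defaultdict(deque)
    for url in urls:
        queues[urlparse(url).hostname].append(url)

    in_flight = defaultdict(int)  # requests currently running, per host
    fetching = 0
    for _ in range(num_threads):
        for host, queue in queues.items():
            # A thread may take a URL only if the per-host cap allows it.
            if queue and in_flight[host] < max_threads_per_host:
                queue.popleft()
                in_flight[host] += 1
                fetching += 1
                break

    # Threads that got nothing are alive but idle.
    return fetching, num_threads - fetching


# 100 URLs that all live on a single host
single_host = [f"http://example.com/page{i}" for i in range(100)]

print(simulate_round(single_host, 25, 1))   # (1, 24)
print(simulate_round(single_host, 25, 25))  # (25, 0)
```

In this model, a crawl that targets a single host with an effective per-host limit of 1 leaves 24 of the 25 threads idle, which matches the numbers in the question. Raising the effective limit to 25 lets every thread fetch, which is consistent with the fix reported above.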