Hi, I am writing a custom search task using Nutch for an intranet crawl, and I am using Hadoop for it. I want to spawn the task across multiple Hadoop slaves by dividing the seed URLs evenly among them. I assume this is handled by the partitioner.

I see that the default implementation, Nutch's URLPartitioner, partitions URLs by host, domain, or IP. I want to override that behavior and simply divide the seeds equally based on the maxthreads value I pass on the command line.

Could I do that with simple config changes, rather than rewriting the partitioner?
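
To make the intent concrete, here is a rough sketch of the kind of partitioner I have in mind. This is just an illustration, not Nutch's actual URLPartitioner: it hashes the full URL rather than the host/domain/IP, so seeds from the same host still spread evenly across partitions. The class name EvenUrlPartitioner is made up.

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Illustrative sketch only -- not Nutch's URLPartitioner.
    public class EvenUrlPartitioner implements Partitioner<Text, Writable> {

      @Override
      public void configure(JobConf job) {
        // Nothing to configure in this sketch.
      }

      @Override
      public int getPartition(Text key, Writable value, int numReduceTasks) {
        // Hash the whole URL string instead of its host/domain/IP, so
        // seeds from the same host still land on different partitions.
        return (key.toString().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }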

EDIT
The custom search task is being written by modifying Crawl.java.
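
If rewriting really is the only way, I assume the wiring in my modified Crawl.java would look roughly like this (maxThreads stands for the command-line value, and job is whichever JobConf the relevant phase uses; both are assumptions here):

    JobConf job = new JobConf(conf);
    // Plug in the custom partitioner and map the maxthreads
    // command-line value onto the number of partitions, which
    // equals the number of reduce tasks.
    job.setPartitionerClass(EvenUrlPartitioner.class);
    job.setNumReduceTasks(maxThreads);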

Thanks, Nayn