Hi everyone!
I am stuck! I can't get Nutch to crawl in small batches. I start it with the bin/nutch crawl command and the parameters -depth 7 and -topN 10000, and it never finishes. It only stops when my hard disk runs out of space. What I need to do:
- Start crawling from my seeds, with the ability to follow outlinks further.
- Crawl 20000 pages, then index them.
- Crawl another 20000 pages, index them and merge the result with the first index.
- Repeat step 3 n times.
I also tried the scripts found in the wiki, but none of the ones I found continue an existing crawl. If I run them again, they start everything from the beginning, and at the end of the script I have the same index I had when I started. But I need to continue my crawl from where it stopped.
Any help would be very useful!
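
For reference, the loop described in the list above might be sketched roughly as follows. This is only a sketch under assumptions: the command names (generate, fetch, updatedb, invertlinks, index, merge) follow the step-by-step commands of Nutch 0.9/1.0 and may differ in other versions, the fetcher is assumed to parse pages while fetching, and the directory names (crawl, indexes-N) and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch only: command names follow the Nutch 0.9/1.0 step-by-step commands.
# All paths and defaults below are hypothetical.
NUTCH="${NUTCH:-bin/nutch}"   # Nutch launcher script
CRAWL="${CRAWL:-crawl}"       # directory holding crawldb, linkdb, segments
TOPN="${TOPN:-20000}"         # pages to fetch per round

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# Newest segment; segment names are timestamps, so the newest sorts last.
latest_segment() {
  if [ -n "$DRY_RUN" ]; then
    echo "$CRAWL/segments/SEGMENT"   # placeholder, nothing exists in a dry run
  else
    ls -d "$CRAWL"/segments/* | tail -1
  fi
}

# One round: select up to TOPN unfetched URLs, fetch them, feed the new
# outlinks back into the crawldb, then index only this round's segment.
# $1 is the round number, used to name this round's index directory.
crawl_round() {
  run "$NUTCH" generate "$CRAWL/crawldb" "$CRAWL/segments" -topN "$TOPN" || return 1
  seg=$(latest_segment)
  run "$NUTCH" fetch "$seg" || return 1
  run "$NUTCH" updatedb "$CRAWL/crawldb" "$seg" || return 1
  run "$NUTCH" invertlinks "$CRAWL/linkdb" "$seg" || return 1
  run "$NUTCH" index "$CRAWL/indexes-$1" "$CRAWL/crawldb" "$CRAWL/linkdb" "$seg"
}

# Merge all per-round indexes into a fresh output directory ($1).
merge_indexes() {
  run "$NUTCH" merge "$1" "$CRAWL"/indexes-*
}
```

The seeds would be injected only once (bin/nutch inject crawl/crawldb urls). After that, calling crawl_round 1, crawl_round 2 and so on, followed by merge_indexes with a new output directory, should continue from where the previous round stopped, because the crawldb is kept between rounds instead of being rebuilt. Setting DRY_RUN=1 prints the commands without running them.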