Hi,

I am using the Heritrix API to crawl web sites. I am having trouble writing my own post-processor, similar to LinksScoper.

The LinksScoper class provided by the Heritrix API uses isInScope(CandidateURI) to check whether a CandidateURI is in scope, but it applies all the rules in one shot.

Is there a way to write my own post-processor that gets the deciding rules (defined in the scope) and applies them one by one to the extracted links of a CrawlURI, so that I can add my own functionality between these rules?

Basically, I want to know how to retrieve the set of all deciding rules in my own post-processor.
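To make the intent concrete, here is a rough sketch of the control flow I am after. It deliberately uses no Heritrix classes: Rule, Decision, StepwiseScoper and the hook parameter are hypothetical names of my own, and the "last non-PASS decision wins" behaviour is only my assumption about how a sequence of deciding rules is combined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class StepwiseScoper {

    /** Hypothetical outcome of a single rule (assumed accept/reject/pass model). */
    public enum Decision { ACCEPT, REJECT, PASS }

    /** Hypothetical stand-in for one deciding rule; this is not a Heritrix type. */
    public interface Rule {
        Decision decide(String uri);
    }

    /**
     * Applies the rules in order. A PASS leaves the running decision
     * unchanged, so the last non-PASS decision wins (an assumption).
     * The hook is called after every rule with the running decision,
     * which is where custom logic between rules would go.
     */
    public static Decision evaluate(List<Rule> rules, String uri,
                                    BiConsumer<Rule, Decision> hook) {
        Decision current = Decision.PASS;
        for (Rule rule : rules) {
            Decision d = rule.decide(uri);
            if (d != Decision.PASS) {
                current = d;
            }
            hook.accept(rule, current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        // Rule 1: reject everything by default.
        rules.add(uri -> Decision.REJECT);
        // Rule 2: accept URIs on a placeholder host, otherwise pass.
        rules.add(uri -> uri.startsWith("http://example.com/")
                ? Decision.ACCEPT : Decision.PASS);

        Decision result = evaluate(rules, "http://example.com/page.html",
                (rule, running) -> System.out.println("running decision: " + running));
        System.out.println("final decision: " + result);
    }
}
```

What I am missing is the Heritrix side of this: how to obtain the equivalent of the rules list from the configured scope inside a post-processor.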

Thanks a lot.