Hi, I want to write my own HTML parser plugin for Nutch. I am doing focused crawling by generating only the outlinks that fall under a specific XPath. In my use case, I want to extract different data from the HTML pages depending on the current depth of the crawl, so I need to know the current depth in the HtmlParser plugin for each piece of content that I parse.
Is this possible with Nutch? As far as I can see, CrawlDatum does not carry any crawl depth information. I was thinking of keeping a map from URL to depth in a separate data structure. Does anybody have a better idea?
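To make the map idea concrete, here is a rough sketch of what I have in mind. It is plain Java and does not use any Nutch API; the class name DepthTracker and its methods are names I made up for illustration. Seeds get depth 0, and each outlink gets its parent's depth plus one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: remembers the crawl depth of each URL.
final class DepthTracker {
    private final Map<String, Integer> depthByUrl = new HashMap<>();

    // Seed URLs start at depth 0.
    void addSeed(String url) {
        depthByUrl.put(url, 0);
    }

    // Record the outlinks found on a parsed page at parent depth + 1.
    // If a URL is reachable by several paths, the smallest depth is kept.
    void recordOutlinks(String parentUrl, List<String> outlinks) {
        Integer parentDepth = depthByUrl.get(parentUrl);
        if (parentDepth == null) {
            return; // parent was never registered, so its depth is unknown
        }
        for (String link : outlinks) {
            depthByUrl.merge(link, parentDepth + 1, Integer::min);
        }
    }

    // Returns the recorded depth, or -1 if the URL has not been seen.
    int depthOf(String url) {
        return depthByUrl.getOrDefault(url, -1);
    }
}
```

The parser would then call depthOf(url) to decide what to extract. My concern is that an in-memory map like this would not be shared between the separate jobs of a crawl, so it would have to be persisted somewhere.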
Thanks