Hi, I want Nutch to crawl abc.com but index only car.abc.com. Links to car.abc.com can appear at any level within abc.com. So, basically, I want Nutch to keep crawling abc.com normally but index only pages whose URLs start with car.abc.com, e.g. car.abc.com/toyota... or car.abc.com/honda...
I set regex-urlfilter.txt to include only car.abc.com and ran the command "generate crawl/crawldb crawl/segments", but it just says "Generator: 0 records selected for fetching, exiting ...". My guess is that links to car.abc.com only appear several levels deep.
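To illustrate what I think is happening, here is a minimal Python sketch of how I understand the regex URL filter to behave: the first matching rule decides, and a URL that matches no rule is rejected. The rule pattern and the seed URL below are assumptions for illustration, not a copy of my actual files.

```python
import re

# Hypothetical rule in the style of regex-urlfilter.txt:
# a leading '+' accepts, a leading '-' rejects.
# The pattern is an assumption, not my real configuration.
RULES = [
    ("+", re.compile(r"^https?://car\.abc\.com/")),
]

def accepts(url, rules=RULES):
    """Return True if the first matching rule is '+'; unmatched URLs are rejected."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    return False

# Assumed seed list: only the top-level site.
seeds = ["http://abc.com/"]
selected = [u for u in seeds if accepts(u)]
print(len(selected), "records selected")
```

If the filter really works like this, the abc.com seed itself is filtered out, so the crawl never gets far enough to discover any car.abc.com links.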
How can I do this? Thanks.