I'm looking to use Heritrix to crawl websites. What tools are Heritrix users using to extract text from the crawled files before indexing them with Lucene?