Full text indexer with line level results, substring searches, and incremental update support?

I'm looking for a full text indexing package that is actively maintained (i.e. not a dead, end-of-life package) and that would ideally support:

substring matches
incremental updates
line level results

Also ideal would be support for:

boolean matches
adjacency searches "stringX found near stringY"

A little more detail about the situation - I currently have a 'grep on steroids' that searches through system log files stored in a central location, split by host and day, updated continuously.

approximately 40-80 GB of mixed compressed and raw files
raw uncompressed data size - 350 - 500 GB
20,000+ files

A solution like Splunk would be ideal, but pricing for our data change rate (2-4GB/day) - even with educational organization pricing - is outrageously high.

I have used freeWAIS-sf in the past, and am currently using namazu for limited indexing of a small document set elsewhere.

I don't require spidering support, I can feed it a list of files to index and they will all be on local disk.

Problem is - freeWAIS-sf appears to essentially be abandoned, and namazu doesn't have any line-level results - only by-file.

Any suggestions for products to use? One option I did consider was to use something like namazu, but to split the files before indexing into chunks and post-process search results to reassemble, but that seems very hackish.
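For illustration, the split-and-reassemble idea could look roughly like the Python sketch below. It is only a sketch under assumptions: `Chunk`, `split_into_chunks`, `matching_lines`, and the 1000-line default are hypothetical names and values, not part of namazu or any other indexer. Each chunk remembers its source file and starting line, so a chunk-level hit can be mapped back to file line numbers.

```python
from typing import Iterable, Iterator, NamedTuple


class Chunk(NamedTuple):
    path: str        # source log file the chunk came from
    start_line: int  # 1-based line number of the chunk's first line
    lines: list      # the chunk's lines, without trailing newlines


def split_into_chunks(path: str, lines: Iterable[str], size: int = 1000) -> Iterator[Chunk]:
    """Yield consecutive chunks of at most `size` lines, remembering where each starts."""
    buf = []
    start = 1
    for lineno, line in enumerate(lines, start=1):
        if not buf:
            start = lineno  # first line of a new chunk
        buf.append(line.rstrip("\n"))
        if len(buf) == size:
            yield Chunk(path, start, buf)
            buf = []
    if buf:
        yield Chunk(path, start, buf)  # final, possibly shorter chunk


def matching_lines(chunk: Chunk, needle: str) -> list:
    """Post-process a chunk-level hit into (path, line number, text) results."""
    return [
        (chunk.path, chunk.start_line + i, text)
        for i, text in enumerate(chunk.lines)
        if needle in text
    ]
```

Each chunk would be handed to the indexer as its own document, and `matching_lines` would run only on the chunks the indexer reports as hits.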

EDIT

I'm open to building multiple indexes as well as a way of doing incremental updates - even though I'd have to aggregate the multiple search results.
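Aggregating results from several indexes can be fairly cheap if each index returns its hits already sorted. A minimal sketch, assuming each hit is a `(path, line_number, text)` tuple (a hypothetical result shape, not any particular indexer's output):

```python
import heapq


def merge_results(*per_index_results):
    """Merge already-sorted hit lists from several indexes into one sorted list."""
    # heapq.merge does a streaming merge; it assumes every input is sorted.
    return list(heapq.merge(*per_index_results))
```

With paths laid out by host and day, the merged list comes out ordered by path and then by line number.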

I can also live with a delay on indexing for today's results; indexing doesn't have to be real-time.
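Since indexing can lag, one way to approximate incremental updates is a periodic pass that re-indexes only files whose size or modification time has changed. The sketch below assumes a plain dict as the manifest; `files_needing_reindex` and the manifest layout are hypothetical, and a real setup would persist the manifest between runs.

```python
import os


def files_needing_reindex(paths, manifest):
    """Return the paths whose size or mtime differs from what `manifest` recorded."""
    changed = []
    for path in paths:
        st = os.stat(path)
        signature = (st.st_size, st.st_mtime_ns)
        if manifest.get(path) != signature:
            changed.append(path)
            manifest[path] = signature  # record the state we are about to index
    return changed
```

Files for past days stop changing, so after the first pass only the current day's files should show up in the returned list.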

EDIT

Solr appears to be quite useful as a tool. However, it looks to have the same issue as namazu and the others: if I want file-level positions for the results, I basically have to handle that myself externally, or pre-split each file into chunks as I generate the XML to load into the index server. While that is a very structured way of doing it, if I have to do all of that myself, I'm back at the starting point.
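To make the pre-splitting concrete, here is a rough sketch that turns each log line into its own document in Solr's XML `<add>` update format. The field names `id`, `path`, `line`, and `text` are assumptions and would have to match whatever the Solr schema defines; `lines_to_solr_add_xml` is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET


def lines_to_solr_add_xml(path, lines):
    """Build a Solr <add> message with one <doc> per log line."""
    add = ET.Element("add")
    for lineno, text in enumerate(lines, start=1):
        doc = ET.SubElement(add, "doc")
        for name, value in (
            ("id", f"{path}:{lineno}"),    # assumed unique key: file plus line number
            ("path", path),
            ("line", str(lineno)),
            ("text", text.rstrip("\n")),
        ):
            field = ET.SubElement(doc, "field", name=name)
            field.text = value             # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="unicode")
```

A hit then carries its own `path` and `line`, at the cost of one index document per log line. Log lines containing control characters that XML does not allow would need to be cleaned first.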

Looking into Lucene now... Had seen it before, but not dug in deeply. Looks like it might fit the bill.

Nathan Neulinger 2009-01-07 18:49:57

Same issue with Solr as with swish and the others - no built-in support for file offsets. To get that, I'll have to index each file as multiple objects in the index.

Nathan Neulinger 2009-01-09 14:10:59

That site appears to be all Chinese ... not so useful for an English speaker such as myself.

offby1 2010-07-02 21:08:13

That should be sphinxsearch.com, not .org. No idea why I would have put that in like that ;)

gms8994 2010-07-06 13:08:21
