I need to create a full-text search form for a database of emails / support tickets (in C#) and I'm looking for advice and articles on how to approach this. In particular I'd like to know how to approach the classic full-text search problems, for example:
- Making sure that matches are sensible, for example if someone enters "big head" and a document contains "big hairy head", making sure that document is returned in the search.
- Ordering results by relevancy.
- How best to display matches, for example highlighting matching terms.
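To make those three points concrete, here is a toy sketch of the behaviour I mean. It is in Python rather than C# purely for brevity, and the function names (`tokenize`, `matches`, `score`, `highlight`, `search`) are made up for illustration. It uses no index at all and the scoring is deliberately naive (the share of words in the document that are query terms), so treat it as a statement of the problem rather than a solution.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())


def matches(query, document):
    """True if every query term occurs somewhere in the document (any order, any gap)."""
    doc_terms = set(tokenize(document))
    return all(term in doc_terms for term in tokenize(query))


def score(query, document):
    """Naive relevancy: fraction of the document's words that are query terms."""
    doc_tokens = tokenize(document)
    query_terms = set(tokenize(query))
    if not doc_tokens:
        return 0.0
    hits = sum(1 for tok in doc_tokens if tok in query_terms)
    return hits / len(doc_tokens)


def highlight(query, document, left="[", right="]"):
    """Wrap each whole-word occurrence of a query term in markers, keeping original case."""
    terms = sorted(set(tokenize(query)), key=len, reverse=True)
    if not terms:
        return document
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: left + m.group(0) + right, document)


def search(query, documents):
    """Return matching documents, best score first, with matched terms highlighted."""
    hits = [d for d in documents if matches(query, d)]
    hits.sort(key=lambda d: score(query, d), reverse=True)
    return [highlight(query, d) for d in hits]
```

For example, `search("big head", ["A big hairy head", "big head", "small head"])` returns `["[big] [head]", "A [big] hairy [head]"]`: the document with the extra word in the middle still matches, the tighter match ranks first, and the third document is dropped because it lacks "big".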
I know that full-text search is a fairly mammoth subject area in itself; I'm just looking for simple articles and advice on how to create something that is at least marginally useful and usable.
I've used things like Lucene.Net before, and obviously some sort of full-text index is going to be needed. The challenging bit is taking the list of documents that Lucene returns and presenting it in a useful way.
UPDATE: I want to clarify slightly what I mean - there are hundreds of generic full-text search forms that all perform a very similar function, for example:
- The search button on each and every internet forum
- The search button on each and every wiki
- Windows / Google desktop search
Each of those searches takes information from a different source and displays it by different means (HTML, Windows Forms, etc.), but each of them solves the same problems, with varying degrees of complexity. For the most part (with the possible exception of desktop search) the input data is in the same format: HTML or text.
I'm looking for advice and common strategies on how to do things like rank search results in ways that are likely to be useful to the user.
Alternatively, one strategy I had considered was taking some wiki software, exporting my entire data set into it as text, and just using the wiki's search. The sort of search I'm after is, for all intents and purposes, functionally identical to 99% of the searches that already exist; I just want to give it a different input data source and format the output slightly differently (both of which I already know how to do).
Surely there must be some advice on how those sorts of searches are done?