Hi, how would you guys go about creating a "real-time" search engine on the .NET platform? Near real-time search of the web is so popular nowadays, and I was hoping you guys would help me brainstorm some ideas. I might try to build a prototype eventually, but mostly this is just "mental training".
The requirements are:
- .NET platform, IIS, MS SQL Server or Lucene.Net (file system)
- input data to be indexed are only keywords plus some meta information - no further processing required
- data are grouped by keywords and ordered by number of occurrences of the keywords
- no historic data are kept (data older than some fixed amount of time are discarded or moved to some other data store)
Not knowing much about the subject matter, this is what I've come up with so far:
Data are fed to the system through a web service (WS). Since the data are already in the form of keywords, no further processing is performed, and the WS simply saves them to the db. A select query runs at fixed time intervals to return the data (for example: we query the data received in the past hour and run that query every second). Grouping and sorting are performed in memory to offload the SQL Server. Old data in the db are discarded every couple of minutes. I'm not sure how SQL Server would handle that if many new rows were being added constantly. The grouped and sorted data are then displayed.
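To make the flow above a bit more concrete, here is a rough sketch of the three steps (save, query a time window with in-memory grouping and sorting, discard old rows). It is written in Python only for brevity; the real thing would be C# on .NET. The names KeywordEvent, KeywordStore, top_keywords and purge are made up for illustration, and the plain list is just a stand-in for the db table, so this says nothing about how SQL Server itself would behave under load.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class KeywordEvent:
    """One incoming record: a keyword, when it arrived, and some meta information."""
    keyword: str
    timestamp: float  # seconds, e.g. Unix time
    meta: str = ""


class KeywordStore:
    """Hypothetical in-memory stand-in for the db table the web service writes to."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = []

    def add(self, event):
        # What the web service would do on each call: save the record as-is.
        self.events.append(event)

    def top_keywords(self, now, limit=10):
        # The periodic "select": take only events inside the time window,
        # then group by keyword and order by number of occurrences.
        cutoff = now - self.window_seconds
        counts = Counter(e.keyword for e in self.events if e.timestamp >= cutoff)
        return counts.most_common(limit)

    def purge(self, now):
        # The periodic cleanup: drop everything older than the window.
        # Returns how many events were discarded.
        cutoff = now - self.window_seconds
        before = len(self.events)
        self.events = [e for e in self.events if e.timestamp >= cutoff]
        return before - len(self.events)


store = KeywordStore(window_seconds=3600)  # one-hour window, as in the example above
store.add(KeywordEvent("lucene", 100.0))   # too old by the time we query
store.add(KeywordEvent("iis", 4000.0))
store.add(KeywordEvent("lucene", 4100.0))
store.add(KeywordEvent("lucene", 4200.0))

print(store.top_keywords(now=4300.0))  # [('lucene', 2), ('iis', 1)]
print(store.purge(now=4300.0))         # 1
```

In this sketch the query and the cleanup are separate calls on purpose, so the query can run every second while the purge runs only every couple of minutes, as described above.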
I'm sure you guys have more experience and better ideas for this kind of thing.
Regards,
Ondrej