views:

95

answers:

1

Maybe I am mistating the problems and conflating the answer with the questions, but please here me out. I would like to think (communally, with you) about a site that is based on any any of the MVC frameworks(something PHP or ASP.NET MVC, whtever) that would use a search engine (lucene/solr, FAST ESP, whatever) as the back end of the Model. That is to say, there is no database per se in the project. Just a giant index of docuements that are semistructured content.

I am looking to understand - and keep in mind the site is primarily read-only - where I am likely to run into trouble. What are the things that make you think this is a bad idea from the get go. Also, please assume that there will be a robust infrastructure with caching surrounding the search engine - so while perf comments are welcomed, we feel they are not the major problem.

Thanks!

A: 

In general, I'd use a tool like Lucene for searching content, and a database for retrieving it. That doesn't mean that it won't work. It's more a question of why you don't want to use a database. Yes, it can work, and it probably will work (depending on the functional requirements of the site, read on), but that still doesn't make a tool like Lucene the right tool for the job per se.

That being said, it also it does depend on the kind of site however. Is it really a site with just a whole bunch of searchable data and nothing else, or is it something much more than that? If the answer is the first, then good! If it is the latter, there are some issues I can think of:

  • Updates to the data can be troublesome. "Instant updates" are usually a no-go, as Lucene would have to rebuild its index, which is time-consuming. If there aren't many updates to the data that's fine. You can just recreate the index a couple of times per day, or nightly, if that works.
  • Trying to stuff any data in an index which is not really suited to be indexed is usually not a good idea. If the site lets users register on your site, then that user data should really go in a database. It's not impossible to store it in a lucene index, it's just not the right tool for the job. Use the index as a bunch of indexed documents, but don't use it as a database as well.
Razzie
Thanks Razzie - the site is indeed all about retrieving data - like from a catalog that rarely changes. However the structure of each data item could be different - not like there are just a handful of templates - more like shards of arbitrary "intelligently marked up"XHTML or XML - in fact I was looking at something like CouchDB as well. Thanks
Nick