I'm working on a personal project (a search engine) and have a bit of a dilemma. At the moment it is optimized for writing data to the search index, but significantly slow for search queries.

The DTA (Database Engine Tuning Advisor) recommends adding a couple of indexed views in order to speed up search queries, but this is to the detriment of writing new data to the DB.
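
For reference, an indexed view in SQL Server is a view materialized by a unique clustered index. A minimal sketch of the kind of thing DTA proposes might look like this (the table and column names here are invented for illustration):

    -- Hypothetical schema: dbo.DocumentTerms(DocId, Term) holds one row per
    -- term occurrence. The view pre-aggregates term counts for fast lookups.
    CREATE VIEW dbo.vw_TermCounts
    WITH SCHEMABINDING
    AS
    SELECT Term, COUNT_BIG(*) AS TermCount
    FROM dbo.DocumentTerms
    GROUP BY Term;
    GO

    -- Materializing the view: this index makes reads fast, but SQL Server
    -- must now maintain it on every INSERT/UPDATE/DELETE against
    -- DocumentTerms, which is exactly the write penalty in question.
    CREATE UNIQUE CLUSTERED INDEX IX_vw_TermCounts
        ON dbo.vw_TermCounts (Term);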

It seems I can't have one without the other!

This is obviously not a new problem. What is a good strategy for this issue?

A: 

What's your scenario - OLAP or OLTP ("Search engine" sounds like querying is more frequent than writing new data...)?

I had a similar situation where I added a load of indexes on the basis of DTA recommendations, only to find that my ETL processes ground to a standstill due to slowdowns on writes. There wasn't any rule of thumb I could follow other than trying different things out and finding the balance that best fit my situation.

davek
A: 

Always optimise for querying.

Even "write intensive" is no more then 15% write (I read somewhere). For example:

  • UPDATE..WHERE is a select/search because of the WHERE
  • INSERT with a unique constraint needs a check of the constraint for duplicates
  • DELETE requires a check of all child tables' FK columns

My back-of-the-envelope estimate for our OLTP systems is a minimum of 95% reads (on a system doing 5 million+ inserts per day), and over 98% for others.
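
One way to sanity-check that ratio on your own system is SQL Server's index-usage DMV, which gives a rough per-index read/write breakdown (the counters reset when the instance restarts):

    -- Rough read vs. write counts per index in the current database.
    -- user_seeks/scans/lookups are reads; user_updates counts
    -- write-side index maintenance.
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           i.name                   AS index_name,
           s.user_seeks + s.user_scans + s.user_lookups AS reads,
           s.user_updates                               AS writes
    FROM sys.dm_db_index_usage_stats AS s
    JOIN sys.indexes AS i
      ON i.object_id = s.object_id
     AND i.index_id  = s.index_id
    WHERE s.database_id = DB_ID()
    ORDER BY writes DESC;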

gbn
The database is already optimized for the writing operations (including the WHERE clauses, constraints, etc. used by the writing program)
Harry
A constraint will slow things down. I'm trying to say that unless you go utterly mad and have a very non-normal system, add the views...
gbn
A: 

A common strategy, not applicable/practical in all cases, is the "input gateway" approach.

With this approach, all the [real-time] inserts are done into one table (or a few tables), and the [search] application is served from another set of tables (with many indexes and other search-oriented optimizations). At fixed (or variable, load-based) intervals, rows from the input table(s) are transferred to the application tables and deleted from the input table(s), so as to keep this input gateway lean and mean and without much need for indexes.
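
A minimal sketch of this, with all table and column names invented for illustration: writes land in a lean heap, searches hit an indexed table, and a periodic job moves rows across atomically.

    -- Input gateway: a lean heap with no indexes, so inserts are cheap.
    CREATE TABLE dbo.DocumentTermsInbox (
        DocId INT           NOT NULL,
        Term  NVARCHAR(200) NOT NULL
    );

    -- Application/search table: heavily indexed for query speed.
    CREATE TABLE dbo.DocumentTermsSearch (
        DocId INT           NOT NULL,
        Term  NVARCHAR(200) NOT NULL
    );
    CREATE CLUSTERED INDEX IX_Search_Term
        ON dbo.DocumentTermsSearch (Term, DocId);

    -- Periodic transfer: DELETE ... OUTPUT moves and removes rows in
    -- one atomic statement, so rows inserted mid-transfer are not lost.
    DELETE FROM dbo.DocumentTermsInbox
    OUTPUT deleted.DocId, deleted.Term
    INTO dbo.DocumentTermsSearch (DocId, Term);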

The main drawback of this approach is of course that the application data lags behind in terms of real-time updates. This can be addressed in several ways, typically by either increasing the frequency of transfers, or by having the application run two searches / UNION-type searches (the search in the import "heaps" is typically fast enough, even with no or few indexes, owing to their smaller size).
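
The two-search / UNION variant, against the same illustrative tables as above, could look like:

    DECLARE @term NVARCHAR(200) = N'example';

    -- Search both the indexed table and the small inbox heap;
    -- UNION also de-duplicates any row caught mid-transfer.
    SELECT DocId FROM dbo.DocumentTermsSearch WHERE Term = @term
    UNION
    SELECT DocId FROM dbo.DocumentTermsInbox  WHERE Term = @term;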

mjv
Interesting idea. This was something that I expected to be suggested (or something like it)
Harry
Data lag is perfectly fine in this project
Harry