I am faced with the choice of where to store some reference data (essentially drop-down values) for my application. This data will not change (or if it does, I am fine with needing to restart the application), and it will be accessed frequently as part of an AJAX autocomplete widget (so there may be several queries against this data by one user filling out one field).

Suppose each record looks something like this:

category
effective_date
expiration_date
field_A
field_B
field_C
field_D

The autocomplete query will need to match the input string against four fields in each record, plus exact parameters against the category and the effective/expiration dates, so if this were a SQL query, its WHERE clause would look something like:

... WHERE category = ?
AND effective_date < ?
AND expiration_date > ?
AND (field_A LIKE ? OR field_B LIKE ? OR field_C LIKE ? OR field_D LIKE ?)

I feel like this might be a rather inefficient query, but I suppose I don't know enough about how databases optimize their indexes, etc. I do know that a lot of really smart people work really hard to make database engines really fast at this exact type of thing.

The alternative I see is to store it in my application's memory. I could keep a list of these records for each category, and then iterate over every record in the category to see whether the filter criteria are met. This is definitely O(n), since I need to examine every record in the category.
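
Roughly, the in-memory version I have in mind looks like this (just a sketch; the ReferenceRecord type and member names are made up to mirror the record layout above):

using System;
using System.Collections.Generic;

// Hypothetical record type mirroring the layout above.
public class ReferenceRecord
{
    public string Category { get; set; }
    public DateTime EffectiveDate { get; set; }
    public DateTime ExpirationDate { get; set; }
    public string FieldA { get; set; }
    public string FieldB { get; set; }
    public string FieldC { get; set; }
    public string FieldD { get; set; }
}

public static class InMemoryFilter
{
    // O(n) scan over the records cached for one category.
    // Contains(...) stands in for SQL's LIKE '%input%'.
    public static IEnumerable<ReferenceRecord> Filter(
        IEnumerable<ReferenceRecord> categoryRecords, string input, DateTime asOf)
    {
        foreach (var r in categoryRecords)
        {
            bool inDateRange = r.EffectiveDate < asOf && r.ExpirationDate > asOf;
            bool matchesInput =
                r.FieldA.Contains(input) || r.FieldB.Contains(input) ||
                r.FieldC.Contains(input) || r.FieldD.Contains(input);
            if (inDateRange && matchesInput)
                yield return r;
        }
    }
}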

Has anyone faced a similar choice? Do you have any insight to offer?


EDIT: Thanks for the insight, folks. Sending the entire data set down to the client is not really an option, since the data set is so large (several MB).

A: 

I do not think there is a one-size-fits-all answer to your question. Depending on the data size and usage patterns, the answer will vary; more than that, it may change over time.

This is why, in my own development, I built an intermediate layer that lets me change how the caching is done through configuration alone (no code changes). Every so often we analyze various stats (cache hit ratio, etc.) and decide whether to change the cache behavior.
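
As a sketch of the kind of layer I mean (all the names and the config key are illustrative, not from any particular library; ReferenceRecord is the hypothetical type sketched in the question):

using System;
using System.Collections.Generic;
using System.Configuration;

// Callers go through this interface and never know whether a lookup
// hits the database or an in-memory cache.
public interface IReferenceDataProvider
{
    IEnumerable<ReferenceRecord> Lookup(string category, string input, DateTime asOf);
}

public class DatabaseProvider : IReferenceDataProvider
{
    public IEnumerable<ReferenceRecord> Lookup(string category, string input, DateTime asOf)
    {
        // Execute the parameterized SQL from the question here.
        throw new NotImplementedException();
    }
}

public class InMemoryProvider : IReferenceDataProvider
{
    public IEnumerable<ReferenceRecord> Lookup(string category, string input, DateTime asOf)
    {
        // Scan a cached per-category list, as in the question's sketch.
        throw new NotImplementedException();
    }
}

public static class ReferenceDataConfig
{
    // Picked once at startup, so switching strategies is a config change
    // (the hypothetical "refDataProvider" key), not a code change.
    public static IReferenceDataProvider Create()
    {
        return ConfigurationManager.AppSettings["refDataProvider"] == "memory"
            ? (IReferenceDataProvider)new InMemoryProvider()
            : new DatabaseProvider();
    }
}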

BTW, there is also a third layer: you can push your static data to the browser and cache it there too.

mfeingold
I am definitely planning to have an intermediate layer so we can switch mechanisms down the road; I am really just trying to explore the problem at this point. There is really too much data to send it all down to the client and do the filtering there.
pkaeding
+1  A: 

Definitely cache it in memory if it's not changing during the lifetime of the application. You're right, you don't want to be going back to the database for each call, because it's completely unnecessary.

There can be debate about exactly how much to cache on the server (I tend to cache as little as possible until I really need to), but for information that will not change and will be accessed repeatedly, you should almost always cache it in the Application object.

Given the number of directions you're coming at this data from (filtering on 6 or more columns), I'm not sure how much more you'll be able to optimize the information in memory. The first thing I would try is to store it in a list in the Application object and query it with LINQ-to-objects. Or, if there is one field that is used significantly more than the others, try using a Dictionary keyed on that field instead of a list. If performance continues to be a problem, try storing it in a DataSet and setting indexes on it (but of course you lose some code simplicity and maintainability that way).
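
For example, something along these lines (only a sketch, reusing the hypothetical ReferenceRecord type from the question; the Application key and method names are made up):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

public static class ReferenceDataCache
{
    // Call once, e.g. from Application_Start, after reading the full
    // table from the database. Keying a Dictionary by category narrows
    // each lookup to one category's records before the linear scan.
    public static void Prime(IEnumerable<ReferenceRecord> allRecords, HttpApplicationState app)
    {
        var byCategory = allRecords
            .GroupBy(r => r.Category)
            .ToDictionary(g => g.Key, g => g.ToList());
        app["ReferenceData"] = byCategory;
    }

    // Per autocomplete request: LINQ-to-objects over the cached list.
    public static List<ReferenceRecord> Match(HttpApplicationState app, string category, string input)
    {
        var cache = (Dictionary<string, List<ReferenceRecord>>)app["ReferenceData"];

        List<ReferenceRecord> records;
        if (!cache.TryGetValue(category, out records))
            return new List<ReferenceRecord>();

        DateTime now = DateTime.Now;
        return records
            .Where(r => r.EffectiveDate < now && r.ExpirationDate > now)
            .Where(r => r.FieldA.Contains(input) || r.FieldB.Contains(input) ||
                        r.FieldC.Contains(input) || r.FieldD.Contains(input))
            .Take(10)   // an autocomplete only needs a handful of suggestions
            .ToList();
    }
}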

Mike Mooney
A: 

Can you just hard-wire it into the program (as long as you stick to DRY)? Changing it only requires a rebuild.
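
A sketch of what I mean, reusing the hypothetical ReferenceRecord type from the question (the entries are placeholders, not real data):

using System;
using System.Collections.Generic;

// Hard-wired reference data: a read-only, in-code list per category.
// Changing it means editing this one place (DRY) and rebuilding.
public static class HardWiredReferenceData
{
    public static readonly IList<ReferenceRecord> SomeCategory =
        new List<ReferenceRecord>
        {
            new ReferenceRecord
            {
                Category = "some-category",
                EffectiveDate = DateTime.MinValue,
                ExpirationDate = DateTime.MaxValue,
                FieldA = "placeholder A", FieldB = "placeholder B",
                FieldC = "placeholder C", FieldD = "placeholder D"
            },
            // ... more entries ...
        }.AsReadOnly();
}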

Mike Dunlavey
The only problem with that is that the data needs to be changeable from outside the system (when it does change). So the canonical data needs to live in the database, and the application can read it all in at boot, or perhaps on a nightly timer, or something.
pkaeding
@pkaeding: That's a good reason.
Mike Dunlavey