What is a good design for caching the results of an expensive search in an ASP.NET system?

Any ideas would be welcomed ... particularly those that don't require inventing a complex infrastructure of our own.

Here are some general requirements related to the problem:

  • Each search can produce from zero to several hundred result records
  • Each search is relatively expensive and time-consuming to execute (5-15 seconds at the database)
  • Results must be paginated before being displayed at the client to avoid information overload for the user
  • Users expect to be able to sort, filter, and search within the results returned
  • Users expect to be able to quickly switch between pages in the search results
  • Users expect to be able to select multiple items (via checkbox) on any number of pages
  • Users expect relatively snappy performance once a search has finished

I see some possible options for where and how to implement caching:

1. Cache on the server (in session or App cache), use postbacks or Ajax panels to facilitate efficient pagination, sorting, filtering, and searching (a rough sketch of this approach appears after the list).

  • PROS: Easy to implement, decent support from ASP.NET infrastructure
  • CONS: Very chatty; memory-intensive on the server; data may be cached longer than necessary; prohibits load-balancing practices

2. Cache at the server (as above) but using serializable structures that are moved out of memory after some period of time to reduce memory pressure on the server

  • PROS: Efficient use of server memory; ability to scale out using load balancing
  • CONS: Limited support from .NET infrastructure; potentially fragile when data structures change; places additional load on the database; significantly more complicated

3. Cache on the client (using JSON or XML serialization), use client-side JavaScript to paginate, sort, filter, and select results.

  • PROS: User experience can approach "rich client" levels; most browsers can handle JSON/XML natively - decent libraries exist for manipulation (e.g. jQuery)
  • CONS: Initial request may take a long time to download; significant memory footprint on client machines; will require hand-crafted JavaScript at some level to implement

4. Cache on the client using a compressed/encoded representation of the data - call back into the server to decode when switching pages, sorting, filtering, and searching.

  • PROS: Minimized memory impact on server; allows state to live as long as client needs it; slightly improved memory usage on client over JSON/XML
  • CONS: Large data sets moving back and forth between client/server; slower performance (due to network I/O) as compared with pure client-side caching using JSON/XML; much more complicated to implement - limited support from .NET/browser

5. Some alternative caching scheme I haven't considered...
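A rough sketch of option 1, assuming a hypothetical SearchResult DTO and the standard in-process session; a postback or Ajax callback would simply rebind the grid to GetPage(n):

    using System.Collections.Generic;
    using System.Linq;
    using System.Web;

    // Option 1 sketch: run the expensive query once, keep the full result list
    // in session, and satisfy paging/sorting/filtering postbacks from that copy.
    public static class SessionSearchCache
    {
        private const string ResultsKey = "SearchResults"; // hypothetical session key
        private const int PageSize = 25;

        public static void StoreResults(List<SearchResult> results)
        {
            HttpContext.Current.Session[ResultsKey] = results;
        }

        public static List<SearchResult> GetPage(int pageIndex)
        {
            var results = HttpContext.Current.Session[ResultsKey] as List<SearchResult>;
            if (results == null)
                return new List<SearchResult>(); // session expired - caller should re-run the search

            return results
                .Skip(pageIndex * PageSize) // paging is done against the cached
                .Take(PageSize)             // list, never against the database
                .ToList();
        }
    }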

+11  A: 

For #1, have you considered using a state server (even SQL Server) or a shared cache mechanism? There are plenty of good ones to choose from, and Velocity is getting very mature - it will probably RTM soon. A cache invalidation scheme based on the user starting a new search, hitting any page other than search pagination, and finally a standard timeout (20 minutes) should be pretty successful at weeding your cache down to a minimal size.
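For illustration, that invalidation policy might look roughly like this against the built-in ASP.NET cache (a distributed cache such as Velocity would expose a similar add/get/remove surface); the key format and SearchResult type are placeholders:

    using System;
    using System.Collections.Generic;
    using System.Web;
    using System.Web.Caching;

    public static class SearchResultCache
    {
        // Key per user and per query so cached results never bleed between sessions.
        private static string KeyFor(string userId, string query)
        {
            return "search:" + userId + ":" + query.ToLowerInvariant();
        }

        public static void Store(string userId, string query, List<SearchResult> results)
        {
            // A new search overwrites the old entry; the sliding window below
            // covers the "standard timeout" case from the scheme above.
            HttpRuntime.Cache.Insert(
                KeyFor(userId, query),
                results,
                null,                       // no cache dependency
                Cache.NoAbsoluteExpiration,
                TimeSpan.FromMinutes(20));  // standard 20-minute timeout
        }

        public static List<SearchResult> TryGet(string userId, string query)
        {
            return HttpRuntime.Cache[KeyFor(userId, query)] as List<SearchResult>;
        }

        public static void Invalidate(string userId, string query)
        {
            // Call this when the user starts a new search or leaves the results pages.
            HttpRuntime.Cache.Remove(KeyFor(userId, query));
        }
    }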

Rex M
SharedCache has worked well for us, and performance-wise it has been beating Velocity. You may want to add Memcached to your list as well, but you will have to write your own library to access it from .NET.
BozoJoe
+1  A: 

Since you say any ideas are welcome:

We have been using the Enterprise Library Caching Application Block fairly successfully for caching result sets from LINQ queries.

http://msdn.microsoft.com/en-us/library/cc467894.aspx

It supports custom cache expiration, so it should cover most of your needs there (with a little bit of custom code). It also has quite a few backing stores, including encrypted backing stores if the privacy of searches is important.

It's pretty fully featured.
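A minimal sketch of the Caching Application Block in use, assuming Enterprise Library 4.x is referenced and a cache manager is configured in web.config (the SearchResult type is a placeholder):

    using System;
    using System.Collections.Generic;
    using Microsoft.Practices.EnterpriseLibrary.Caching;
    using Microsoft.Practices.EnterpriseLibrary.Caching.Expirations;

    public static class LinqResultCache
    {
        public static void Store(string query, List<SearchResult> results)
        {
            var cache = CacheFactory.GetCacheManager(); // the default configured manager

            // Sliding expiration keeps results alive only while the user is active.
            cache.Add(
                query,
                results,
                CacheItemPriority.Normal,
                null,                                       // no refresh action
                new SlidingTime(TimeSpan.FromMinutes(20)));
        }

        public static List<SearchResult> TryGet(string query)
        {
            var cache = CacheFactory.GetCacheManager();
            return cache.GetData(query) as List<SearchResult>;
        }
    }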

My recommendation is a combination of #1 and #3:

  1. Cache the query results on the server.
  2. Make the results available as both a full page and as a JSON view.
  3. Cache each page retrieved dynamically at the client, but send a request each time the page changes.
  4. Use ETags to do client cache invalidation (a rough sketch follows).
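A rough sketch of step 4, assuming an HTTP handler serves the JSON page views; CachedResults.GetPageAsJson is a placeholder for whatever pulls the page out of the server-side cache:

    using System;
    using System.Web;

    public class ResultsPageHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            int page = int.Parse(context.Request.QueryString["page"] ?? "0");
            string json = CachedResults.GetPageAsJson(context, page);      // hypothetical cache lookup
            string etag = "\"" + json.GetHashCode().ToString("x") + "\"";  // cheap stand-in for a real hash

            // If the client already holds this exact page, answer 304 with no body.
            if (context.Request.Headers["If-None-Match"] == etag)
            {
                context.Response.StatusCode = 304;
                return;
            }

            context.Response.ContentType = "application/json";
            context.Response.AppendHeader("ETag", etag);
            context.Response.Write(json);
        }
    }
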
John Gietzen
A: 

Have a look at SharedCache - it makes options 1 and 2 pretty easy and works fine in a load-balanced system. Free, open source, and we've been using it for about a year with no issues.

nitzmahone
+3  A: 

Raising an idea under the "alternative" caching scheme: this doesn't answer your question with a given cache architecture, but rather goes back to the original requirements of your search application.

Even if/when you implement your own cache, its effectiveness can be less than optimal, especially as your search index grows in size. Cache hit rates will decrease as your index grows. At a certain inflection point, your search may actually slow down due to resources dedicated to both searching and caching.

Most search sub-systems implement their own internal caching architecture as a means of efficiency in operation. Solr, an open-source search system built on Lucene, maintains its own internal cache to provide for speedy operation. There are other search systems that would work for you, and they take similar approaches to caching results.

I would recommend you consider a separate search architecture if your search index warrants it, as caching on a free-text keyword search basis is a complex operation to implement effectively.
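For a sense of what offloading this buys you: paging, sorting, and filtering become query-string parameters, and the search system's own caches absorb the repeat work. A sketch against a hypothetical local Solr instance (the URL and port are assumptions):

    using System.Net;
    using System.Web;

    public static class SolrSearch
    {
        // Each page is a cheap round trip; Solr caches results and filters internally.
        public static string GetPageJson(string query, int pageIndex, int pageSize)
        {
            string url = string.Format(
                "http://localhost:8983/solr/select?q={0}&start={1}&rows={2}&wt=json",
                HttpUtility.UrlEncode(query),
                pageIndex * pageSize,
                pageSize);

            using (var client = new WebClient())
            {
                return client.DownloadString(url); // JSON page straight from Solr
            }
        }
    }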

jro
+4  A: 

If you are able to wait until March 2010, .NET 4.0 comes with new caching support in the System.Runtime.Caching namespace (an extensible ObjectCache provider model), which promises lots of implementations (disk, memory, SQL Server/Velocity as mentioned).

There's a good slideshow of the technology here. However, it is a little bit of "roll your own", or a lot of it in fact. But there will probably be a lot of closed and open source providers written for the provider model when the framework is released.
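For illustration, a minimal sketch of what that looks like with the new API, assuming the .NET 4.0 System.Runtime.Caching assembly (the key format and SearchResult type are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Runtime.Caching;

    public static class Net4SearchCache
    {
        // MemoryCache.Default is the built-in in-process implementation; the same
        // ObjectCache surface is what disk/SQL/Velocity providers would plug into.
        private static readonly ObjectCache Cache = MemoryCache.Default;

        public static void Store(string query, List<SearchResult> results)
        {
            var policy = new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(20) };
            Cache.Set("search:" + query, results, policy);
        }

        public static List<SearchResult> TryGet(string query)
        {
            return Cache.Get("search:" + query) as List<SearchResult>;
        }
    }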

For the requirements you state, a few questions crop up:

  • What is contained in the search results? Just string data or masses of metadata associated with each result?
  • How big is the set you're searching?

How much memory would you use storing the entire set in RAM? Or at least keeping a cache of the 10 to 100 most popular search terms? Being smart and caching related searches after the first search might be another idea.

5-15 seconds is a long time to wait for a search result, so I'm assuming it's something akin to an Expedia.com search, where multiple sources are being queried and a lot of information is returned.

From my limited experience, the biggest problem with the client-side-only caching approach is Internet Explorer 6 or 7. A server-only approach with plain HTML is my preference, with the entire result set in the cache for paging, expiring it after some sensible time period. But you might've tried this already and seen the server's memory getting eaten.

Chris S
A: 

While pondering your options, consider that no user wants to page through data. We force that on them as an artifact of trying to build applications on top of browsers and HTML, which inherently do not scale well. We have invented all sorts of hackery to fake application state on top of this, but it is essentially a broken model.

So, please consider implementing this as an actual rich client in Silverlight or Flash. You will not beat that user experience, and it is simple to cache data much larger than is practical in a regular web page. Depending on the expected user behavior, your overall bandwidth could be optimized because the round trips to the server would fetch only a tight data set instead of a full ASP.NET page's worth of overhead.

Jerry Bullard