Setup

I have a web service that takes its inputs through a REST interface. The REST call does not return any meaningful data; whatever is passed in is simply recorded in the database, and that is it. It is an analytics service which my company is using internally to do some special processing on web requests that are received on its web page. So it is very important that the response take as little time as possible to return.

I have pretty much optimized the code as much as possible to make the response fast. However, the database insert still holds the connection open longer than I want before a response is sent back to the web client.

The code looks basically like this. (In case it matters: it is ASP.NET MVC, using Entity Framework, running on IIS 7.)

public ActionResult Add(/*..bunch of parameters..*/) {

    using (var db = new Entities()) {
        var log = new Log {
            // populate Log from parameters
        };
        db.AddToLogs(log);
        db.SaveChanges();
    }

    return File(pixelImage, "image/gif");
}

Question

Is there a way to offload the database insert into another process, so that the response to the client is returned almost instantly?

I was thinking about wrapping everything in the using block in another thread to make the database insert asynchronous, but I didn't know whether that was the best way to free up the response back to the client.

What would you recommend if you were trying to accomplish this goal?

+1  A: 

I've been working on multi-tier solutions mostly for the last year or so that require this sort of functionality, and that's exactly how I've been doing it.

I have a singleton that takes care of running tasks in the background based on an ITask interface. Then I just register a new ITask with my singleton and pass control from my main thread back to the client.
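A minimal sketch of that pattern, assuming a hypothetical `ITask` interface and a background-worker singleton (the names `ITask.Execute` and `TaskRunner` are illustrative, not the answerer's actual code):

```csharp
using System.Collections.Generic;
using System.Threading;

public interface ITask
{
    void Execute();
}

public sealed class TaskRunner
{
    public static readonly TaskRunner Instance = new TaskRunner();

    private readonly Queue<ITask> queue = new Queue<ITask>();
    private readonly object sync = new object();

    private TaskRunner()
    {
        // Background thread so it doesn't keep the process alive.
        var worker = new Thread(ProcessQueue) { IsBackground = true };
        worker.Start();
    }

    // Called from the request thread; returns immediately.
    public void Register(ITask task)
    {
        lock (sync)
        {
            queue.Enqueue(task);
            Monitor.Pulse(sync);  // wake the worker
        }
    }

    private void ProcessQueue()
    {
        while (true)
        {
            ITask task;
            lock (sync)
            {
                while (queue.Count == 0)
                    Monitor.Wait(sync);
                task = queue.Dequeue();
            }
            task.Execute();  // e.g. the EF insert from the question
        }
    }
}
```

The request handler would register an `ITask` that performs the insert, then return the pixel immediately.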

md5sum
+1  A: 

Create a separate thread that monitors a global, in-memory queue. Have your request put its information on the queue and return; the thread then takes items off the queue and posts them to the DB.

Under heavy load, if the thread lags behind the requests, your queue will grow.

Also, if you lose the machine, you will lose any unprocessed queue entries.

Whether these limitations are acceptable to you is something you'd need to decide.

A more formal mechanism is using an actual middleware messaging system (JMS in Java land; I don't know the .NET equivalent, but there's certainly something).

Will Hartung
SQL Service Broker would be the equivalent in the OP's environment--and it provides scalability, queue reliability, guaranteed message ordering and delivery. OTOH I think it's probably overkill for usage analytics data.
Ben M
@Ben: SSB would indeed be needed if the data write is critical (e.g. a financial transaction), as in http://rusanu.com/2009/08/05/asynchronous-procedure-execution. If the purpose is just writing visit-tracking logs, SSB is overkill, as the cost of sending a message is actually greater than an insert.
Remus Rusanu
@Remus: depends on the insert, doesn't it? Either way, you're right: it's not the way to go here.
Ben M
A: 

Before I spent a lot of time on the optimization, I'd make sure of where the time is going. Connections like these carry significant latency overhead (check this out). Just for grins, make your service a NOP and see how it performs.

It seems to me that the 'async-ness' needs to be on the client: it should fire off the call to your service and move on, especially since it doesn't care about the result.

I also suspect that if the NOP performance is good-to-tolerable on your LAN it will be a different story in the wild.

n8wrl
Even if the call is made async on the client, which is not a bad idea, the server should return as quickly as possible--since unless the client explicitly closes the connection after communicating with the server, the async process will hold the channel open until the response comes through. Still not good.
Ben M
@Ben: Very good point. My concern tho is applying complexity before we're sure where the problem really is. The past couple of years have been an eye-opening experience into distributed systems for me and 95% of the time I've found that even getting server processing down to 0 still results in unacceptable client performance.
n8wrl
The call is made in an async manner on the client. Basically the client loads JavaScript, and an img tag with its src set to the above method is added to the DOM, which makes the request and gives me all the data. So the client is not stalled; it is just waiting for the image to download, which ties up one of the very precious browser connections. Plus it ties up my server from serving more requests.
Nick Berardi
@n8wrl sorry if the point wasn't clear; this is being used on the public internet to gain specialized insight into certain plugins the clients have in their browser.
Nick Berardi
A: 

It depends: When you return to the client do you need to be 100% sure that the data is stored in the database?

Take this scenario:

  • Request comes in
  • A thread is started to save to the database
  • Response is sent to the client
  • Server crashes
  • Data was not saved to the database

You also need to check how many milliseconds you actually save by starting a new thread instead of saving to the database synchronously.

The added complexity and maintenance cost is probably too high compared with the savings in response time, and the savings are probably so small that they will not be noticed.
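One way to check that, as a rough sketch: time how long merely queueing the work takes versus the synchronous insert itself (the insert body here is a placeholder for the question's `db.SaveChanges()` path):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class TimingSketch
{
    static void Main()
    {
        // Cost of handing work off to a background thread.
        var sw = Stopwatch.StartNew();
        ThreadPool.QueueUserWorkItem(_ => { /* the insert would run here */ });
        sw.Stop();

        // Queueing alone is typically microseconds; compare this against
        // a Stopwatch wrapped around the real synchronous SaveChanges()
        // to see how many milliseconds the thread actually buys you.
        Console.WriteLine("Queueing took {0} ticks", sw.ElapsedTicks);
    }
}
```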

Shiraz Bhaiji
No, the data that is stored is passive data, in that if it is lost in transmission it is not really a big deal, because this is for statistical analysis. That being said, I also want to lose as little data as possible.
Nick Berardi
+2  A: 

If the request has to be reliable then you need to write it into the database. E.g. if your return means 'I have paid the merchant', then you can't return before you actually commit in the database. If the processing is long then there are database-based asynchronous patterns, using a table as a queue or using built-in queuing like Asynchronous procedure execution. But these apply when heavy and lengthy processing is needed, not for a simple log insert.

When you just want to insert a log record (visitor/url tracking stuff), the simplest solution is to use the CLR's thread pool and just queue the work, something like:

...
var log = new Log { /* populate Log from parameters */ };
ThreadPool.QueueUserWorkItem(stateInfo => {
  var queuedLog = stateInfo as Log;
  using (var db = new Entities())
  {
     db.AddToLogs(queuedLog);
     db.SaveChanges(); 
  }
}, log);
...

This is quick and easy, and it frees the ASP.NET handler thread to return the response as soon as possible. But it has some drawbacks:

  • If the incoming rate of requests exceeds the thread pool's processing rate, the in-memory queue will grow until it triggers an app pool 'recycle', thus losing all items 'in progress' (as well as warm caches and other goodies).
  • The order of requests is not preserved (may or may not be important)
  • It consumes a CLR pool thread that does nothing but wait for a response from the DB

The last concern can be addressed by using a true asynchronous database call, via SqlCommand.BeginExecuteXXX, with AsynchronousProcessing set to true on the connection. Unfortunately, AFAIK EF doesn't yet have true asynchronous execution, so you would have to resort to the SqlClient layer (SqlConnection, SqlCommand). But this solution does not address the first concern: when the rate of page hits is so high that this logging (= writes on every page hit) becomes a critical bottleneck.
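A sketch of that fire-and-forget asynchronous insert via SqlClient, bypassing EF (the connection string, table, and column names are illustrative):

```csharp
using System.Data.SqlClient;

public static class AsyncLogWriter
{
    // 'Asynchronous Processing=true' is required for BeginExecuteXXX.
    private const string ConnectionString =
        "Data Source=.;Initial Catalog=Analytics;Integrated Security=SSPI;" +
        "Asynchronous Processing=true";

    public static void WriteLog(string url, string userAgent)
    {
        var conn = new SqlConnection(ConnectionString);
        var cmd = new SqlCommand(
            "INSERT INTO Logs (Url, UserAgent) VALUES (@url, @ua)", conn);
        cmd.Parameters.AddWithValue("@url", url);
        cmd.Parameters.AddWithValue("@ua", userAgent);

        conn.Open();
        // The handler thread returns immediately; the callback runs on an
        // I/O completion thread once the insert finishes.
        cmd.BeginExecuteNonQuery(ar =>
        {
            try { cmd.EndExecuteNonQuery(ar); }
            finally { conn.Dispose(); }
        }, null);
    }
}
```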

If the first concern is real, then no threading and/or producer/consumer wizardry can alleviate it. If you truly have an incoming-rate vs. write-rate scalability concern (the 'pending' queue grows in memory), you have to either make the writes faster in the DB layer (faster IO, special log-flush IO) and/or aggregate the writes. Instead of logging every request, just increment in-memory counters and write them out periodically as aggregates.
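The aggregation idea could look something like this sketch (all names and the 10-second flush interval are illustrative; the actual batch write is left as a comment):

```csharp
using System.Collections.Generic;
using System.Threading;

public static class HitAggregator
{
    private static readonly Dictionary<string, int> counters =
        new Dictionary<string, int>();
    private static readonly object sync = new object();
    private static readonly Timer flushTimer =
        new Timer(_ => Flush(), null, 10000, 10000);  // flush every 10 s

    // Called per request: just an in-memory increment, no I/O.
    public static void Record(string key)
    {
        lock (sync)
        {
            int count;
            counters.TryGetValue(key, out count);
            counters[key] = count + 1;
        }
    }

    // Number of hits not yet flushed (handy for monitoring/tests).
    public static int PendingCount
    {
        get
        {
            lock (sync)
            {
                int total = 0;
                foreach (var v in counters.Values) total += v;
                return total;
            }
        }
    }

    private static void Flush()
    {
        Dictionary<string, int> snapshot;
        lock (sync)
        {
            snapshot = new Dictionary<string, int>(counters);
            counters.Clear();
        }
        // Write the aggregates in one batch (EF, SqlClient, bulk copy, ...).
        foreach (var pair in snapshot)
        {
            // db.AddToAggregates(new Aggregate { Key = pair.Key, Hits = pair.Value });
        }
    }
}
```

One insert per interval per key replaces one insert per hit, which is what moves the bottleneck out of the request path.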

Remus Rusanu
I understand that there is going to be a limitation point to what I am doing and there will definitely be a bottleneck. But I hope to push that bottleneck as far out into the future as I can possibly afford, because the next solution after this is going to be a cloud-based one, and I am hoping to delay that point long enough for Azure to become a possibility for production code.
Nick Berardi