views: 94
answers: 3

To begin with, I LOVE LINQ TO SQL. It's so much easier to use than direct querying.

But there's one big problem: it doesn't hold up well under heavy load. I have some actions in my ASP.NET MVC project that are called hundreds of times every minute.

I used to have LINQ to SQL there, but since the volume of requests is gigantic, LINQ to SQL almost always returned "Row not found or changed" or "X of X updates failed". And it's understandable. For instance, I have to increment some value by one on every request.

var stat = DB.Stats.First();
stat.Visits++;
// ....
DB.SubmitChanges();

But while ASP.NET was working through those // ... instructions, the stat.Visits value stored in the table got changed by another request.

I found a solution: I created a stored procedure

UPDATE Stats SET Visits=Visits+1

It works well.
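(For a one-off case like this you don't even need a full stored procedure; LINQ to SQL's DataContext.ExecuteCommand can issue the same atomic UPDATE directly. A sketch, assuming the context is the same DB object as above:)

```csharp
// Same atomic increment, issued straight from the DataContext:
// no read-modify-write round trip, so no concurrency conflict.
DB.ExecuteCommand("UPDATE Stats SET Visits = Visits + 1");
```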

Unfortunately, I'm now running into more and more situations like this, and it sucks to write a stored procedure for every case.

So my question is: how do I solve this problem? Are there any alternatives that would work here?

I hear that Stack Overflow runs on LINQ to SQL, and it's under far more load than my site.

+5  A: 

This isn't exactly a problem with Linq to SQL, per se, it's an expected result with optimistic concurrency, which Linq to SQL uses by default.

Optimistic concurrency means that when you update a record, the current version in the database is checked against the copy that was originally retrieved before the offline updates were made; if they don't match, a concurrency violation is reported ("row not found or changed").

There's a more detailed explanation of this here. There's also a fairly sizable guide on handling concurrency errors. Typically the solution involves simply catching ChangeConflictException and picking a resolution, such as:

try
{
    // Make changes
    db.SubmitChanges();
}
catch (ChangeConflictException)
{
    foreach (var conflict in db.ChangeConflicts)
    {
        conflict.Resolve(RefreshMode.KeepCurrentValues);
    }
}

The above version will overwrite whatever is in the database with the current values, regardless of what other changes were made. For other possibilities, see the RefreshMode enumeration.

Your other option is to disable optimistic concurrency entirely for fields that you expect might be updated. You do this by setting the UpdateCheck option to UpdateCheck.Never. This has to be done at the field level; you can't do it at the entity level or globally at the context level.
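A minimal sketch of what that looks like in a hand-written mapping (the designer exposes the same Update Check property per column; the Stat entity here is illustrative):

```csharp
using System.Data.Linq.Mapping;

[Table(Name = "Stats")]
public class Stat
{
    [Column(IsPrimaryKey = true)]
    public int Id { get; set; }

    // No more "Row not found or changed" for this field, but
    // concurrent read-modify-write increments can still silently
    // overwrite each other.
    [Column(UpdateCheck = UpdateCheck.Never)]
    public int Visits { get; set; }
}
```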

Maybe I should also mention that you haven't picked a very good design for the specific problem you're trying to solve. Incrementing a "counter" by repeatedly updating a single column of a single row is not a very good/appropriate use of a relational database. What you should be doing is actually maintaining a history table - such as Visits - and if you really need to denormalize the count, implement that with a trigger in the database itself. Trying to implement a site counter at the application level without any data to back it up is just asking for trouble.

Use your application to put actual data in your database, and let the database handle aggregates - that's one of the things databases are good at.
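In application code that means each request only ever inserts. A sketch, where the Visit entity/table and its columns are hypothetical:

```csharp
// One INSERT per hit: there is no shared row for concurrent
// requests to fight over, so no conflicts can occur.
DB.Visits.InsertOnSubmit(new Visit
{
    Url = request.Url.ToString(),  // hypothetical columns
    VisitedOn = DateTime.UtcNow
});
DB.SubmitChanges();

// The counter is then just an aggregate query (or a
// trigger-maintained denormalized column):
int totalVisits = DB.Visits.Count();
```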

Aaronaught
But this solution does not work for this particular problem of counting visits: no matter what resolution strategy you adopt, you will end up losing (i.e. not counting) some visits.
Fyodor Soikin
@Fyodor: Quite the contrary, did you read the last two paragraphs? If you have the application append data to a transactional table then you can never lose any updates because you are never updating, only inserting.
Aaronaught
The last two paragraphs were added after I fetched the initial version of the answer, so I had no chance to read them. Now that I've read them, I must object anyway: this is a questionable solution. While it may be acceptable in some cases, the general strategy of keeping a full history of your operations is also asking for trouble. The time will inevitably come when you'll have to implement some sort of history archiving and roll up the aggregates. That means you have to think about it in advance, or else you're doomed, and it significantly increases complexity.
Fyodor Soikin
...and it's probably not worth it, given the simple task of counting visitors. (sorry, comment limit)
Fyodor Soikin
@Fyodor: This is nonsense. If the site gets 1 million unique visitors per day, it will be *years* before that even makes a dent in aggregate performance. Archiving is hardly a difficult problem to solve, "rolling up the aggregates" is a 1-line `COUNT(*)` query, and if necessary it can be denormalized with a 5-line trigger. This is Databases 101 here, and it's how hundreds if not thousands of site analytics tools *actually work*.
Aaronaught
+2  A: 

Use a producer/consumer or message-queue model for updates that don't absolutely have to happen immediately, particularly stats updates. Instead of trying to update the database immediately, keep a queue of updates that the ASP.NET threads can push to, and have a writer process/thread that flushes the queue to the database. Since only one thread is writing, there will be much less contention on the relevant tables/rows.
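A minimal sketch of the idea - a plain Queue plus a single background writer thread; a real implementation would add batch-size limits, error handling, and shutdown logic:

```csharp
using System.Collections.Generic;
using System.Threading;

public static class VisitQueue
{
    private static readonly Queue<string> pending = new Queue<string>();
    private static readonly object sync = new object();

    static VisitQueue()
    {
        new Thread(WriteLoop) { IsBackground = true }.Start();
    }

    // Called from the ASP.NET actions: cheap, no database contention.
    public static void Push(string url)
    {
        lock (sync) pending.Enqueue(url);
    }

    private static void WriteLoop()
    {
        while (true)
        {
            string[] batch;
            lock (sync)
            {
                batch = pending.ToArray();
                pending.Clear();
            }
            if (batch.Length > 0)
            {
                // Single writer: one DataContext, one SubmitChanges()
                // per batch (database code omitted).
            }
            Thread.Sleep(1000);
        }
    }
}
```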

For reads, use caching. For high volume sites even caching data for a few seconds can make a difference.
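For example, with the built-in ASP.NET cache (a sketch; the "stats" key and the 5-second window are arbitrary choices):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public Stat GetStats()
{
    var stats = (Stat)HttpRuntime.Cache["stats"];
    if (stats == null)
    {
        stats = DB.Stats.First();
        // Serve the cached copy for a few seconds instead of
        // hitting the database on every request.
        HttpRuntime.Cache.Insert("stats", stats, null,
            DateTime.UtcNow.AddSeconds(5), Cache.NoSlidingExpiration);
    }
    return stats;
}
```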

Sam
+1  A: 

Firstly, you could call DB.SubmitChanges() right after stat.Visits++, and that would greatly reduce the problem.

However, that still won't save you from concurrency violations (that is, two concurrent processes simultaneously modifying the same piece of data). To fight that, you can use the standard mechanism of transactions. With LINQ to SQL, you use transactions by instantiating a TransactionScope, thusly:

using( TransactionScope t = new TransactionScope() )
{
    var stats = DB.Stats.First();
    stats.Visits++;
    DB.SubmitChanges();
}

Update: as Aaronaught correctly pointed out, TransactionScope is not going to help here, actually. Sorry. But read on.

Be careful, though, not to make the body of a transaction too long, as it will block other concurrent processes, and thus, significantly reduce your overall performance.

And that brings me to the next point: your very design is probably flawed.

The core principle in dealing with highly shared data is to design your application in such way that the operations on that data are quick, simple, and semantically clear, and they must be performed one after another, not simultaneously.

The one operation that you're describing - counting visits - is clear and simple, so it should be no problem once you add the transaction. I must add, however, that while this approach is clear, type-safe and otherwise "good", the stored-procedure solution is actually much preferable. This is exactly the way database applications were designed in ye olden days. Think about it: why would you need to fetch the counter all the way from the database to your application (potentially over the network!) when there is no business logic involved in processing it? The database server can increment it just as well, without even sending anything back to the application.

Now, as for the other operations hidden behind the // ..., it seems (from your description) that they're somewhat heavy and long. I can't tell for sure, because I don't see what's there, but if that's the case, you probably want to split them into smaller, quicker ones, or otherwise rethink your design. I really can't say anything else with this little information.

Fyodor Soikin
Sorry, but using a `TransactionScope` here doesn't change a thing. You would need `SERIALIZABLE` isolation level or an `UPDLOCK` to actually ensure that one transaction's read happens after the other transaction's update; but even if you could do that using L2S (which you can't), it would completely kill performance.
Aaronaught
Ummm... Yes, you're right that TransactionScope is not going to do the trick. The second statement, though, is beside the point: I do discuss performance and design afterwards.
Fyodor Soikin
Right, or just use a log table, which you'll be glad to have when your customer/employer asks you to back up your number and/or perform some actual analysis on hits. Because it *will* happen, guaranteed.
Aaronaught
So what you're saying is that I need to move simple stuff like incrementing stats to a stored procedure? (Actually, I already did that)
Alex
Yes, that would be ideal. Type safety and all the bells and whistles are nice, sure, but you always have to keep in mind what you're trading off for the convenience.
Fyodor Soikin