I'm not sure I 100% understand what the database does. If I just have some misconception, please point it out.

Let's say I have a function that wants to create 100 new entries in a database which already has 100,000 entries.

It seems a lot faster when those 100 entries get created and the commit is made only after the last entry is created.
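For example, here is roughly what I mean (just a sketch; the `Entry` model and `myapp` are made-up names, not my real code):

```python
# Sketch only: assumes a Django model Entry with a single CharField "name".
from django.db import transaction

from myapp.models import Entry  # hypothetical app/model


def create_entries_commit_each(names):
    # Under Django's default autocommit, each create() issues its own
    # INSERT and commits immediately, so 100 entries mean 100 commits.
    for name in names:
        Entry.objects.create(name=name)


def create_entries_commit_once(names):
    # transaction.atomic() wraps all the inserts in one transaction,
    # so the database commits only once, after the last entry.
    with transaction.atomic():
        for name in names:
            Entry.objects.create(name=name)
```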

Now, if those 100 entries get created by different users, is there an easy way to commit only after 100 entries are created?

Edit: Should I maybe write some sort of buffer?

+2  A: 

You might be able to hack something like that together, but you really shouldn't, because it wrecks your data integrity, which is the whole point of using transactions.

In your proposed solution, a problem with any insert in the batch would cause all the other (possibly totally valid) inserts from completely different users to fail. Also, users wouldn't be able to see the data they just tried to insert because the system was waiting to do the insert until the batch was full.
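To make that concrete, here is a rough sketch (the `Entry` model and its unique constraint are assumptions, not anything from your code) of what happens when inserts buffered from many users are flushed in one transaction:

```python
# Sketch of the failure mode, assuming Entry.name is declared unique=True.
from django.db import IntegrityError, transaction

from myapp.models import Entry  # hypothetical app/model


def flush_buffered_inserts(buffered_names):
    # If names from many users are buffered and flushed together,
    # a single duplicate raises IntegrityError and the whole transaction
    # rolls back: the 99 valid rows vanish along with the bad one.
    try:
        with transaction.atomic():
            for name in buffered_names:
                Entry.objects.create(name=name)
    except IntegrityError:
        # Nothing from this batch was written; every user's insert is lost.
        pass
```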

P.S. Here's a quick intro to transaction processing.

Hank Gay
+1  A: 

I think you do have a misconception. It sounds like you're looking at the database as something that is only for some sort of "long-term" memory. This is a bad concept; the database is the only memory your application has. Even when this isn't true, it's best to pretend that it is.

To go a little deeper, your application has:

  • scoped memory: variables that you define within view functions, for example. These all get destroyed when flow leaves the function.
  • globals: variables that are defined in the outermost part of your code. It is really important not to use these for any sort of state except perhaps configuration constants. In particular, you should not rely on them for any dynamic behavior; otherwise you will have to battle concurrency and forked processes (depending on the server gateway) that aren't aware of each other. Just don't do it.
  • a caching scheme, if you choose to implement one. This is entirely optional in Django, and there are many ways to do it. However, one typically uses some scheme to ensure that even if the cache crashes, the database still reflects the current state of the data accurately (see the sketch after this list).
  • your local filesystem. From a design point of view, most ways of taking advantage of this will either resemble a caching system (above) or be clumsy and fragile. From a performance point of view, it might be about as slow as a database.
  • your database.

So you see that there's not much place for you to put your data besides the database.
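As a sketch of the caching point above (Django's low-level cache API with a made-up `Entry` model; the key format is just an illustration), the usual pattern is cache-aside, where the database stays authoritative:

```python
# Cache-aside sketch: the model name and key format are assumptions.
from django.core.cache import cache

from myapp.models import Entry  # hypothetical app/model


def get_entry(entry_id):
    key = f"entry:{entry_id}"
    entry = cache.get(key)
    if entry is None:
        # Cache miss: the database is the source of truth, so read it
        # there and repopulate the cache.
        entry = Entry.objects.get(pk=entry_id)
        cache.set(key, entry, timeout=60)
    return entry


def update_entry(entry_id, name):
    # Write to the database first, then invalidate the cache, so a
    # crashed or stale cache never hides the real state of the data.
    Entry.objects.filter(pk=entry_id).update(name=name)
    cache.delete(f"entry:{entry_id}")
```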

David Berger
+1  A: 

Databases are optimized for set-based operations, so yes, it would be faster to insert 100 records as a set than one at a time. However, when you are talking about users entering records one at a time, you would not want to group them together under any circumstances that I can think of. Why?

First, if there was one bad record, the others would fail. This would make for 99 cranky users out of 100 (actually 100, but one would not really have reason to be cranky because he did the bad data entry to begin with).

Second, users would not see the records immediately after entering them. They also would not be able to do anything further with those records, such as entering data into related tables, until they are committed. Having a delay like this would make users cranky. If users are entering data from customers over the phone, they will be especially cranky at the wait (I worked at a call center with a horribly slow commercial product, and believe me, I know how upset the users used to get!).

Third, users will have gone on to something else and would not realize that their data was rejected for bad information, which is not a good thing at all. And how long are you going to wait to get your set number of records? Five seconds? Ten minutes? What happens if the network connection is lost during that time? Wouldn't the users lose the data they entered?
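To illustrate the set-based point, here is a hedged sketch (assuming Django and the same made-up `Entry` model as above): a single bulk insert is fast, but it is all-or-nothing, which is exactly the problem for 100 independent users.

```python
# Sketch comparing row-at-a-time inserts with a single set-based insert,
# assuming the hypothetical Entry model from earlier.
from myapp.models import Entry  # hypothetical app/model


def insert_one_at_a_time(names):
    # 100 round trips and 100 commits.
    for name in names:
        Entry.objects.create(name=name)


def insert_as_a_set(names):
    # One INSERT carrying 100 rows; typically far faster, but all-or-nothing,
    # which suits a single import job, not 100 independent users.
    Entry.objects.bulk_create([Entry(name=name) for name in names])
```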

HLGEM