I'm developing a web app that currently uses SQL Server 2008, but I'm considering moving to another database (Amazon SimpleDB) for better performance.

I have a background process that inserts up to 10,000 rows every hour into one specific table. That table is also read from to display data in the web application, and while the background process is running the web app is unusable because the database connection times out.

As a result, I'm thinking of moving to Amazon SimpleDB to improve performance. Is SimpleDB suited to this use case? If not, is there another solution I could use?

A: 

What kind of indexes do you have? Indexes generally make reads faster, but writes much slower.

Joel Coehoorn
I am indexing on the date, primaryid, and another int column; these are the columns used in the WHERE clause of the SELECT statements.
rksprst
+2  A: 

Under 3 inserts per second (10,000 rows per hour works out to roughly 2.8 per second) is not going to give any DBMS a workout unless the amount of data in each insert operation is phenomenal. Likewise, 10 reads per second is unlikely to over-stress any competent DBMS unless there is some complicating factor you haven't mentioned, such as the reads being aggregates of aggregates over the entire database, which would only accumulate its first billion records after about 100,000 hours (roughly 4,000 days, or roughly 10 years).

Jonathan Leffler
+17  A: 

Your problem is the isolation level you are using. Unless you change it, SQL Server (and many other databases) operates in a mode where SELECTs block when they try to read rows modified by uncommitted transactions. You want to change SQL Server to use MVCC instead (the default for Oracle; MySQL and SQL Server support it too) and your problem will go away.

From SET TRANSACTION ISOLATION LEVEL (Transact-SQL):

READ COMMITTED

Specifies that statements cannot read data that has been modified but not committed by other transactions. This prevents dirty reads. Data can be changed by other transactions between individual statements within the current transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.

The behavior of READ COMMITTED depends on the setting of the READ_COMMITTED_SNAPSHOT database option:

  • If READ_COMMITTED_SNAPSHOT is set to OFF (the default), the Database Engine uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. The shared lock type determines when it will be released. Row locks are released before the next row is processed. Page locks are released when the next page is read, and table locks are released when the statement finishes.
  • If READ_COMMITTED_SNAPSHOT is set to ON, the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.

When the READ_COMMITTED_SNAPSHOT database option is ON, you can use the READCOMMITTEDLOCK table hint to request shared locking instead of row versioning for individual statements in transactions running at the READ COMMITTED isolation level.


Change your database configuration to set READ_COMMITTED_SNAPSHOT to ON.
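
A minimal sketch of that change, assuming a placeholder database name of WebAppDb (the setting needs exclusive access to the database, so the termination clause below rolls back open connections; run it in a maintenance window if you can):

    -- One-time change: enable row versioning for READ COMMITTED reads.
    -- WebAppDb is a placeholder; substitute your own database name.
    ALTER DATABASE WebAppDb
        SET READ_COMMITTED_SNAPSHOT ON
        WITH ROLLBACK IMMEDIATE;

    -- Verify that the option is now on.
    SELECT name, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = 'WebAppDb';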

Also, keep your transactions as short-lived as possible, and make sure the background process doing the 10,000 inserts an hour actually commits its transaction; if it never commits, SELECTs will block forever under the default settings.
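
As a rough illustration of the commit-per-batch idea (the table and staging names below are hypothetical, not from the question), each batch gets its own short transaction rather than one transaction held open for the whole hourly run:

    -- Hypothetical batch: commit promptly so readers are not blocked
    -- for the duration of the whole hourly load.
    BEGIN TRANSACTION;
        INSERT INTO dbo.Readings (ReadingDate, PrimaryId, Value)
        SELECT ReadingDate, PrimaryId, Value
        FROM #IncomingBatch;      -- staging table for this batch only
    COMMIT TRANSACTION;

    -- Sanity check after the load: close anything accidentally left open.
    IF @@TRANCOUNT > 0 COMMIT;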

cletus
Short transactions are key to avoiding blocking.
mrdenny
Understanding transactions and deadlocking is absolutely essential to understanding how relational/transactional databases work. For more, see: http://www.rhphost.com/SQL-standard/8277final/LiB0058.html
Eric
Thanks, that seems to solve the issue. Website loads with no problems during the inserts now.
rksprst
+4  A: 

As others have said, the amount of data you are writing to the database isn't an issue; SQL Server can easily handle much more than that. Personally, I've got tables that take hundreds of thousands to millions of rows per hour without issue, and people read from them all day without any slowdown.

  1. You may need to look at doing dirty reads, either by changing the isolation level of the read statements or by using the WITH (NOLOCK) hint (see the sketch after this list).

  2. You should look at using the bulk upload object in .NET (SqlBulkCopy) to load your data into the database. Use batches of 1,000-5,000 rows depending on the performance you see during testing; you'll need to experiment to find the best batch size. Bulk inserting the data will give you dramatically better performance than inserting records row by row. Make sure you don't do the entire upload in a single transaction; use one transaction per batch.

  3. What does the disk I/O look like while you are writing into the database?

  4. What recovery model do you have set for the database? FULL recovery requires much more I/O than the SIMPLE recovery model. Only use FULL recovery if you actually need the point-in-time restores that come with it.
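
To make points 1 and 4 concrete, here is a rough T-SQL sketch; the table, column, and database names are placeholders rather than anything from the question:

    -- Point 1: a dirty read that will not be blocked by the hourly load.
    SELECT TOP (100) ReadingDate, PrimaryId, Value
    FROM dbo.Readings WITH (NOLOCK)
    WHERE ReadingDate >= '2009-06-01'
    ORDER BY ReadingDate DESC;

    -- Point 4: check the current recovery model and, if point-in-time
    -- restore is not needed, switch to SIMPLE to reduce log I/O.
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = 'WebAppDb';        -- placeholder database name

    ALTER DATABASE WebAppDb SET RECOVERY SIMPLE;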

mrdenny
A: 

As a follow-up to Joel's answer, you may need to look at setting appropriate values for PAD_INDEX and FILLFACTOR on your indexes. If you haven't specified those options, your inserts may be causing a lot of page splits in your indexes, which would slow down your writes significantly.
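
For example, an index rebuild that leaves free space on each page might look like the sketch below; the index and table names are hypothetical, and a fill factor of 80 is only a starting point to tune:

    -- Rebuild with free space left on each page so the hourly inserts
    -- cause fewer page splits. Names and the 80% fill factor are
    -- illustrative only.
    ALTER INDEX IX_Readings_ReadingDate
    ON dbo.Readings
    REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);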

womp