views:

525

answers:

5

At my work we have a small database (as in two hundred tables and maybe a total of a million of rows or so).

I've always expected it to be quite fast in the order of several ten of thousands insertion per second and with querys taking milliseconds once the connection is established.

Quite the contrary we are having some performance problems so that we only get a couple of hundred insertions per second and querys, even the simplest ones would take for ever.

I'm not enterly sure if that's the standar behavior/performance or we're doing something wrong. For example, 1500 querys which imply joining 4 tables on a single key column take around 10 seconds. It takes 3 minutes to load 300K of data in xml format into the database using simple inserts without violating any constraints.

The database is SQL Server 2005 and has a rich relational dependency model, meaning a lot of relations and categorizations over the data as well as a full set of check constraints for the categorization codes and several other things.

Are those times right? If not, what could be affecting performance? (All queries are done on indexed columns)

+1  A: 

A "Rich Relational Dependency" model is not conducive to fast insert speeds. Every constraint (primary key, value checks, and especially foreign keys), must be checked for every inserted record. Thats a lot more work than a "simple insert".

And it doesn't mtter that your inserts have no constraint violations, the time is probably going to be all in checking your foreign keys. Unless you have triggers also, because they're even worse.

Of course is it possible that the only thing that is wrong is that your Insert table is the parent-FK for a must-have-children" FK relation for another table tha forgot to add an index for the child-FK side on the FK relation (this is not automatic and is often forgotten). Of course, that's just hoping to get lucky. :-)

RBarryYoung
+4  A: 

Indexing is a major factor here, when done properly they can speed up Select statements quite well, but remember that an index will bog down an insert as well as the server not only updates the data, but the indexes as well. The trick here is:

1) Determine the queries that are truly speed critical, these queries should have optimal indexes for them.

2) Fill factor is important here as well. This provides empty space to an index page for filling later. When an index page is full (enough rows are inserted), a new page needs to be created taking yet more time. However empty pages occupy disk space.

My trick is this, for each application I set priorities as follows:

1) Speed of read (SELECT, Some UPDATE, Some DELETE) - the higher this priority, the more indexes I create
2) Speed of write (INSERT, Some Update, Some DELETE) - the higher this priority, the fewer indexes I create
3) Disk space efficiency - the higher this priority, the higher my fill factor

Note this knowledge generally applies to SQL Server, your mileage may vary on a different DBMS.

SQL Statement evaluation can help here too, but this takes a real pro, careful WHERE and JOIN analysis can help determine bottlenecks and where your queries are suffering. Turn on SHOWPLAN and query plans, evaluate what you see and plan accordingly.

Also look at SQL Server 2008, indexed Joins!

tekiegreg
+1  A: 

Constraints add a small performance penalty. It also has to update indexes for every insert. And if you don't put multiple inserts into a single transaction, the database server has to execute every insert as a new, separate transaction, slowing it down further.

150 queries/second joining 4 tables sounds normal, though I don't know much about your data.

David
+4  A: 

To have a rough comparison: the TPC-C benchmark record for SQL Server is at around 1.2 mil transactions per minute, and is been like this over last 4 years or so (caped by the 64 CPU OS limit). That is something in the balpark of ~16k transactions per second. This is on super high end machines, 64 CPUs, plenty of RAM, affinitized clients per NUMA node and a serverly short stripped I/O system (only about like 1-2% of each spindle is used). Bear in mind those are TPC-C transactions, so they consist of several operations (I think is 4-5 reads and 1-2 writes each in average).

Now you should scale down this top of the line hardware to your actual deployment and will get the ballpark where to set your expectations for overal OLTP transaction processing.

For data upload the current world record is about 1TB in 30 minutes (if is still current...). Several tens of thousands of inserts per second is quite ambitious, but achievable, when properly done on serious hardware. The article in the link contains tips and tricks for ETL high troughput (eg. use multiple upload streams and affinitize them to NUMA nodes).

For your situation I would advise first and foremost measure so you find out the bottlenecks and then ask specific questions how to solve specific botlenecks. A good starting point is the Waits and Queues whitepaper.

Remus Rusanu
A: 

"I've always expected it to be quite fast in the order of several ten of thousands insertion per second and with querys taking milliseconds once the connection is established."

(a) Database performance depends for 99% on the amount of physical I/O (unless you are in some small site using an in-memory database, which can harmlessly afford to postpone all physical I/O until after the day is done). (b) Database I/O involves not only the actual physical I/O to the data files, but also the physical I/O to persist the journals/logs/... (and journaling is often even done in dual mode (i.e. twice) since say about two decades or so). (c) In what way the "amount of inserts" corresponds to the "amount of physical I/O", is completely determined by how much options the database designer has available for optimising the physical design. Only one thing can be said in general about this : SQL systems mostly fail (to provide the options necessary to transform "tens of thousands of inserts" to just maybe "a couple of hundreds" of physical I/O). Meaning that "tens of thousands of inserts" usually also implies "thousands of physical I/O", which usually implies "tens of seconds".

That said, your message seems to express an expectation that somehow "inserts are extremely fast ("tens of thousands per second")" while "queries are slower" ("milliseconds per query", implying "less than 1000 queries per second"). That expectation is absurd.

The expectation was due to the fact that the queries I'm using are quite more complex that the inserts.
Jorge Córdoba