views:

95

answers:

5

As a preface, I am brand new to using SQL Server 2005; I know how to use the SELECT, UPDATE, DELETE, and INSERT commands and that's about it. I am also using Express Edition on my local PC (E8400 processor, 8GB of DDR2-800, 2 x 640GB SATA-II HDD in RAID 1).

I have a table that I set up with 8 columns, all NVARCHAR(MAX), and I allow NULL. I know in concept what a primary key is, but I don't have one (nor do I know how to set one up).

My VB.NET program that I'm working on is downloading a historical stock price chart from Yahoo, for every single ticker symbol in existence. The first 50,000 rows or so went in super fast. Then I went to bed, and when I woke up it was still running - but the rate of row additions had slowed waaaaaay down; I noticed this around row 300,000. I always expected the rate of row addition to be constant over time, but obviously this is not so!

From browsing other Stack Overflow questions, I suspect my slowdown is related to my piss-poor table setup. If this is the case, where should I begin to fix this, and are there any good resources I could read to get started? I'm hoping this is something simple I can fix :)

In case it matters, this is how I'm adding rows:

cmdtext = "IF NOT EXISTS(SELECT DateStamp FROM DailyPrice WHERE (DateStamp = '" + datestamp + "' AND Ticker = '" + ticker + "')) INSERT INTO DailyPrice (Ticker,OpenPrice,ClosePrice,HighPrice,LowPrice,AdjustedClose,Volume,DateStamp) VALUES('" + ticker + "','" + openprice + "','" + closeprice + "','" + highprice + "','" + lowprice + "','" + adjustedclose + "','" + volume + "','" + datestamp + "')"
cmd = New SqlCommand(cmdtext, conn)
howmanygotinserted = cmd.ExecuteNonQuery

I iterate through that for every stinking row of the CSV file, which is around 30,000 rows per CSV file (and I have over 5000 of them).

+5  A: 

with 8 columns, all are NVARCHAR(Max)

There's your first problem. Databases work best if you tell them what type of data you have, and choose the smallest datatype that works for your data. NVARCHAR(Max) is about the most inefficient choice you could have made.
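For example, something along these lines would be a more sensible starting point. (The exact types and sizes here are guesses based on what a Yahoo price CSV typically contains, so adjust them to your data; also note that ALTER COLUMN will fail if existing rows hold values that don't convert, so you may need to clean the data first.)

```sql
-- Tighten the column types on the existing table (sizes are illustrative)
ALTER TABLE DailyPrice ALTER COLUMN Ticker NVARCHAR(10) NOT NULL
ALTER TABLE DailyPrice ALTER COLUMN DateStamp DATETIME NOT NULL
ALTER TABLE DailyPrice ALTER COLUMN OpenPrice DECIMAL(18, 4)
ALTER TABLE DailyPrice ALTER COLUMN ClosePrice DECIMAL(18, 4)
ALTER TABLE DailyPrice ALTER COLUMN HighPrice DECIMAL(18, 4)
ALTER TABLE DailyPrice ALTER COLUMN LowPrice DECIMAL(18, 4)
ALTER TABLE DailyPrice ALTER COLUMN AdjustedClose DECIMAL(18, 4)
ALTER TABLE DailyPrice ALTER COLUMN Volume BIGINT
```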

I don't have [a primary key] (nor do I know how to set one up).

And that's your second problem. On every insert you are checking to see if you already have inserted a row that has the same values as another row for certain columns. Because you haven't told the database to index those columns, it has to check the entire table each time, so your query gets slower and slower as the table grows. To add a primary key to an existing table you can use this:

ALTER TABLE table1 ADD CONSTRAINT pk_table1 PRIMARY KEY (Ticker, DateStamp)

(Note that SQL Server will only accept this once the key columns are NOT NULL and no longer NVARCHAR(MAX), so fix the datatypes first.)


Mark Byers
Fixing the datatypes alone just resulted in a MASSIVE improvement! I will get the primary key implemented next! I really appreciate everyone's help, and all the answers have helped me - I wish I could accept them all!
Bill Sambrone
+1  A: 

You have at least two problems:

  1. Your table probably lacks appropriate indexes; and
  2. You may be running inside a transaction.

You should have an index on (Ticker,DateStamp) and the check will be much faster. That being said, I wouldn't even do that check. If the INSERT fails, it fails. No biggie.

If you're running inside a transaction and not committing or doing save points then the temporary storage will get huge as you insert rows (because the database needs to be able to rollback any changes). Commit every 1000 rows or so. Either that or don't run inside a transaction.
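A rough sketch of the batching idea (variable names here are illustrative, not from the question's code):

```vbnet
Dim tx As SqlTransaction = conn.BeginTransaction()
Dim rowsInBatch As Integer = 0

For Each line As String In csvLines
    Dim cmd As New SqlCommand(insertSql, conn, tx)
    ' ... set parameters from the CSV fields ...
    cmd.ExecuteNonQuery()

    rowsInBatch += 1
    If rowsInBatch >= 1000 Then
        tx.Commit()                 ' flush every 1000 rows
        tx = conn.BeginTransaction()
        rowsInBatch = 0
    End If
Next

tx.Commit()                         ' commit the final partial batch
```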

Now, the next issue is how you're constructing the INSERT statement. You don't want to build it with string concatenation; use parameters instead. Concatenation is a bad practice to get into (in web applications it is a huge cause of SQL injection vulnerabilities). Take a look at Insert command with parameters and use something like:

INSERT INTO DailyPrice
(Ticker,OpenPrice,ClosePrice,HighPrice,LowPrice,AdjustedClose,Volume,DateStamp)
VALUES
(@Ticker,@OpenPrice,@ClosePrice,@HighPrice,@LowPrice,@AdjustedClose,@Volume,@DateStamp)
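In VB.NET the wiring for that might look roughly like this (a sketch only; the variable names follow the question's code):

```vbnet
Dim cmdtext As String =
    "INSERT INTO DailyPrice " &
    "(Ticker,OpenPrice,ClosePrice,HighPrice,LowPrice,AdjustedClose,Volume,DateStamp) " &
    "VALUES (@Ticker,@OpenPrice,@ClosePrice,@HighPrice,@LowPrice,@AdjustedClose,@Volume,@DateStamp)"

Dim cmd As New SqlCommand(cmdtext, conn)
cmd.Parameters.AddWithValue("@Ticker", ticker)
cmd.Parameters.AddWithValue("@OpenPrice", openprice)
cmd.Parameters.AddWithValue("@ClosePrice", closeprice)
cmd.Parameters.AddWithValue("@HighPrice", highprice)
cmd.Parameters.AddWithValue("@LowPrice", lowprice)
cmd.Parameters.AddWithValue("@AdjustedClose", adjustedclose)
cmd.Parameters.AddWithValue("@Volume", volume)
cmd.Parameters.AddWithValue("@DateStamp", datestamp)
howmanygotinserted = cmd.ExecuteNonQuery()
```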

Lastly, you should add a primary key. You could use (Ticker, DateStamp) for this, but I personally favour technical primary keys, meaning a primary key that has no external meaning. An auto-increment integer field is the most common example of this. Adding (Ticker, DateStamp) as a primary key will add the index I was referring to above. It's the index that really makes the difference.

cletus
I have to agree *in general* about using technical primary keys, but for this purpose it might just add an extra unneeded overhead. On the other hand, a timestamp doesn't make a particularly good key. I'm actually not completely convinced that a relational database is a good way to store this data at all, since there are no relations and the data is modelling something that is continuous-time-based in real life, not row based. But when you have a hammer... I guess it works.
Mark Byers
Although my VB app is local only, I appreciate your help with parameters! My ASP.Net program (unrelated to this) is in dire need of it - I will redo with parameters!
Bill Sambrone
A: 

The first thing you should do is to create an index on your table. You're causing a complete table scan every time you want to add a new row (because of the SELECT DateStamp FROM DailyPrice ... WHERE ... statement), looking for some existing value.

Anyway, since you're checking for a record with a specific DateStamp and Ticker value on every insert, you could make (DateStamp, Ticker) the primary key of your table. That way, the database will index these two columns, and you won't need the IF NOT EXISTS part... the insert will simply fail if a row with that key already exists. Keep in mind, however, that this will cause an exception in your VB.Net program, so you'll have to handle it.
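Handling the duplicate-key case could look something like this (a sketch; 2627 is SQL Server's error number for a primary key / unique constraint violation):

```vbnet
Try
    cmd.ExecuteNonQuery()
Catch ex As SqlException When ex.Number = 2627
    ' Duplicate (Ticker, DateStamp) - the row already exists, so skip it
End Try
```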

Another thing you can do is change the datatypes of the columns, especially the ones that are indexed. You could change Ticker to NVARCHAR(X), with X being some fixed value (NVARCHAR(250), for example), if you can estimate the length of the strings to be stored there. You could also change DateStamp to DATETIME.

alexphi
I had no idea I could use 2 primary keys! Thank you for your help!
Bill Sambrone
You *can't* have two primary keys. You can have a primary key that uses two columns though.
Mark Byers
Ah, I misunderstood. In the SQL studio thingy, it showed 2 key symbols (one on each column), and I took that to mean 2 primaries. So this basically boils down to that primary key (the pairing of the 2 columns) has to be unique.
Bill Sambrone
A: 

In terms of database problems:

  • Instead of checking for the existence of a record that matches DateStamp and Ticker, you could add a unique index on those columns. That would force the RDBMS to check for an existing record before inserting anything, and it will probably be a lot faster than your current version, because it will use an index and is optimized for this.
  • Use some real data types and not NVARCHAR(MAX). Find out the appropriate data type for every column and reduce the max length of varchar columns to about what you actually expect to be in the database. This should make SELECTs and the built-in index management work faster.
  • The code could be a little bit faster if you used a parameterized query and simply changed the parameter values. That way, you would not only get rid of that ugly string concatenation, but you would also avoid problems if one of your CSV files contains "weird" data, i.e. a ' character, which would break your SQL query.
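The unique index from the first point could be created like this (the index name is arbitrary; note that SQL Server will refuse to index NVARCHAR(MAX) columns, so the datatype fix in the second point has to come first):

```sql
CREATE UNIQUE INDEX UX_DailyPrice_Ticker_DateStamp
    ON DailyPrice (Ticker, DateStamp)
```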

However, I would also double check your CSV reader code. What if the real problem is not even the database but your application needing too much memory and causing swapping?

I always expected the rate of row addition to be constant over time, but obviously this is not so!

As you already discovered, it is not - and it might even get "worse" if you apply a lot of indexes to that table, since those kinda slow down INSERT statements.

hangy
A: 

In addition to what others have said, you also have a problem in that you are inserting rows one at a time. It is far better to use bulk insert or an SSIS package. Never insert (or update or delete) large amounts of data by iterating through the rows. Databases are designed to work with sets of data and perform far better when doing so.
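From .NET, SqlBulkCopy is the easiest route. A rough sketch of the per-file loop (this assumes the CSV has been read into a DataTable whose columns match the table, which is not shown here):

```vbnet
Using bulk As New SqlBulkCopy(conn)
    bulk.DestinationTableName = "DailyPrice"
    bulk.BatchSize = 5000              ' send rows to the server in batches
    bulk.WriteToServer(csvDataTable)   ' one call per CSV file
End Using
```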

HLGEM