We took over a website from another company after a client decided to switch.

We have a table that grows by about 25k records a day, and is currently at 15MM records.

The table looks something like:

id (PK, int, not null)
member_id (int, not null)
another_id (int, not null)
date (datetime, not null)

SELECT COUNT(id) FROM tbl can take up to 15 seconds.

A simple inner join on 'another_id' takes over 30 seconds.

I can't imagine why this is taking so long. Any advice?

SQL Server 2005 Express

+1  A: 

Do note that COUNT(id) will usually result in a full table scan, so it has to read the entire table to get the count. If counting is really a very important thing for you, you might want to consider creating a trigger to store the results of the count in some other table.
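A minimal sketch of that trigger approach, assuming a hypothetical summary table `dbo.TableCounts` (all names here are made up; adjust to your schema):

```sql
-- Hypothetical summary table to hold the maintained count.
CREATE TABLE dbo.TableCounts (
    TableName sysname PRIMARY KEY,
    RowCnt bigint NOT NULL
);

-- Seed it once with the current count.
INSERT INTO dbo.TableCounts (TableName, RowCnt)
SELECT 'tbl', COUNT_BIG(*) FROM dbo.tbl;

-- Keep it current on every insert/delete.
CREATE TRIGGER trg_tbl_count ON dbo.tbl
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.TableCounts
    SET RowCnt = RowCnt
        + (SELECT COUNT_BIG(*) FROM inserted)
        - (SELECT COUNT_BIG(*) FROM deleted)
    WHERE TableName = 'tbl';
END;
```

After that, reading the count is a single-row lookup instead of a table scan.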

Without the query I can't say much about the inner join, but my guess would be that you don't have an index on either `id` or `another_id`.
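If the indexes really are missing, adding them is straightforward; a sketch (the index names, and which columns you actually join/filter on, are assumptions):

```sql
-- The PK on id normally already has a (clustered) index,
-- so the join columns are the likely culprits.
CREATE INDEX IX_tbl_another_id ON dbo.tbl (another_id);
CREATE INDEX IX_tbl_member_id  ON dbo.tbl (member_id);
```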

WoLpH
or the selectivity of another_id is very poor
Chris Bednarski
you were right. they were unindexed.
john
Depending on the other work in the transaction when inserting those 25k rows (do they come from users doing work, or from a bulk insert?), updating the same row every time in the trigger will most likely slow your system down with locking and blocking.
KM
@KM: That's only true if you don't defer the updates. But in any case... 25k rows a day is next to nothing for any database. With a table as simple as this that would be executed within a second.
WoLpH
KM
As far as I know, every update that a trigger does is deferred by default in MSSQL, so locking won't be a very large problem. Also, you will be updating a column in a table with very few rows (or, if this is the only count you need, just one row), so updating will be _very_ fast. But if you are afraid that locking is a problem, you can also just append a number to a table and regularly combine all those appends into one single number again. That way locking won't be a problem at all.
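A rough sketch of that append-and-roll-up idea, with a hypothetical delta table (names and schema are assumptions): inserts append `+N` rows instead of updating one shared row, and a periodic job collapses them.

```sql
-- Hypothetical delta table: triggers append +N/-N rows here.
CREATE TABLE dbo.TableCountDeltas (
    TableName sysname NOT NULL,
    Delta bigint NOT NULL
);

-- Current count = sum of all pending deltas.
SELECT SUM(Delta) FROM dbo.TableCountDeltas WHERE TableName = 'tbl';

-- Periodic rollup: collapse all deltas into a single row.
BEGIN TRANSACTION;
DECLARE @total bigint;
SELECT @total = SUM(Delta) FROM dbo.TableCountDeltas WHERE TableName = 'tbl';
DELETE FROM dbo.TableCountDeltas WHERE TableName = 'tbl';
INSERT INTO dbo.TableCountDeltas (TableName, Delta) VALUES ('tbl', @total);
COMMIT;
```

Since each writer only appends, concurrent inserts never contend on the same row.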
WoLpH
@WoLpH said `As far as I know every update that a trigger does is deferred by default in MSSQL` — what is this *deferred* that you are referring to? As far as I know, every action within a trigger in MSSQL is part of the original firing command and bound to it by a transaction. If there are many concurrent transactions all trying to hit the same one row-column, there could be some waiting.
KM
I mean that MSSQL would try to execute the triggers as late as possible, right before the actual commit. So if it locks, it will be at the end of the transaction for a very short period. `deferred` might not be the right term for MSSQL though; that's more of an Oracle and/or Postgres thing. When using transactions for replication you would normally wait as long as possible with sending the changes to the other server. The principle is the same :)
WoLpH
so you're saying that if I open a transaction, do a update that fires a trigger, then select the data affected by the trigger, it won't be done, because it is waiting for `right before the actual commit`? that sounds very crazy. I'd like to see a link to the documentation explaining that. It is my experience that a trigger fires and executes adjacent to the command that calls it, as they are bound with an automatic transaction, and once the trigger is complete, the next command after the trigger causing command is then run.
KM
The Postgres manual has some clear documentation about this, see the part about timing: http://www.postgresql.org/docs/8.4/static/sql-createconstraint.html
For Oracle you can use `DBMS_Change_notification` to make it asynchronous: http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_chngnt.htm
And for MS SQL... I can't find any documentation of this feature specifically, but you can easily work around it with `NOLOCK`: http://msdn.microsoft.com/en-us/library/ms177523.aspx
Either way, you can get this fast in any decent database with some tuning.
WoLpH
question is tagged sql-server
KM
I'm not perfect ;) I work with a lot of different databases and sometimes I get the features mixed up. Mea culpa :)
WoLpH
+2  A: 

If you need a row count, but you can live with an approximate value (not 100% guaranteed to be exact - but gives you a ballpark), you could use the system catalog views in SQL Server to achieve this like so:

SELECT 
    t.name AS TableName,
    SUM(p.rows) AS RowCounts
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.object_id = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE 
    t.name = 'Your Table Name'
    AND t.object_id > 255 AND   
    i.index_id <= 1
GROUP BY 
    t.name, i.object_id, i.index_id, i.name 

Given any table in your database, this will give you an approximation of the number of rows in that table, and it's very fast (barely measurable: less than 0.01 sec.)

marc_s
count(*) is not the only issue, OP says `A simple inner join on 'another_id' takes over 30 seconds.`
KM
@KM: yes, I know - I was just offering up a solution for one of the problems mentioned - the other has been addressed by indexing, I believe
marc_s
+1. If you simply need a ballpark number of all rows, then this is a very good solution.
WoLpH
A: 

15 million rows running on SQL Server 2005 Express

What kind of hardware are you using? Your database can be set up properly, but if you're not on a good dedicated server with lots of RAM, queries on a big table like that will be slow.

KM
Big table? Not that much imho... I mean, 4 columns: 3 ints, which would mean either 32-bit or 64-bit numbers, and a datetime, also 32-bit or 64-bit. So... that is `32 * 4 * 15 000 000 bits = 229 megabytes` of memory, or double that in the case of 64-bit integers/datetimes. Even a simple desktop computer could easily handle that, especially since for a `COUNT(id)` you only need 1/4th of that in memory.
WoLpH
(4+4+4+8) bytes * 15,000,000 rows = 286 megabytes, and that is the data only: no other overhead, not to mention any indexes or other tables. Just adding one index to this table on one of those ints (for a FK, possibly) will add another 57+ megabytes in data alone (4 bytes * 15M), and twice that if it is on the datetime. And SQL Server Express is limited to 1 gig of RAM: microsoft.com/Sqlserver/2005/en/us/… If this is not on a server, IO will be slower, and if other applications are running (IIS, Exchange, etc.) things will run slower still.
KM