views: 149

answers: 6

I am running a query against an SQL database. The table has over a million records in it.

This is my SQL statement. It takes over 8 hours to run. Anyone have any ideas? Thanks in advance.

Select 
  Count (userID) as DIBWIZHits, Sum (ssnCount) as SSNs
From 
  tbl_hits10
Where 
 (appName='DIBwiz QMT' or 
  appName like 'DIBwiz-Full%' or 
  appName like 'DIBwiz-Abb%' or 
  appName like 'DIBwiz-Qual%')
  -- or appName like 'DIBwiz%Open%' or appName like 'DIBwiz%Q%')
and 
  lu_date between 
    convert (datetime, '2010-09-01 00:00:00', 102) and 
    convert (datetime, '2010-09-30 23:59:59', 102)
AND 
 (userID<>'888888' and 
  userID<>'999999' and 
  userID<>'777777' and 
  userID<>'666666')
A: 

Use `IN` and `NOT IN` instead of the chains of `or` and `<>`.
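
A sketch of that rewrite against the query in the question (the `LIKE` patterns have to stay as `OR`ed conditions, since `IN` only handles exact matches):

```sql
Select Count (userID) as DIBWIZHits, Sum (ssnCount) as SSNs
From tbl_hits10
Where (appName='DIBwiz QMT' or
       appName like 'DIBwiz-Full%' or
       appName like 'DIBwiz-Abb%' or
       appName like 'DIBwiz-Qual%')
  and lu_date between
      convert (datetime, '2010-09-01 00:00:00', 102) and
      convert (datetime, '2010-09-30 23:59:59', 102)
  and userID not in ('888888', '999999', '777777', '666666')
```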

Detect
A: 

Do you have indexes on your table? If not, you should at least add an index on the userID field. Don't bother indexing the appName field: because you are using `LIKE`, it will have no effect.

Given that you have so many records, adding the index may itself take some time if it does not already exist, but it should improve performance a lot.
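
For example (the index name here is my own choice; use whatever fits your naming convention):

```sql
-- One-off; on a million-plus rows this can take a while to build.
CREATE INDEX IX_tbl_hits10_userID ON tbl_hits10 (userID);
```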

murdoch
Your comment about `LIKE` not benefiting from an index is false, at least in SQL Server 2005. The way Anthony uses `LIKE` (i.e., with the wildcard at the end), he will benefit. The following post illustrates this - it's easy enough to try for yourself in Management Studio Express (free): http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx
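One way to try this yourself (a sketch using standard T-SQL; it shows the estimated plan so you can see whether the `LIKE` compiles to an index seek or a scan):

```sql
SET SHOWPLAN_TEXT ON;
GO
-- With an index on appName, a trailing-wildcard LIKE can use an Index Seek.
SELECT COUNT(userID) FROM tbl_hits10 WHERE appName LIKE 'DIBwiz-Full%';
GO
SET SHOWPLAN_TEXT OFF;
GO
```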
Matt
+4  A: 

I have several pieces of advice for optimization.

1) You absolutely should be using regular expressions to search for the appname.

2) You should be comparing the userID to a list, such as "AND (userID not in (1, 2, 3, 4, 5))".

3) Your database table should have indexes.

Each of those should dramatically improve the performance.

mvrak
If this is SQL Server (a reasonable assumption given the `convert` function), then: 1. Definitely not, that would be much slower. 2. That would not make any difference. 3. Yes, this should be the first thing to check.
Christian Hayter
+1  A: 

Having the CONVERT(datetime, ...) functions in the WHERE clause will perform that parsing for every row. You would be better off declaring a datetime variable, setting it to the result of the CONVERT, and using that. Additionally, using "IN" and "NOT IN" is better than a slew of <>. Finally, LIKE operators with wildcards are generally slower than exact operators.
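
A sketch of the variable approach (variable names are my own; the rest of the original `WHERE` clause would stay as-is):

```sql
DECLARE @startDate datetime, @endDate datetime;
SET @startDate = CONVERT(datetime, '2010-09-01 00:00:00', 102);
SET @endDate   = CONVERT(datetime, '2010-09-30 23:59:59', 102);

SELECT COUNT(userID) AS DIBWIZHits, SUM(ssnCount) AS SSNs
FROM tbl_hits10
WHERE lu_date BETWEEN @startDate AND @endDate;  -- plus the other predicates
```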

jelbourn
The CONVERT will be performed exactly once when building the query plan since it's using constant values. (NOT) IN generates the same plan as a series of AND clauses. LIKE will always be trivially slower simply because it has to do more work, but the real pain comes from having a wildcard at the beginning that prevents the use of an index, which is not the case in this query. (Although I suppose appName could be a huge TEXT column which would bump the cost of his 'stuff%morestuff%' conditions somewhat.)
zinglon
A: 

Do you have an index on appName?

Also consider @Detect's suggestion to use userid not in (< comma_separated_values >).

sebastian
+1  A: 

Well, that could be highly platform specific, but I would try to manually break it down in nested queries depending on what indexes are present. E.g. (simplified), assuming there's an index on lu_date:

Select Count (userID) as DIBWIZHits, Sum (ssnCount) as SSNs
From 
  (select * from tbl_hits10 
     where lu_date between 
       convert (datetime, '2010-09-01 00:00:00', 102) 
       and convert (datetime, '2010-09-30 23:59:59', 102)
  ) z
Where (appName like 'DIBwiz%')
AND userID not in ('016266'....)

IHTH

500 - Internal Server Error