I imported the Stack Overflow data dump into SQL Server 2008. Some queries, especially on the Posts table, are taking more than a minute to return.

Example query:

SELECT   
  Id, PostTypeId, AcceptedAnswerId, CreationDate, 
  Score, ViewCount, Body, OwnerUserId, OwnerDisplayName,
  LastEditorUserId, LastEditDate, LastActivityDate, Title,
  Tags, AnswerCount, CommentCount, FavoriteCount, ClosedDate, ParentId
FROM dbo.Posts

The query returns 881,665 rows and takes just under 2 minutes to complete. I do have an index set up for this and the other tables. Is there anything I can do to speed this up?

+3  A: 

Because you don't have a WHERE clause, you're performing a table scan, which reads the entire table. This will always be relatively slow; an index won't help at all.

To speed up the query, select less :) Try putting in a WHERE clause so you're only interested in certain tags, or questions over a certain date period. Then you can put an index on those columns to speed up the query.
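As a rough sketch of that suggestion, the query below filters on a date range and adds a supporting index. The index name `IX_Posts_CreationDate`, the date range, and the choice of included columns are assumptions made for illustration; only the table and column names come from the question.

```sql
-- Hypothetical index name; INCLUDE covers the columns the query selects,
-- so the query can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_Posts_CreationDate
    ON dbo.Posts (CreationDate)
    INCLUDE (Title, Score, ViewCount);

-- Only posts created in January 2009 (example range, not from the question).
SELECT Id, Title, Score, ViewCount, CreationDate
FROM dbo.Posts
WHERE CreationDate >= '20090101'
  AND CreationDate <  '20090201';
```

Id is not listed in the INCLUDE clause on the assumption that it is the clustered key, which SQL Server stores in every nonclustered index anyway.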

Jeremy Smyth
800K rows isn't that hard to scan; I bet it's the *transfer* of 800K rows that's taking all the time. Try grepping an 800K-line file: it's fast even without indexes.
SquareCog
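One way to test that guess is to make the server read every row, including Body, while returning only a single value, and then compare the timing with the original query. This is a sketch under that assumption, not a precise benchmark; caching between runs will affect the numbers.

```sql
-- Report parse/compile and execution times in the Messages output.
SET STATISTICS TIME ON;

-- Touches Body on every row but sends back one row, so almost nothing
-- is transferred to the client. DATALENGTH returns the size in bytes.
SELECT COUNT(*)                              AS PostCount,
       SUM(CAST(DATALENGTH(Body) AS bigint)) AS TotalBodyBytes
FROM dbo.Posts;

SET STATISTICS TIME OFF;
```

If this finishes in a few seconds while the full SELECT takes two minutes, most of the time is probably going into sending and rendering the result set rather than into the scan.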
+1  A: 

If you're doing copies, have a look at the SqlBulkCopy API. I have had an insertion go from 10 minutes to 4 seconds using that API.

But Jeremy is perfectly correct. What do you expect when running a query that returns 800,000+ rows containing large strings (the Body column)? If you don't need the body, leaving it out would probably speed up the query significantly.
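For example, assuming the post text is not needed by the caller, the question's query could be trimmed to the following. It is the same column list with Body removed and nothing else changed.

```sql
-- Same query as in the question, minus the large Body column.
SELECT
  Id, PostTypeId, AcceptedAnswerId, CreationDate,
  Score, ViewCount, OwnerUserId, OwnerDisplayName,
  LastEditorUserId, LastEditDate, LastActivityDate, Title,
  Tags, AnswerCount, CommentCount, FavoriteCount, ClosedDate, ParentId
FROM dbo.Posts;
```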

What hardware (specifically HDD) are you using for the SQL Server? If you shove a DB like that onto your C:\ drive, you aren't going to get the result you want.

Also, do you have full-text catalogs enabled? If you are searching in the post text, this indexing will significantly improve your speed.
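A minimal sketch of setting that up follows. The catalog name `PostsCatalog` and the key index name `PK_Posts` are assumptions; KEY INDEX has to name an existing unique, single-column, non-nullable index on the table, so substitute whatever your primary key index is actually called. The search phrase is also just an example.

```sql
-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG PostsCatalog;

-- PK_Posts is assumed to be the unique index on Posts.Id.
CREATE FULLTEXT INDEX ON dbo.Posts (Title, Body)
    KEY INDEX PK_Posts
    ON PostsCatalog;

-- Phrase search against the full-text index instead of LIKE '%...%'.
SELECT Id, Title
FROM dbo.Posts
WHERE CONTAINS(Body, '"bulk copy"');
```

The index is populated in the background after it is created, so searches may return incomplete results until that finishes.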

Spence