I have a SQL table with more than 1,000,000 rows, and I need to run the query you can see below:

   SELECT DISTINCT TOP (200) COUNT(1) AS COUNT,  KEYWORD
   FROM QUERIES WITH(NOLOCK)
   WHERE KEYWORD LIKE  '%Something%'
   GROUP BY KEYWORD ORDER BY 'COUNT' DESC

Could you please tell me how I can optimize it to speed up execution? Thank you for any useful answers.

+1  A: 

I'd first look at the execution plan to see how SQL Server is trying to access your data. Here is a link to just one of many articles on execution plan analysis.
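
One quick way to do that (a minimal sketch; the query is the one from the question, run in Management Studio, optionally with "Include Actual Execution Plan" enabled) is to turn on I/O and timing statistics before executing it:

-- show logical reads and CPU/elapsed time for the statement below
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT DISTINCT TOP (200) COUNT(1) AS COUNT, KEYWORD
FROM QUERIES WITH (NOLOCK)
WHERE KEYWORD LIKE '%Something%'
GROUP BY KEYWORD
ORDER BY 'COUNT' DESC;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;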

Jeremy
A: 

As Jeremy stated, you need to look at the execution plan and client statistics to see what is faster. However, a couple of suggestions. First, do you really need a prefixing wildcard on your search? I.e., LIKE '%Something%' will not be able to use an index, whereas LIKE 'Something%' will. Second, you might try a CTE to see if it will be faster. So, something like:

;With NumberedItems As
    (
    Select Keyword, Count(*) As [Count]
        , ROW_NUMBER() OVER ( ORDER BY Count(*) DESC ) As ItemRank
    From Queries WITH (NOLOCK)
    Where Keyword LIKE '%Something%'
    Group By Keyword
    )
Select Keyword, [Count]
From NumberedItems
Where ItemRank <= 200
Order By [Count] Desc
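
On the first point, a minimal sketch of what the index-friendly form would look like; the index name here is purely illustrative:

-- hypothetical nonclustered index on the keyword column
CREATE INDEX IX_Queries_Keyword ON Queries (Keyword);

SELECT Keyword, COUNT(*) AS [Count]
FROM Queries
WHERE Keyword LIKE 'Something%'   -- sargable: resolves to an index range seek
GROUP BY Keyword;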
Thomas
A: 

It's rather hard to guess what may be causing the performance issues from just a query, with no schema or execution plan. You should definitely read up on execution plans, as all performance tuning of SQL queries is ultimately driven by them.

If you really want to delve into it, you can also read up on the query optimizer, which attempts to execute your query using the most optimal plan. Understanding the optimizer is important to ensure you are taking full advantage of the indexes, etc. that you have on the database. Microsoft also has several helpful documents, such as this one on troubleshooting performance issues.

For your particular case, the bottleneck is most likely in the WHERE clause. LIKE comparisons tend to be inefficient, especially when the search term is surrounded by percent signs, because the query cannot take advantage of indexes on the column. Depending on how you've stored the data, full-text indexing may be a useful option, as it can frequently outperform LIKE '%SOMEVALUE%'.
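
For what it's worth, a rough sketch of setting that up; the catalog name and the KEY INDEX name (PK_QUERIES) are assumptions about your schema, not something the question defines:

CREATE FULLTEXT CATALOG QueriesCatalog;       -- hypothetical catalog name

CREATE FULLTEXT INDEX ON dbo.QUERIES (KEYWORD)
    KEY INDEX PK_QUERIES                      -- assumes a unique index named PK_QUERIES exists
    ON QueriesCatalog;

-- the leading-wildcard LIKE is then replaced by a full-text predicate
SELECT COUNT(*) FROM dbo.QUERIES WHERE CONTAINS(KEYWORD, 'Something');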

ig0774
A: 

Asking a question about SQL Server performance without providing a schema is a complete waste of everybody's time. I'm going to answer a different question, the one you should have asked in the first place:

What schema should I use to efficiently satisfy a query like SELECT DISTINCT TOP (200) COUNT(1) AS COUNT, KEYWORD FROM QUERIES WHERE KEYWORD LIKE '%Something%' GROUP BY KEYWORD ORDER BY 'COUNT' DESC when the QUERIES table has over 1M rows?

The proper schema depends on the selectivity of KEYWORD. One possible design would be to normalize KEYWORD into a lookup table and have a narrow non-clustered index on the lookup id:

CREATE TABLE KEYWORDS (KeywordId INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
  Keyword VARCHAR(...) UNIQUE);
CREATE TABLE QUERIES (...,
  KeywordId INT NOT NULL,
  CONSTRAINT FK_KEYWORD 
   FOREIGN KEY (KeywordId)
   REFERENCES KEYWORDS (KeywordId),
  ...);
CREATE INDEX ndxQueriesKeyword ON Queries (KeywordId);
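
To make the intent concrete, a sketch of the original query rewritten against that schema (assuming the names above):

SELECT TOP (200) COUNT(*) AS [COUNT], K.Keyword
FROM KEYWORDS AS K
JOIN QUERIES AS Q ON Q.KeywordId = K.KeywordId
WHERE K.Keyword LIKE '%Something%'
GROUP BY K.Keyword
ORDER BY [COUNT] DESC;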

If the number of distinct keywords is relatively low, the original query can be satisfied quickly by a scan of the KEYWORDS table followed by a nested loop range scan of the ndxQueriesKeyword index, which is very narrow and therefore generates little IO.

As the number of distinct keywords increases, this approach may start showing problems due to the high number of range scans on the QUERIES table, and possibly even due to the full scan on the KEYWORDS table.

You may consider using a different WHERE clause, namely LIKE 'Something%', which is SARGable and can leverage an index on KEYWORD, benefiting from a range reduction and a narrower scan instead of a full table scan.

If you are on Enterprise Edition you can consider adding an indexed view with the pre-computed aggregates:

CREATE VIEW vwQueryKeywords 
WITH SCHEMABINDING
AS SELECT KEYWORD, COUNT_BIG(*) AS [COUNT]
FROM dbo.QUERIES
GROUP BY KEYWORD;

CREATE UNIQUE CLUSTERED INDEX cdxQueryKeywords ON vwQueryKeywords(KEYWORD);

On EE the optimizer will consider the indexed view for the original query. On non-EE you will have to change the query to run against the view with the NOEXPAND hint:

SELECT KEYWORD, [COUNT]
FROM vwQueryKeywords WITH (NOEXPAND)
WHERE KEYWORD LIKE '%Something%';

Another completely different approach is to ditch the LIKE '%Something%' condition altogether in favor of full-text search:

SELECT DISTINCT TOP (200) COUNT(1) AS COUNT, KEYWORD
FROM QUERIES
WHERE CONTAINS(KEYWORD, 'Something')
GROUP BY KEYWORD
ORDER BY 'COUNT' DESC

Because the FT search is a reverse index lookup, it may well outperform a traditional WHERE. The only catch is that you'll only be able to search for full words, since FT won't let you search partial matches the way LIKE does. Again, the actual mileage will vary based on the Keyword data profile (i.e. its statistics and distribution).

Remus Rusanu
Btw, if you downvote, explain why
Remus Rusanu
A: 

Your query is not optimizable (without implementing some form of full-text indexing, which is itself expensive) because you have a leading wildcard in your keyword match. You would need to split the keywords out into separate column values (probably in a separate, related table) and search on an exact match or, at least, a match with the wildcard not at the beginning of the text.

Additionally, the results you're getting may not be accurate if you have some keywords that are nested in others (e.g. "cart" will match a keyword search on "car", which is not what you want).

Larry Lustig
A: 

If you can't use a full-text search engine from a third party, create an inverted index from your text periodically and search that instead. A naive implementation would beat your current strategy.

http://en.wikipedia.org/wiki/Inverted_index
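
A rough sketch of that idea in plain tables; every name below (KeywordTerms, QueryId) is an illustrative assumption, not part of the question's schema:

-- one row per (term, source row) pair, rebuilt periodically from QUERIES.KEYWORD
CREATE TABLE KeywordTerms (
    Term    VARCHAR(200) NOT NULL,
    QueryId INT          NOT NULL,   -- assumed key of the QUERIES table
    PRIMARY KEY (Term, QueryId)
);

-- whole-word lookup becomes an index seek instead of a LIKE '%...%' scan
SELECT TOP (200) COUNT(*) AS [COUNT], Q.KEYWORD
FROM KeywordTerms AS T
JOIN QUERIES AS Q ON Q.QueryId = T.QueryId
WHERE T.Term = 'Something'
GROUP BY Q.KEYWORD
ORDER BY [COUNT] DESC;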

Sam Goldman