When I joined my current employer, a new database schema had just been designed; it will be the basis for a lot of the tools that are being or will be built. With my limited SQL knowledge, I think the schema is rather well designed. My only concern is that almost every table has a multi-part primary key: every table has at least a CustomerId and a key of its own. While these do indeed identify a record, I have the feeling that multi-column keys (we're talking quadruple here) are very inefficient.

Today I saw extremely high CPU usage from a simple, repeated query that joins two tables, selects a single string field from the first, and returns the distinct values.

select distinct(f.FIELDNAME) as fieldName
from foo f
inner join bar b
   on f.id = b.fId
where b.cId = @id;

Checking the execution plan (I'm no EP hero), I noticed three major CPU hotspots: the distinct (as expected) and two index seeks. I would have thought that index seeks would be extremely fast, but each takes up 18% of the cost. Is this normal? Is it due to the (quadruple) clustered indexes?

--UPDATE--
The query is used for creating a Lucene index. It's a one-off batch process that runs about once a week (sounds contradictory, I know). As far as I can see, I can't reuse any results here.

+3  A: 

Could you please run the following queries and post their output:

SELECT  COUNT(*), COUNT(DISTINCT fieldname)
FROM    foo

SELECT  COUNT(*), COUNT(DISTINCT cId), COUNT(DISTINCT fId)
FROM    bar

This will help to estimate which indexes best suit your needs.

Meanwhile make sure you have the following indexes:

foo (FIELDNAME)
bar (cId, fId)

and rewrite your query:

SELECT  DISTINCT(fieldname)
FROM    foo f
WHERE   EXISTS (
        SELECT  1
        FROM    bar b
        WHERE   b.fId = f.id
                AND b.cId = @id
        )

This query should use an index on f.FIELDNAME to build the DISTINCT list and the index on bar to filter out the non-existent values.
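
As a rough sketch, the two indexes suggested above could be created like this. The index names are made up for illustration, the syntax assumes SQL Server (the @id parameter in the question suggests it), and it assumes FIELDNAME is short enough to be used as an index key:

```sql
-- Hypothetical index names; table and column names are taken from the question.

-- Lets the engine build the DISTINCT list of field names from an index.
CREATE INDEX IX_foo_FIELDNAME ON foo (FIELDNAME);

-- Supports the EXISTS lookup: equality on cId first, then fId.
CREATE INDEX IX_bar_cId_fId ON bar (cId, fId);
```

If the existing primary keys already start with these columns in this order, the extra indexes may be redundant, so it is worth checking the current index definitions first.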

Quassnoi
The big question is whether the query needs optimising or the app needs to be a bit less aggressive about calling the query.
Sam Saffron
It never hurts to do both :)
Quassnoi
What do you mean by less aggressive? What I'm doing is building a Lucene index for fast searching. This query needs to be repeated for every @id I have, and I don't see any way to reuse previous results there.
borisCallens
@boris, bingo: get all the ids up front, insert them into a temp table or something, and select the whole kaboodle as a set. Try to do things in sets, not one id at a time.
Sam Saffron
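
A minimal sketch of the set-based idea from the comment above, assuming SQL Server, an integer cId, and that all ids are known before processing starts. The temp table #ids and the sample values are hypothetical:

```sql
-- #ids is a hypothetical temp table holding every id to process.
CREATE TABLE #ids (cId int PRIMARY KEY);

-- Example values only; in practice the ids come from the application.
INSERT INTO #ids (cId) VALUES (1), (2), (3);

-- One pass: distinct field names per id, instead of one query per @id.
SELECT DISTINCT b.cId, f.FIELDNAME AS fieldName
FROM   #ids i
INNER JOIN bar b ON b.cId = i.cId
INNER JOIN foo f ON f.id = b.fId
ORDER BY b.cId;

DROP TABLE #ids;
```

The caller would then group the rows by cId while feeding Lucene. This only helps when the ids do not depend on earlier results.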
Thanks for the advice, I will run the queries as soon as the current processing is done. In the meantime, you don't think the clustered indexes are the issue here?
borisCallens
I can't because each id depends on the result of the previous query.
borisCallens
I find that when an INNER JOIN is re-written using EXISTS then it is often the case that the DISTINCT is no longer required.
onedaywhen
No, it's required for this particular query. Otherwise you'll get duplicate fieldnames if they exist in foo. The plan will be rewritten for sure, if that's what you mean.
Quassnoi
I don't see details of the design posted here but if fieldname is unique in foo then the DISTINCT could be removed ;-)
onedaywhen
BTW is your re-written query missing some table correlation names e.g. foo AS f ... bar AS b... ?
onedaywhen
@onedaywhen: That's why I asked for COUNT()'s, to see if it's unique or not. As for correlations, you're right, I missed them.
Quassnoi
+1  A: 

This kind of query looks familiar. I'm guessing here, but it's probably populating a combo box on a web/WinForms UI that is being hit pretty hard.

Perhaps you should be caching the results on the application side so you don't end up executing it so often. Worst-case scenario, you could cache this on the SQL Server side, but it's a massive kludge.

Sam Saffron
Yes, I can see what you mean, but I don't feel that is the case in my current situation. Please see the OP for an update.
borisCallens
A: 

In most databases, a multi-column index can't be used efficiently if the query doesn't filter or join on the index's first column. You say that the customerId is part of every primary key, but you don't use it for the join in your query. To properly answer your question, we really need to see the CREATE TABLE output for foo and bar, or at least the output of SHOW INDEX FROM.

That said, your query may be faster if you change it like so:

select distinct(f.FIELDNAME) as fieldName
from foo f
inner join bar b
   on f.id = b.fId
   and f.cId = b.cId -- Using this part of the key will speed it up
where b.cId = @id;

My comment assumes that your primary key is ordered as "cId, fId". Effectively, that means your query doesn't have to check every cId, only the index entries for the matching one.
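
To illustrate the point about column order, here is a small sketch assuming SQL Server and a two-column clustered key. The table bar_demo and its layout are hypothetical, since the real table definitions were not posted:

```sql
-- Assumed layout for illustration only; not the poster's actual schema.
CREATE TABLE bar_demo (
    cId int NOT NULL,
    fId int NOT NULL,
    CONSTRAINT PK_bar_demo PRIMARY KEY CLUSTERED (cId, fId)
);

DECLARE @id int = 1;

-- The leading key column cId is constrained, so the key can be used for a seek.
SELECT fId FROM bar_demo WHERE cId = @id;

-- Only the second key column is constrained. With no other index on fId,
-- the engine typically has to scan the whole index.
SELECT cId FROM bar_demo WHERE fId = 42;
```

Comparing the execution plans of the two SELECT statements should show a seek for the first and a scan for the second.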

Autocracy
The CustomerId isn't used yet, but will be in the future. But I feel that it should have been part of the regular fields and an artificial key should have been used.
borisCallens