ansaurus

Question

Answer 1

+12 A:

You definitely want to have indexes on attributeID on both the attributes and expressions tables. If you don't currently have those indexes in place, I think you'll see a big speedup.

JerSchneid 2009-05-27 17:08:26
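To see what the suggested indexes change, here is a minimal sketch. It uses SQLite, through Python's built-in sqlite3 module, as a stand-in for SQL Server. The question's schema isn't shown, so the table and index definitions are hypothetical, and SQL Server's optimizer may choose differently. SQLite's EXPLAIN QUERY PLAN reports a SCAN for every table when nothing is indexed. Once attributeId is indexed, it reports a SEARCH on the inner table of the join.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# SQLite can build a temporary "automatic index" for a join; turn that off so
# the plan reflects only the indexes created explicitly below.
conn.execute("PRAGMA automatic_index = OFF")
conn.executescript("""
    CREATE TABLE attributes (attributeId INTEGER, attributeName TEXT, attributeValue TEXT);
    CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER);
""")

query = """
    SELECT e.expressionId, a.attributeName, a.attributeValue
    FROM expressions e
    JOIN attributes a ON a.attributeId = e.attributeId
"""

before = plan(conn, query)  # no indexes: both tables are SCANned

conn.executescript("""
    CREATE INDEX idx_attributes_attributeId ON attributes (attributeId);
    CREATE INDEX idx_expressions_attributeId ON expressions (attributeId);
""")
after = plan(conn, query)   # the inner table is now probed with a SEARCH

print(before)
print(after)
```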

Don't forget that both columns should be of the same data type and, if they are character data, of the same collation.

Tomalak 2009-05-27 17:12:56

Knowing the primary key would help. A single column that is the primary key would already be indexed. It's possible, though, that your expressions table has two fields that make up the primary key, E.ID and E.attributeId. The primary key index would then be built on both columns with E.ID first, so it can't be used efficiently to look up rows by E.attributeId alone. Adding an index on just E.attributeId would speed it up.

Kieveli 2009-05-27 17:13:06
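Kieveli's point about a two-column primary key can be illustrated with the same SQLite stand-in. The table is hypothetical. SQLite backs a composite PRIMARY KEY with an index on the columns in their declared order. A filter on attributeId alone cannot seek into an index that leads with expressionId, so the plan stays a scan until an index leading with attributeId exists.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Composite primary key: SQLite backs it with an index on (expressionId, attributeId).
conn.execute("""
    CREATE TABLE expressions (
        expressionId INTEGER,
        attributeId  INTEGER,
        PRIMARY KEY (expressionId, attributeId)
    )
""")

query = "SELECT expressionId FROM expressions WHERE attributeId = 42"

# attributeId is only the second key column, so the PK index cannot be used
# to seek on it; the whole table (or the whole index) is scanned instead.
before = plan(conn, query)

# An index that leads with attributeId allows a direct lookup.
conn.execute("CREATE INDEX idx_expressions_attributeId ON expressions (attributeId)")
after = plan(conn, query)

print(before)
print(after)
```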

Actually, the primary key isn't auto-indexed on all platforms. MySQL, for instance, does not create an index by default on the primary key.

Goblyn27 2009-05-27 17:20:26

I have an index on expressionId, attributeId (PK) on the expressions table and a clustered index on attributeId (PK) on the attributes table.

JohnIdol 2009-05-27 17:23:03

Both tables don't necessarily need an index. It's actually bad form to add indexes blindly like this. You need to make sure your DB statistics are up to date and see how the table sizes compare. More than likely, the optimizer is going to do a full table scan on the base table no matter what (since there's no WHERE clause), so the index on AttributeId on the base table is just wasted space.

Matt 2009-05-27 17:23:11
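Here is a sketch of Matt's point, again with SQLite standing in for SQL Server and with hypothetical tables. With no WHERE clause, a nested-loop join reads the whole outer table anyway. An index on the inner table's join column is therefore enough, and an index on the outer table's attributeId would go unused for this query. SQL Server can also choose hash or merge joins, which use indexes differently, so treat this as an illustration rather than a prediction of SQL Server's plan.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA automatic_index = OFF")  # only consider indexes created below
conn.executescript("""
    CREATE TABLE attributes (attributeId INTEGER, attributeName TEXT, attributeValue TEXT);
    CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER);
    -- Index only the expressions side of the join.
    CREATE INDEX idx_expressions_attributeId ON expressions (attributeId);
""")

query = """
    SELECT e.expressionId, a.attributeName, a.attributeValue
    FROM attributes a
    JOIN expressions e ON e.attributeId = a.attributeId
"""

# Expect one full SCAN of the outer table plus an index SEARCH for each probe.
steps = plan(conn, query)
print(steps)
```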

@Goblyn27: Can you cite a reference for this? I use MySQL quite a lot, and a PRIMARY KEY constraint does create an index implicitly.

Bill Karwin 2009-05-27 17:24:28

Please note the EDIT on the question.

JohnIdol 2009-05-27 17:29:06

Answer 2

+4 A:

In fact, because so few columns are being returned, I would consider a covering index for this query, i.e. an index that includes all the fields in the query.

Goblyn27 2009-05-27 17:09:27

How would I index a join? (I've never done that.)

JohnIdol 2009-05-27 17:19:35

I think Goblyn is suggesting adding an index on A.attributeId, A.attributeName, A.attributeValue and another on E.attributeId, E.expressionID... but I'm not 100% sure. The theory is that all of the data for the query would come directly from the indexes and never hit the table.

Greg 2009-05-27 17:26:54

Sorry, I wasn't clear on that. Greg is correct. In this instance there would be two covering indexes, one for each table, and the join would take place between the two covering indexes without involving the actual tables.

Goblyn27 2009-05-27 17:36:07

I'll give it a shot and report back.

JohnIdol 2009-05-27 17:42:44

Answer 3

+3 A:

Some things you need to care about are indexes, the query plan and statistics.

Put indexes on attributeId, or make sure indexes exist where attributeId is the first column in the key (SQL Server can still use an index if attributeId is not the first column, but it's not as fast).

Highlight the query in Query Analyzer and press Ctrl+L to see the plan. You can see how the tables are joined together. Almost always, using indexes is better than not (there are fringe cases where, if a table is small enough, indexes can slow you down, but for now just be aware that 99% of the time indexes are good).

Pay attention to the order in which tables are joined. SQL Server maintains statistics on table sizes and will determine which one is better to join first. Do some investigation on internal SQL Server procedures to update statistics -- it's been too long so I don't have that info handy.

That should get you started. Really, an entire chapter can be written on how a database can optimize even such a simple query.

Matt 2009-05-27 17:18:32
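On statistics: in SQL Server the relevant commands are UPDATE STATISTICS and sp_updatestats. The sketch below shows the analogous step in SQLite, whose ANALYZE command stores per-index row-count estimates in the sqlite_stat1 table. The table, index and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER);
    CREATE INDEX idx_expressions_attributeId ON expressions (attributeId);
""")
# 1,000 hypothetical rows spread evenly over 10 distinct attributeId values.
conn.executemany(
    "INSERT INTO expressions VALUES (?, ?)",
    [(i, i % 10) for i in range(1000)],
)

# Refresh the optimizer statistics (SQLite's counterpart to UPDATE STATISTICS).
conn.execute("ANALYZE")

stat = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_expressions_attributeId'"
).fetchone()[0]
# First number: rows in the index; second: estimated rows per attributeId value.
print(stat)
```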

Answer 4

+1 A:

Another thing to do is add some indexes like this:

attributes.{attributeId, attributeName, attributeValue}
expressions.{attributeId, expressionID}

This is hacky! But it's useful as a last resort.

What this does is create a query plan that can be "entirely answered" by indexes. Usually, an index lookup actually costs double I/O in your query above: one to hit the index (i.e. probe for the matching key), and another to fetch the actual row the index points to (to pull attributeName, etc.).

This is especially helpful if "attributes" or "expressions" is a wide table, that is, a table whose rows are expensive to fetch.

Finally, the best way to speed your query is to add a WHERE clause!

Matt 2009-05-27 17:44:02
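The double-I/O argument can be observed with SQLite standing in for SQL Server. The tables are hypothetical, and the notes columns are made-up wide columns. With a narrow index on attributeId, the plan says USING INDEX, meaning the row still has to be fetched from the table. With the covering indexes from this answer, it says USING COVERING INDEX, meaning the lookup is answered from the index alone.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'notes' is a hypothetical wide column that makes row fetches expensive.
    CREATE TABLE attributes (attributeId INTEGER, attributeName TEXT,
                             attributeValue TEXT, notes TEXT);
    CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER, notes TEXT);
    CREATE INDEX idx_attributes_narrow ON attributes (attributeId);
""")

lookup = "SELECT attributeName, attributeValue FROM attributes WHERE attributeId = 7"

# Narrow index: probe the index, then fetch the row for the other two columns.
narrow = plan(conn, lookup)

# The indexes suggested above: every column the query needs lives in the index.
conn.executescript("""
    DROP INDEX idx_attributes_narrow;
    CREATE INDEX idx_attributes_cover ON attributes (attributeId, attributeName, attributeValue);
    CREATE INDEX idx_expressions_cover ON expressions (attributeId, expressionId);
""")
covered = plan(conn, lookup)

join = plan(conn, """
    SELECT e.expressionId, a.attributeName, a.attributeValue
    FROM expressions e JOIN attributes a ON a.attributeId = e.attributeId
""")
print(narrow, covered, join, sep="\n")
```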

Would those indexes kill me on insertion? About the WHERE clause: I am using this join to populate a temp table, which I am going to use to find the expressionID (if any) for a given set of name-value pairs (attributes). So I guess I could filter this query with OR disjuncts on attributeNames+attributeValues to speed it up.

JohnIdol 2009-05-27 18:48:55

I'd have to dynamically append the OR disjuncts, though, because I need something like WHERE (attributeName = 'X' AND attributeValue = 'Y') OR (attributeName = 'Z' AND attributeValue = 'W') ... and so forth! So I'd probably lose time looping through the table with the name-value pairs and building these clauses.

JohnIdol 2009-05-27 18:51:57
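Building the OR disjuncts is cheap string work compared with running the query itself. Below is a minimal sketch in Python with SQLite. It assumes the pairs arrive as (name, value) tuples. build_pair_filter is a hypothetical helper; it uses ? placeholders instead of pasting values into the SQL text.

```python
import sqlite3

def build_pair_filter(pairs):
    """Build '(name = ? AND value = ?) OR ...' plus a flat parameter list.

    Placeholders keep the user-supplied names and values out of the SQL text.
    """
    if not pairs:
        raise ValueError("at least one name-value pair is required")
    clause = " OR ".join(
        "(a.attributeName = ? AND a.attributeValue = ?)" for _ in pairs
    )
    params = [item for name_value in pairs for item in name_value]
    return clause, params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (attributeId INTEGER PRIMARY KEY,
                             attributeName TEXT, attributeValue TEXT);
    CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER);
    INSERT INTO attributes VALUES (1, 'color', 'red'), (2, 'size', 'L'), (3, 'color', 'blue');
    INSERT INTO expressions VALUES (10, 1), (10, 2), (11, 1), (12, 3);
""")

clause, params = build_pair_filter([("color", "red"), ("size", "L")])
sql = f"""
    SELECT DISTINCT e.expressionId
    FROM expressions e
    JOIN attributes a ON a.attributeId = e.attributeId
    WHERE {clause}
    ORDER BY e.expressionId
"""
ids = [row[0] for row in conn.execute(sql, params)]
print(ids)  # [10, 11]
```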

There's always a tradeoff between indexes and insertion speed. Again (and unfortunately), there's no one-size-fits-all answer. If you only have one or two indexes, and given this one isn't clustered, it's likely not going to kill you. That said, this IS an index that's heavily geared toward a specific query, so use it at your discretion.

Matt 2009-05-27 19:58:31

Matt 2009-05-27 20:01:49

The table gets about 10k rows and there's a huge amount of repetition in the name-values. Anyway, I'll probably ask another question for that specific problem; I meant this one just as a request for performance suggestions for simple joins.

JohnIdol 2009-05-27 20:35:19

Answer 5

+2 A:

I bet your problem is the huge number of rows that are being inserted into that temp table. Is there any way you can add a WHERE clause before you SELECT every row in the database?

JerSchneid 2009-05-27 19:01:21

I guess I could filter this query with OR disjuncts on attributeNames+attributeValues to speed it up, but the problem is that I'd have to dynamically append the OR disjuncts, because I need something like WHERE (attributeName = 'X' AND attributeValue = 'Y') OR (attributeName = 'Z' AND attributeValue = 'W') ... to ultimately get the ExpressionId of a given set of name-value pairs. So I'd probably lose time looping through the table with the name-value pairs and building these OR disjuncts for the WHERE clause.

JohnIdol 2009-05-27 20:30:33

That still may be better? Or you could look into caching that temp table. Either caching it in some middle-tier memory, or making that temp table a permanent table and updating it only when rows from the other tables change?

JerSchneid 2009-05-27 21:22:02
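One way to keep a permanent version of the temp table current is with triggers. This sketch uses SQLite syntax and has not been tested against SQL Server, whose trigger syntax differs. The expression_attributes table and the trigger names are hypothetical, and the sketch assumes attributeId values are never updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (attributeId INTEGER PRIMARY KEY,
                             attributeName TEXT, attributeValue TEXT);
    CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER,
                              PRIMARY KEY (expressionId, attributeId));

    -- Hypothetical permanent replacement for the temp table.
    CREATE TABLE expression_attributes (
        expressionId INTEGER, attributeId INTEGER,
        attributeName TEXT, attributeValue TEXT,
        PRIMARY KEY (expressionId, attributeId));

    -- New expression/attribute link: copy the attribute's name and value.
    CREATE TRIGGER expressions_ai AFTER INSERT ON expressions BEGIN
        INSERT INTO expression_attributes
        SELECT NEW.expressionId, a.attributeId, a.attributeName, a.attributeValue
        FROM attributes a WHERE a.attributeId = NEW.attributeId;
    END;

    -- Removed link: drop the cached row.
    CREATE TRIGGER expressions_ad AFTER DELETE ON expressions BEGIN
        DELETE FROM expression_attributes
        WHERE expressionId = OLD.expressionId AND attributeId = OLD.attributeId;
    END;

    -- Changed attribute name or value: refresh every cached copy of it.
    CREATE TRIGGER attributes_au AFTER UPDATE OF attributeName, attributeValue
    ON attributes BEGIN
        UPDATE expression_attributes
        SET attributeName = NEW.attributeName, attributeValue = NEW.attributeValue
        WHERE attributeId = NEW.attributeId;
    END;
""")

def cached(conn):
    """Return the cache table's rows in a stable order."""
    return conn.execute(
        "SELECT * FROM expression_attributes ORDER BY expressionId, attributeId"
    ).fetchall()

conn.executescript("""
    INSERT INTO attributes VALUES (1, 'color', 'red'), (2, 'size', 'L');
    INSERT INTO expressions VALUES (10, 1), (10, 2);
""")
after_insert = cached(conn)

conn.execute("UPDATE attributes SET attributeValue = 'blue' WHERE attributeId = 1")
after_update = cached(conn)

conn.execute("DELETE FROM expressions WHERE expressionId = 10 AND attributeId = 2")
after_delete = cached(conn)
print(after_insert, after_update, after_delete, sep="\n")
```

The trade-off is extra work on every write to the base tables in exchange for never rebuilding the join at query time.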

If I can't get significant improvements by playing with indexes, I'll go with the dynamic filtering of the join as described in the previous comment. I'd like to avoid having persistent caching tables!

JohnIdol 2009-05-28 12:15:08

I tried dynamic filtering: with a fully populated DB I am now moving 9k rows instead of 70k. It's a bit better, but not as much as I would have expected. I am rethinking the whole thing: maybe I can join the expressions (e) table with the attributes (a) table on e.articleId = a.articleId, and then join the name-value pairs table (t) on a.attributeName = t.name and a.attributeValue = t.value, achieving the same kind of filtering with less computation!

JohnIdol 2009-05-28 20:40:05
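The pairs-table idea can be sketched in SQLite. The data is hypothetical, pairs is a made-up temp table name, and the join uses attributeId, as elsewhere in the thread. Joining on the pairs table returns the same expressions as the OR disjuncts would. The second query adds a GROUP BY/HAVING step, under the assumption that you want only expressions that carry all of the given pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (attributeId INTEGER PRIMARY KEY,
                             attributeName TEXT, attributeValue TEXT);
    CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER);
    INSERT INTO attributes VALUES (1, 'color', 'red'), (2, 'size', 'L'), (3, 'color', 'blue');
    INSERT INTO expressions VALUES (10, 1), (10, 2), (11, 1), (12, 3);
    -- The name-value pairs to look for, as a temp table instead of OR disjuncts.
    CREATE TEMP TABLE pairs (name TEXT, value TEXT);
    INSERT INTO pairs VALUES ('color', 'red'), ('size', 'L');
""")

# Joining on the pairs table filters the same way the OR disjuncts did:
# expressions with at least one matching attribute.
any_match = [r[0] for r in conn.execute("""
    SELECT DISTINCT e.expressionId
    FROM expressions e
    JOIN attributes a ON a.attributeId = e.attributeId
    JOIN pairs t ON a.attributeName = t.name AND a.attributeValue = t.value
    ORDER BY e.expressionId
""")]

# Assumption: the goal is the expression that has *all* of the pairs, so keep
# only expressions that matched as many attributes as there are pairs.
all_match = [r[0] for r in conn.execute("""
    SELECT e.expressionId
    FROM expressions e
    JOIN attributes a ON a.attributeId = e.attributeId
    JOIN pairs t ON a.attributeName = t.name AND a.attributeValue = t.value
    GROUP BY e.expressionId
    HAVING COUNT(DISTINCT a.attributeId) = (SELECT COUNT(*) FROM pairs)
""")]
print(any_match, all_match)
```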

I started another question --> http://stackoverflow.com/questions/923136/t-sql-filtering-on-dynamic-name-value-pairs

JohnIdol 2009-05-28 20:55:52

Answer 6

+1 A:

If I'm understanding your schema correctly, your tables look roughly like this:

Expressions: PK - ExpressionID, AttributeID
Attributes:  PK - AttributeID

Assuming that each PK is a clustered index, that still means an Index Scan is required on the Expressions table. You might want to consider creating an index on the Expressions table on (AttributeID, ExpressionID). This would help to stop the index scanning that currently occurs.

2009-05-27 21:01:30

Your understanding is correct. You mean adding a nonclustered index on expressions for (ExpressionId, AttributeId), in addition to the clustered index that's already there?

JohnIdol 2009-05-27 21:58:01

How to Speed Up Simple Join