tags:

views:

61

answers:

2

This is not a problem that needs answering, more of a query on why this is happening.

I have a field in one table that is populated with 'Y' Or 'N', and i have a query that simply takes the value of that field and pops into another table

The table has approximately 25,000 records in it

The query below takes about 25 seconds to run

UPDATE ObjectivesApproved 
       INNER JOIN Approved 
          ON ObjectivesApproved.ID = Approved.ID 
   SET ObjectivesApproved.Football = [Approved].[Cri Football Related];

Removing the JOIN operation makes the query take even longer.

If however i do the below, the entire operation takes less than 5 seconds, even though it's executing 2 queries

UPDATE ObjectivesApproved 
       INNER JOIN Approved 
          ON ObjectivesApproved.ID = Approved.ID 
   SET ObjectivesApproved.Football = 'Y' 
 WHERE (([Approved].[Cri Football Related]='Y'));

UPDATE Approved 
       INNER JOIN ObjectivesApproved 
          ON Approved.ID = ObjectivesApproved.ID 
   SET ObjectivesApproved.Football = 'N' 
 WHERE (([ObjectivesApproved].[Football] Is Null));

I'm happy with my workaround, even if it is a little inelegant, but to further my understanding of SQL why might this be happening?

A: 

Your first version is updating 25K rows no matter what but it has to keep the tables in sync since it is using values from one into the other on a row by row basis. Every row that is updated has to read from a field - 25K times.

Your second version (both statements) filter data instead of row by row comparison. Internally a set of records is found and then updated in a batch instead of row by row computation. The value 'Y' doesn't have to be looked up each time - it is constant.

Imagine if I asked you to color 25K boxes black or white based on a list I gave you. Is it faster to pick up the first box, check the list, and color it, pick up the second box, check the list, color it, repeat. Or is it faster to pull out all ones supposed to be white and color them, then all the black ones and color them. Note you only have to "check the list" 2 times in the second case but 25K times the first one.

ktharsis
+1  A: 

I put these in comments, but realize they constitute something of an answer:

You say there are no indexes, but you say the ID fields are PKs. If so, there has to be a unique index on those fields. If there isn't, then they aren't really PKs, and that might explain why the versions with WHERE clauses are faster than the version just with a JOIN.

Also, Google "Jet SHOWPLAN" so you can see what the Jet query optimizer is doing and then you can really see what's going on.

With indexes, you'll get an index merge and that should be pretty fast. Without them, I'm not sure how Jet would do it. Also, it might make a difference if your Y/N field is indexed. The powers that be advise to never index sparsely populated fields (i.e., fields with low cardinality), but I've found that index Boolean fields in Jet/ACE actually can make a significant performance difference.

David-W-Fenton
"You say there are no indexes, but you say the ID fields are PKs. If so, there has to be a unique index on those fields. If there isn't, then they aren't really PKs" -- Note that if there isn't a PK on the table but there is a `NOT NULL UNIQUE` constraint then this will be used as the clustered index i.e. a PK in all but name. (If there is more than one `NOT NULL UNIQUE` constraint which one will ACE/Jet choose...?)
onedaywhen
Thanks for the tip about Jet SHOWPLAN, but i don't have access to my machines registry, and from past painful experience with the companies IT department they're unlikely to help. I'm upgrading the project to SQL server fairly soon at some point anyway.
Doozer1979
If you have indexing problems, you'll just be moving it to a different database and not actually addressing the cause of the problem.
David-W-Fenton
@onedaywhen: DDL aside, is it possible to assign a PK in Access/Jet/ACE without a unique index?
David-W-Fenton
@David-W-Fenton: It depends what you mean by PK. I can create an index-free tabke with a `CHECK` constraint that prevents duplicate rows. I would call that a true relational key. It would, however, make more practical sense to use a `NOT NULL` and `UNIQUE` in combination (or PK, even) and let the SQL product decide how best to enforce the key in the physcial layer. I avoid PK in SQL Server because it only offers me implicit behaviour that I'd rather be explicit about e.g. creating a a clustered index. I cannot say the same about ACE/Jet.
onedaywhen