I have a lookup table (##lookup). I know it's bad design because I'm duplicating data, but it speeds up my queries tremendously. I have a query that populates this table:

insert into ##lookup select distinct col1,col2,... from table1...join...etc...

I would like to simulate this behavior:

delete from ##lookup
insert into ##lookup select distinct col1,col2,... from table1...join...etc...

This would clearly update the table correctly, but it's a lot of deleting and inserting. It churns my indexes and locks the table against selects.

This table could also be updated by something like:

delete from ##lookup where the row is not in (select distinct col1,col2,... from table1...join...etc...)
insert into ##lookup (select distinct col1,col2,... from table1...join...etc...) except rows that are already in the table

The second way may take longer, but I can say WITH (NOLOCK) and still be able to select from the table.

Any ideas on how to write the query the second way?

+2  A: 
DELETE LU
FROM ##lookup LU
LEFT OUTER JOIN Table1 T1 ON T1.my_pk = LU.my_pk
WHERE T1.my_pk IS NULL

INSERT INTO ##lookup (my_pk, col1, col2...)
SELECT T1.my_pk, T1.col1, T1.col2...
FROM Table1 T1
LEFT OUTER JOIN ##lookup LU ON LU.my_pk = T1.my_pk
WHERE LU.my_pk IS NULL

You could also use WHERE NOT EXISTS instead of the LEFT JOINs above to look for non-existence of rows.
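For reference, here is a sketch of the same two statements written with NOT EXISTS, reusing the placeholder names (my_pk, col1, col2, Table1) from the joins above:

```sql
-- Delete lookup rows that no longer exist in the source.
DELETE LU
FROM ##lookup LU
WHERE NOT EXISTS (SELECT 1 FROM Table1 T1 WHERE T1.my_pk = LU.my_pk);

-- Insert source rows that aren't in the lookup table yet.
INSERT INTO ##lookup (my_pk, col1, col2)
SELECT T1.my_pk, T1.col1, T1.col2
FROM Table1 T1
WHERE NOT EXISTS (SELECT 1 FROM ##lookup LU WHERE LU.my_pk = T1.my_pk);
```

On SQL Server the optimizer usually produces the same anti-semi-join plan for both forms, so it's mostly a readability choice.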

You might also want to look into the MERGE statement if you're on SQL 2008. Otherwise, you aren't keeping the tables in sync - you're only keeping the PKs in sync: if one of the columns changes in one table but not the other, that won't be reflected above.
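A rough MERGE sketch (SQL Server 2008+), again with the placeholder names from above - the WHEN MATCHED branch is the part the DELETE/INSERT pair doesn't cover:

```sql
-- Keep ##lookup fully in sync with the source in one statement:
-- update changed rows, insert new ones, delete rows gone from the source.
MERGE ##lookup AS LU
USING (SELECT DISTINCT my_pk, col1, col2 FROM Table1) AS SRC
    ON LU.my_pk = SRC.my_pk
WHEN MATCHED AND (LU.col1 <> SRC.col1 OR LU.col2 <> SRC.col2) THEN
    UPDATE SET col1 = SRC.col1, col2 = SRC.col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (my_pk, col1, col2) VALUES (SRC.my_pk, SRC.col1, SRC.col2)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

Note that MERGE requires the terminating semicolon, and the `<>` comparisons above don't handle NULLs in the value columns - you'd need explicit IS NULL checks for that.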

Either way, it sounds like you might want to consider optimizing your queries. While duplicating the data may seem like a nice fix for your performance issues, as you can see it can carry a lot of headaches with it (and this is just one). You're better off finding the underlying cause of the poor performance and fixing that rather than putting on this ugly band-aid.

Tom H.
Yeah, well, the ideal solution would be a "materialized view", or an indexed view as it's called in SQL Server. However, to index a view you can't use any self joins - and in this case "self joins" even includes using the same table twice in a join, even if it's not technically a self join. So I can't use that option, which would be optimal. Trust me, I and several people much smarter than me spent many hours trying to come up with a better solution. Indexing the source tables to death only got us so far, and the queries cannot be optimized further...
kralco626
But I totally agree with you. This is a shitty solution... if only Microsoft let me use the view... There is a workaround for my issue, but it's too complicated for me to figure out. I asked a question about it here: http://stackoverflow.com/questions/3046058/need-some-serious-help-with-self-join-issue - if that issue gets solved I won't have to do this...
kralco626
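To illustrate the indexed-view restriction from the comments above, here is a sketch of what one would look like if the query qualified (dbo.Table1/dbo.Table2 and the column names are placeholders; the real query here doesn't qualify because indexed views also forbid self joins, outer joins, DISTINCT, and subqueries):

```sql
-- SCHEMABINDING and two-part table names are mandatory for an indexed view.
CREATE VIEW dbo.vLookup
WITH SCHEMABINDING
AS
SELECT T1.my_pk, T1.col1, T2.col2
FROM dbo.Table1 T1
JOIN dbo.Table2 T2 ON T2.my_pk = T1.my_pk;
GO
-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vLookup ON dbo.vLookup (my_pk);
```

Once the clustered index exists, SQL Server maintains the stored result set automatically as the base tables change - which is exactly the behavior the DELETE/INSERT refresh is trying to simulate.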
A: 

All DELETEs are fully logged, which kills performance if your plan is to nuke the whole table. Depending on how many rows you're dealing with, you might be okay just using the minimally logged TRUNCATE instead.
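A sketch of that full-refresh variant, with placeholder column names (TRUNCATE needs ALTER permission on the table and won't work if the table is referenced by a foreign key):

```sql
-- Minimally logged wipe instead of a fully logged DELETE.
TRUNCATE TABLE ##lookup;

-- Repopulate from scratch.
INSERT INTO ##lookup (my_pk, col1, col2)
SELECT DISTINCT T1.my_pk, T1.col1, T1.col2
FROM Table1 T1;
```

The trade-off is that TRUNCATE takes a schema-modification lock, so concurrent readers will block for the duration of the refresh.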

How long does your SELECT statement take? You could try something like this if the select takes a small amount of time and you aren't running it frequently.

select distinct ... INTO #tempTable1 from table1...join...etc...

begin transaction
drop table ##lookup
select * into ##lookup from #tempTable1
commit transaction

Tom's answer is probably the most robust, but I just thought I'd chime in with some alternatives. Not sure why a global temporary table is necessary compared to a real table, though?

mattmc3
Yeah, well, I don't have rights to create a real table on the database. So I just play around with temp tables until I figure out what I want, and then go about getting the real table created. So in practice, when this is actually used, I would be using a real table, and I would be unable to drop it...
kralco626