A: 

Looking at the code above, why do you need a temp table?


DELETE FROM MatchEvent WHERE
MatchId IN (SELECT MatchId FROM MatchResult)


DELETE FROM MatchResult
-- OR Truncate can help here, if all the records are to be deleted anyways.

shahkalpesh
Sorry, I forgot to include that we are only deleting a subset of MatchIds, not all of them
Robin Weston
A: 

You probably want to process this piecewise in some way. (I assume queries are a lot more complicated that you showed?) In that case, you'd want try one of these:

  • Write your stored procedure to iterate over results. (Might still lock while processing.)
  • Repeatedly select the N first hits, eg LIMIT 100 and process those.
  • Divide work by scanning regions of the table separately, using something like WHERE M <= x AND x < N.
  • Run the "midnight job" more often. Seriously, running stuff like this every 5 mins instead can work wonders, especially if work increases non-linearly. (If not, you could still just get the work spread out over the hours of the day.)

In Postgres, I've had some success using conditional indices. They work magic by applying an index if certain conditions are met. This means that you can keep the many 'resolved' and the few unresolved rows in the same table, but still get that special index over just the unresolved ones. Ymmv.

Should be pointed out that this is where using databases gets interesting. You need to pay close attention to your indices and use EXPLAIN on your queries a lot.

(Oh, and remember, interesting is a good thing in your hobbies, but not at work.)

Anders Eurenius
This is for SQL Server, not MySQL...
gbn
+3  A: 

There's probably a lot of stuff going on here, and it's not all your query.

First, I agree with the other posters. Try to rewrite this without a temp table if at all possible.

But assuming that you need a temp table here, you have a BIG problem in that you have no PK defined on it. It's vastly going to expand the amount of time your queries will take to run. Create your table like so instead:

CREATE TABLE #tempMatchResult (
    matchId VARCHAR(50) NOT NULL PRIMARY KEY /* NOT NULL if at all possible */
);

INSERT INTO #tempMatchResult
SELECT DISTINCT MatchId FROM MatchResult;

Also, make sure that your TempDB is sized correctly. Your SQL server may very well be expanding the database file dynamically on you, causing your query to suck CPU and disk time. Also, make sure your transaction log is sized correctly, and that it is not auto-growing on you. Good luck.

Dave Markle
It would be marginally quicker to add an index to the temp table AFTER it has been populated.
ck
A: 

First, indexes are a MUST here see Dave M's answer.

Another approach that I will sometime use when deleting very large data sets, is creating a shadow table with all the data, recreating indexes and then using sp_rename to switch it in. You have to be careful with transactions here, but depending on the amount of data being deleted this can be faster.

Note If there is pressure on tempdb consider using joins and not copying all the data into the temp table.

So for example

CREATE TABLE #tempMatchResult (
    matchId VARCHAR(50) NOT NULL PRIMARY KEY /* NOT NULL if at all possible */
);

INSERT INTO #tempMatchResult
SELECT DISTINCT MatchId FROM MatchResult;

set transaction isolation level serializable
begin transaction 

create table MatchEventT(columns... here)

insert into MatchEventT
select * from MatchEvent m
left join #tempMatchResult t on t.MatchId  = m.MatchId 
where t.MatchId is null 

-- create all the indexes for MatchEvent

drop table MatchEvent
exec sp_rename 'MatchEventT', 'MatchEvent'

-- similar code for MatchResult

commit transaction 


DROP TABLE #tempMatchResult
Sam Saffron
A: 

Avoid the temp table if possible

It's only using up memory.
You could try this:

DELETE MatchEvent
FROM MatchEvent  e , 
     MatchResult r
WHERE e.MatchId = r.MatchId

If you can't avoid a temp table

I'm going to stick my neck out here and say: you don't need an index on your temporary table because you want the temp table to be the smallest table in the equation and you want to table scan it (because all the rows are relevant). An index won't help you here.

Do small bits of work

Work on a few rows at a time.
This will probably slow down the execution, but it should free up resources.

- One row at a time
SELECT @MatchId = min(MatchId) FROM MatchResult

WHILE @MatchId IS NOT NULL
BEGIN
    DELETE MatchEvent 
    WHERE  Match_Id = @MatchId 

    SELECT @MatchId = min(MatchId) FROM MatchResult WHERE MatchId > @MatchId 
END
- A few rows at a time
CREATE TABLE #tmp ( MatchId Varchar(50) ) 

/* get list of lowest 1000 MatchIds: */ 
INSERT #tmp 
SELECT TOP (1000) MatchId 
FROM MatchResult 
ORDER BY MatchId 

SELECT @MatchId = min(MatchId) FROM MatchResult

WHILE @MatchId IS NOT NULL
BEGIN
    DELETE MatchEvent
    FROM MatchEvent e , 
         #tmp       t
    WHERE e.MatchId = t.MatchId 

    /* get highest MatchId we've procesed: */  
    SELECT @MinMatchId = MAX( MatchId ) FROM #tmp  

    /* get next 1000 MatchIds: */  
    INSERT #tmp 
    SELECT TOP (1000) MatchId 
    FROM MatchResult 
    WHERE MatchId > @MinMatchId
    ORDER BY MatchId 

END

This one deletes up to 1000 rows at a time.
The more rows you delete at a time, the more resources you will use but the faster it will tend to run (until you run out of resources!). You can experiment to find a more optimal value than 1000.

AJ