ansaurus

Question

SQL Delete low counts

Answer 1

+1 A:

Oh wait. Got a simpler solution : I'll select all the max values(group by id), export the data, flush the table, reimport only the max values.

CREATE TABLE tabletemp LIKE table;  
INSERT INTO tabletemp SELECT id,MAX(qty) FROM table GROUP BY id;  
DROP TABLE table;
RENAME TABLE tabletemp TO table;

Thanks to all !

Silence 2009-09-19 21:34:26

OMG Ponies 2009-09-19 21:59:41

Exactly what i thought... I'll edit my answer.

Silence 2009-09-19 22:01:52

Will accept my own answer in 2 days unless someone come up with a real "non-cheating" way of doing it. With a join and stuff... just wondering which is faster at runtime.

Silence 2009-09-19 22:06:33

In theory, you ought to be able to do something like `DELETE FROM tbl WHERE qty < (SELECT MAX(t.qty) FROM tbl t WHERE t.Id = tbl.Id)` but MySQL has a restriction where you can't have the table you're updating or deleting from in a subquery, you get error 1093. Your solution is probably the best.

Ken Keenan 2009-09-19 22:14:36

For what it's worth, doing a `truncate table mytbl;` will be lots faster, since you're not wrapping that delete in a transaction (as `delete from` does). In other comments, you said you had a fairly small table (~400k rows), so a `delete` will happen pretty quickly, anyway.

Eric 2009-09-20 05:13:00

You can speed it up by renaming the temporary table instead of copying everything a second time: DROP TABLE `table`; RENAME TABLE `tabletemp` TO `table`;

John Douthat 2009-09-20 18:56:34

Edited to fit John's tip.

Silence 2009-09-21 13:11:24

Answer 2

+1 A:

Try this (On SQL Server)

delete from tbl o
left outer join 
(Select max(qty) anz , id
from tbl i
group by i.id) k on o.id = k.id and k.anz = o.qty
where k.id is null

Revision 2 for MySQL... Can anyone check this one?:

delete from tbl o
where concat(id,qty) not in 
    (select concat(id,anz) from (Select max(qty) anz , id
    from tbl i
    group by i.id))

Explanation:

Since I was supposed to not use joins (See comments about MySQL Support on joins and delete/update/insert), I moved the subquerry into a IN(a,b,c) clause.

Inside an In clause I can use a subquery, but that query is only allowed to return one field. So in order to filter all elements that are not the maximum, i need to concat both fields into a single one, so i can return it inside the in clause. So basically my query inside the IN returns the biggest ID+QTY only. To compare it with the main table i also need to make a concat on the outside, so the data for both fields match.

Basically the In clause contains: ("A3","B125")

Disclaimer: The above query is "evil!" since it uses a function (concat) on fields to compare against. This will cause any index on those fields to become almost useless. You should never formulate a query that way that is run on a regular basis. I only wanted to try to bend it so it works on mysql.

Example of this "bad construct": (Get all o from the last 2 weeks) select ... from orders where orderday + 14 > now()

You should allways do: select ... from orders where orderday > now() - 14

The difference is stuble: Version 2 only has to do the math once, and is able to use the index, and version 1 has to do the math for every single row in the orders table., and you can forget about the index usage...

Heiko Hatzfeld 2009-09-19 21:36:12

Unknown column 'o.anz' in 'on clause'. You anz is scoped inside the ().

Silence 2009-09-19 21:43:17

I also took out the outer join because its not recognised in mysql... not sure how to adapt it : right outer join or left outer join.

Silence 2009-09-19 21:48:11

love how someone upvoted this and it doesn't even work (for now)

Silence 2009-09-19 21:51:06

Sorry... only know T-SQL... Thats why I added the disclaimer there. But thats the basic construct i would use.

Heiko Hatzfeld 2009-09-19 21:55:28

being able to use joins in INSERT, UPDATE and DELETE statements is SQL Server-specific

Ken Keenan 2009-09-19 21:59:27

Don't be sorry. I'm really thankfull you tried. You obviously know your way around sql and after thinking about it I think it's great you've been rewarded for it.

Silence 2009-09-19 22:01:10

Can you explain to me the goal of the Concat ?

Silence 2009-09-21 17:35:23

See above - edited post to explain why/what i tried to do

Heiko Hatzfeld 2009-09-21 18:38:37

Answer 3

A:

MySQL 4.0 and later supports a simple multi-table syntax for DELETE:

DELETE t1 FROM MyTable t1 JOIN MyTable t2 ON t1.id = t2.id AND t1.qty < t2.qty;

This produces a join of each row with a given id to all other rows with the same id, and deletes only the row with the lesser qty in each pairing. After this is all done, the row with the greatest qty per group of id is left not deleted.

If you only have one row with a given id, it still works because a single row is naturally the one with the greatest value.

FWIW, I just tried my solution using MySQL 5.0.75 on a Macbook Pro 2.40GHz. I inserted 1 million rows of synthetic data, with different numbers of rows per "group":

2 rows per id completes in 26.78 sec.
5 rows per id completes in 43.18 sec.
10 rows per id completes in 1 min 3.77 sec.
100 rows per id completes in 6 min 46.60 sec.
1000 rows per id didn't complete before I terminated it.

Bill Karwin 2009-09-19 23:53:07

I will test this but a full join like this cost A LOT. Pretty sure it will be really really long.

Silence 2009-09-20 00:15:49

If I'm not mistaken this solution is O(n). And I'm working with a very huge table.

Silence 2009-09-20 00:19:40

Query stopped after an hour or so because it still wasn't done. Sorry but as I thought, a full join is impractical on a large table : the table has 376 000 entries. My solution only took a few seconds.

Silence 2009-09-20 01:26:52

@Silence: 376,000 rows is *not* a large table. The issue is how many rows get returned back on this query. In this case, it's a direct correlation between number of rows per ID and those rows returned. You can get into the case of a near Cartesian join if you have very few IDs in those 376,000 rows. Moreover, please do not use big-O notation when discussing database operations. Most of the operations are done relationally and are not imperative, like big-O describes. The reason that this approach is slow is due to you having a low number of valid IDs, not the size of the table.

Eric 2009-09-20 05:10:09

Sorry. Fairly new at SQL. Thanks for the precisions Eric.

Silence 2009-09-21 13:09:56

Answer 4

A:

I'd try this:

delete from T
where exists (
  select * from T as T2
  where T2.Id = T.Id
  and T2.Qty > T.Qty
);

For those who might have similar question in the future, this might be supported some day (it is now in SQL Server 2005 and later)

It won't require a join, and it has advantages over the use of a temporary table if the table has dependencies

with Tranked(Id,Qty,rk) as (
  select
    Id, Qty,
    rank() over (
      partition by Id
      order by Qty desc
    )
  from T
)
  delete from Tranked
  where rk > 1;

Steve Kass 2009-09-20 05:02:45

Answer 5

A:

You'll have to go via another table (among other things that makes a single delete statement here quite impossible in mysql is you can't delete from a table and use the same table in a subquery).

BEGIN;
create temporary table tmp_del select id,max(qty) as qty from the_tbl;
delete the_tbl from the_tbl,tmp_del where 
  the_tbl.id=tmp_del.id and the_tbl.qty=tmp_del.qty;
drop table tmp_del;
END;

nos 2009-09-21 18:59:12

ansaurus

tags:

views:

answers:

SQL Delete low counts

related questions