tags:

views:

168

answers:

5

I have a table with this data:

Id     Qty  
--     ---  
A       1  
A       2  
A       3  
B       112  
B       125  
B       109

But I'm supposed to only have the max values for each id. Max value for A is 3 and for B is 125. How can I isolate (and delete) the other values?

The final table should look like this :

Id     Qty  
--     ---   
A       3  
B       125

Running MySQL 4.1

+1  A: 

Oh wait. Got a simpler solution : I'll select all the max values(group by id), export the data, flush the table, reimport only the max values.

CREATE TABLE tabletemp LIKE table;  
INSERT INTO tabletemp SELECT id,MAX(qty) FROM table GROUP BY id;  
DROP TABLE table;
RENAME TABLE tabletemp TO table;

Thanks to all !

Silence
OMG Ponies
Exactly what i thought... I'll edit my answer.
Silence
Will accept my own answer in 2 days unless someone come up with a real "non-cheating" way of doing it. With a join and stuff... just wondering which is faster at runtime.
Silence
In theory, you ought to be able to do something like `DELETE FROM tbl WHERE qty < (SELECT MAX(t.qty) FROM tbl t WHERE t.Id = tbl.Id)` but MySQL has a restriction where you can't have the table you're updating or deleting from in a subquery, you get error 1093. Your solution is probably the best.
Ken Keenan
For what it's worth, doing a `truncate table mytbl;` will be lots faster, since you're not wrapping that delete in a transaction (as `delete from` does). In other comments, you said you had a fairly small table (~400k rows), so a `delete` will happen pretty quickly, anyway.
Eric
You can speed it up by renaming the temporary table instead of copying everything a second time: DROP TABLE `table`; RENAME TABLE `tabletemp` TO `table`;
John Douthat
Edited to fit John's tip.
Silence
+1  A: 


Try this (On SQL Server)

delete from tbl o
left outer join 
(Select max(qty) anz , id
from tbl i
group by i.id) k on o.id = k.id and k.anz = o.qty
where k.id is null

Revision 2 for MySQL... Can anyone check this one?:

delete from tbl o
where concat(id,qty) not in 
    (select concat(id,anz) from (Select max(qty) anz , id
    from tbl i
    group by i.id))

Explanation:

Since I was supposed to not use joins (See comments about MySQL Support on joins and delete/update/insert), I moved the subquerry into a IN(a,b,c) clause.

Inside an In clause I can use a subquery, but that query is only allowed to return one field. So in order to filter all elements that are not the maximum, i need to concat both fields into a single one, so i can return it inside the in clause. So basically my query inside the IN returns the biggest ID+QTY only. To compare it with the main table i also need to make a concat on the outside, so the data for both fields match.

Basically the In clause contains: ("A3","B125")

Disclaimer: The above query is "evil!" since it uses a function (concat) on fields to compare against. This will cause any index on those fields to become almost useless. You should never formulate a query that way that is run on a regular basis. I only wanted to try to bend it so it works on mysql.

Example of this "bad construct": (Get all o from the last 2 weeks) select ... from orders where orderday + 14 > now()

You should allways do: select ... from orders where orderday > now() - 14

The difference is stuble: Version 2 only has to do the math once, and is able to use the index, and version 1 has to do the math for every single row in the orders table., and you can forget about the index usage...

Heiko Hatzfeld
Unknown column 'o.anz' in 'on clause'. You anz is scoped inside the ().
Silence
I also took out the outer join because its not recognised in mysql... not sure how to adapt it : right outer join or left outer join.
Silence
love how someone upvoted this and it doesn't even work (for now)
Silence
Sorry... only know T-SQL... Thats why I added the disclaimer there. But thats the basic construct i would use.
Heiko Hatzfeld
being able to use joins in INSERT, UPDATE and DELETE statements is SQL Server-specific
Ken Keenan
Don't be sorry. I'm really thankfull you tried. You obviously know your way around sql and after thinking about it I think it's great you've been rewarded for it.
Silence
Can you explain to me the goal of the Concat ?
Silence
See above - edited post to explain why/what i tried to do
Heiko Hatzfeld
A: 

MySQL 4.0 and later supports a simple multi-table syntax for DELETE:

DELETE t1 FROM MyTable t1 JOIN MyTable t2 ON t1.id = t2.id AND t1.qty < t2.qty;

This produces a join of each row with a given id to all other rows with the same id, and deletes only the row with the lesser qty in each pairing. After this is all done, the row with the greatest qty per group of id is left not deleted.

If you only have one row with a given id, it still works because a single row is naturally the one with the greatest value.


FWIW, I just tried my solution using MySQL 5.0.75 on a Macbook Pro 2.40GHz. I inserted 1 million rows of synthetic data, with different numbers of rows per "group":

  • 2 rows per id completes in 26.78 sec.
  • 5 rows per id completes in 43.18 sec.
  • 10 rows per id completes in 1 min 3.77 sec.
  • 100 rows per id completes in 6 min 46.60 sec.
  • 1000 rows per id didn't complete before I terminated it.
Bill Karwin
I will test this but a full join like this cost A LOT. Pretty sure it will be really really long.
Silence
If I'm not mistaken this solution is O(n). And I'm working with a very huge table.
Silence
Query stopped after an hour or so because it still wasn't done. Sorry but as I thought, a full join is impractical on a large table : the table has 376 000 entries. My solution only took a few seconds.
Silence
@Silence: 376,000 rows is *not* a large table. The issue is how many rows get returned back on this query. In this case, it's a direct correlation between number of rows per ID and those rows returned. You can get into the case of a near Cartesian join if you have very few IDs in those 376,000 rows. Moreover, please do not use big-O notation when discussing database operations. Most of the operations are done relationally and are not imperative, like big-O describes. The reason that this approach is slow is due to you having a low number of valid IDs, not the size of the table.
Eric
Sorry. Fairly new at SQL. Thanks for the precisions Eric.
Silence
A: 

I'd try this:

delete from T
where exists (
  select * from T as T2
  where T2.Id = T.Id
  and T2.Qty > T.Qty
);

For those who might have similar question in the future, this might be supported some day (it is now in SQL Server 2005 and later)

It won't require a join, and it has advantages over the use of a temporary table if the table has dependencies

with Tranked(Id,Qty,rk) as (
  select
    Id, Qty,
    rank() over (
      partition by Id
      order by Qty desc
    )
  from T
)
  delete from Tranked
  where rk > 1;
Steve Kass
A: 

You'll have to go via another table (among other things that makes a single delete statement here quite impossible in mysql is you can't delete from a table and use the same table in a subquery).

BEGIN;
create temporary table tmp_del select id,max(qty) as qty from the_tbl;
delete the_tbl from the_tbl,tmp_del where 
  the_tbl.id=tmp_del.id and the_tbl.qty=tmp_del.qty;
drop table tmp_del;
END;
nos