views:

800

answers:

5

Tonight may be the night I crack...

Okay, so if you would like a back story, look at my previous question

I have finally decided that if this is possible, It's not worth the time it will take to figure out, since I really should have had this done before tonight anyway. So I decided to go another route:

Figuring out which of my records is not a duplicate is quite easy:

SELECT *
FROM eventlog
GROUP BY event_date, user
HAVING COUNT(*) = 1
ORDER BY event_date, user

This returns all of my non-duplicates. So I thought I'd move them over to another table called "no_duplicates" and then delete them from the original table. Then I could see the duplicates all alone in the original table, fix them up, and add the no_dupes back. But while:

INSERT INTO no_duplicates
SELECT *
FROM eventlog
GROUP BY event_date, user
HAVING COUNT(*) = 1
ORDER BY event_date, user

works like a charm, the following throws an error:

DELETE
FROM eventlog
GROUP BY event_date, user
HAVING COUNT(*) = 1
ORDER BY event_date, user

My guess is that while the query returns unique, already-in-the-table records, deleting by a aggregate function isn't kosher. Which is understandable except I don't know what else I can do to secure that only the records I moved are deleted. I searched and found no "After INSERT kill the records in the original table" syntax, and my guess is that it would fail anyway, for the same reason the delete failed.

So if anyone out there can please help me find the missing piece, I'll be really, really grateful.


Real quick: Thank your for everyone so far. You've all been really insightful and educational. I just want to explain why I'm a bit frantic in case I lack my usual cool and curious attitude.

Essentially this is a project I was working on more for fun and to show off around the office. I was asked a few days ago if it was ready to go live when it hadn't really been tested outside of small group. Next thing I know, it's live and held together with twine and bubble gum. Now I'm trying to make it actually ready for the next go around (bi-monthly reports). Meanwhile, it was already used in the real-world, so there is some real-world data that if I screw this up just a tiny bit more, will be gone and can't be retrieved without doing it all from hand.

So! There are no primary keys, I'm embarrassed to say, but there are about 3 copies of everything BUT the real-world data.

I guess you're all caught up. AGain, thank you for any help you can give.

A: 

You may try adding a primary key auto increment column, move your uniques into a new table and do a delete eventlog.* from eventlog e inner join no_dupes nd on e.id=nd.id

SiViG
+1  A: 

Firstly, you do not need the ORDER BY on the INSERT and DELETE. It's just useful if you need them for presentation which is not the case with those operations.

If your EVENTLOG table has a primary key (like ID) you could form your DELETE-statement like this:

DELETE 
FROM eventlog
WHERE id IN (
  SELECT *
  FROM eventlog
  GROUP BY event_date, user
  HAVING COUNT(*) = 1
)
Kosi2801
No key, this is a major issue, I know.
Anthony
Okay, obviously I'm spazzing out. The idea of adding a primary key seemed out of the question at this stage, since there are other tables with identical structure. But after a quick breath, I added a key to the problem table, and now I'm looking at just my problem records. All of you were seriously a big help.
Anthony
A: 

You should have a look at the DELETE syntax, there are neither GROUP BYs, nor HAVINGs, nor ORDERs.

soulmerge
Sorry, I was grabbing everything as efficiently as I could. I know ORDERS are not necessary for deleting things. And I quickly learned that GROUP BY's and HAVINGs are out, hence the question.
Anthony
ORDERs are not **allowed** in DELETEs (apart from being pointless)
soulmerge
+2  A: 

I would not delete with the aggregate function. I think you would like to delete all the inserted data ?

Why not try something like this:

DELETE 
FROM eventlog
WHERE (user, event_date) IN (SELECT user, event_data FROM no_duplicates)
bernhardrusch
So, to be clear. putting the `WHERE` clause in parenthesis means that both the user and the event date need to be in the specific row of the no_duplicates table? Because obviously the users and event_dates repeat independently throughout both.Okay, I hate to sound rushed or incompetent, so please forgive me. I'm really anxious because rebuilding this will take even more time and make people even more annoyed with me. I hate deadlines.
Anthony
Yes - but to be 100% sure - always backup before trying statements
bernhardrusch
So this means the statement deletes all the rows where the user AND event_date is the same than in the row from the no_duplicates.I think this is what you want ?Otherwise you would have to specifiy a better key (id ?)
bernhardrusch
another tip - if you want to test out if a delete statement is working - use the where clause with a "select *". Then you see all the rows which are going to be deleted. [Maybe there are some edge cases where this statement wouldn't work - but I can't think of any)
bernhardrusch
Actually, instead of deleting them, I simply did a `SELECT` so that only the duplicates showed up (basically, `SELECT * FROM eventlog WHERE id NOT IN (SELECT id FROM duplicates)`. Turns out having keys was handy in so many obvious ways. Knock on head.
Anthony
That multiple field syntax was a real life saver for me. thx.
Noah Goodrich
A: 

What's the database? If MySQL then

Currently, you cannot delete from a table and select from the same table in a subquery.

At least for 5.0 version. ( http://dev.mysql.com/doc/refman/5.0/en/delete.html )

StarWind Software
Please provide answers, not ads.
soulmerge