views:

65

answers:

5

I am working with an old MySQL table, which serves as a log of sorts. It looks like

CREATE TABLE `queries` (
  `Email` char(32) NOT NULL DEFAULT '',
  `Query` blob,
  `NumRecords` int(5) unsigned DEFAULT NULL,
  `Date` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Now, I need to be able to UPDATE the records in this table (don't ask why, I don't know). Normally, I would just do

UPDATE table SET ... WHERE unique_column = value

But in this case, I don't have a unique column to work from.

Is there a workaround for this, or am I just going to have to push to put in a nice, standard INT NOT NULL AUTO_INCREMENT?

+2  A: 

A unique identifier is the only reliable way of doing this. Just add an auto_increment column and be done with it.

For exhaustive info including some workaround approaches (none of them perfect though!) check this question, where the OP had a table without a unique identifier and no way to change it.

Update: As Doug Currie points out, this is not entirely true: A unique ID is not necessary as such here. I still strongly recommend the practice of always using one. If two users decide to update two different rows that are exact duplicates of each other at the exact same time (e.g. by selecting a row in a GUI), there could be collisions because it's not possible to define which row is targeted by which operation. It's a microscopic possibility and in the case at hand probably totally negligeable, but it's not good design.

Pekka
+1: If the `Date` column were a DATETIME data type, it would be a candidate because DATETIME is down to the microsecond. But without a unique key on the column, there's nothing to ensure such is the case...
OMG Ponies
Why do you say there there is no reliable way to do it? Just update the row where all columns match. There should be only one since SQL is set oriented.
Doug Currie
@Doug what if he wants to target only one of two rows with completely identical contents? Why should there be only one?
Pekka
@Pekka, SQL is set oriented; there can't be two rows with identical contents. If this is an extension of SQL then the dialect should support UPDATE ... WHERE ... LIMIT 1; and it won't matter which matching row is updated since they are the same.
Doug Currie
Just what I was going to say, Pekka. It is entirely possible that someone using a single email address (`Email`) executed the exact same query (`Query`), returning the same number of results (`NumRecords`), on the same day (`Date`). If that happens, I want to be able to distinguish between the two.
Austin Hyde
@Doug, MySQL does indeed allow duplicate rows. A quick query just showed I have 16 duplicate rows in the table in question. However, you make a good point that, since they are identical, changing one is the same as changing any other one, and thus shouldn't matter.
Austin Hyde
@Austin I agree with you, but after some thinking, I agree that @Doug is right. If you `LIMIT 1` your query, it will always be fine. This is not a perfect solution: Problems might come up in multi-user environments when you need to lock a certain record. I'd say a unique ID is the right thing to do but @Doug I see what you're talking about, you're right.
Pekka
@Austin, OK see my answer then.
Doug Currie
@Austin Hyde - Then given your design, those two rows represent the exact same event. Once you answer the question of how a human would differentiate those two sets of information, you will know how the database needs to be altered to differentiate them.
Thomas
@Austin Hyde - If arbitrarily changing one of them is an option, why isn't arbitrarily changing both an option?
Thomas
@Thomas, what I mean is that just because the data doesn't differentiate any smaller than a day (because it is a `date`, instead of a `datetime`) doesn't mean two rows represent the same event. Just that the same email executed the same query, getting the same number of results, on the same day, x number of times, where x is the number of duplicate records.
Austin Hyde
@Austin Hyde - Again, if you told a human to differentiate between the two pieces of data how would they do it? How do you know they aren't the same event that fired twice? If a human can differentiate them based on some additional piece of data, then that data needs to be added to the table. I.e., if they are distinguishable events, additional attributes are needed to distinguish them even if it is the sequence in which they fired.
Thomas
@Austin that's true, but it doesn't matter. The `LIMIT 1` will take care that any operation you run, it will affect only one record. Which one of the duplicates it is, doesn't matter - as long as there are no concurrent operations.
Pekka
@Pekka - Unless you happen to execute that LIMIT 1 query multiple times against the same criteria in which case it is not guaranteed you'll get the same row. I agree on the unique id. Two rows with completely identical information are considered the same exact thing unless something is provided that distinguishes them.
Thomas
@Thomas yeah. If two users decide to update two different rows that are exact duplicates of each other at the exact same time (e.g. by selecting a row in a GUI), there could be collisions because it's not possible to define which row is targeted by which operation. It's a microscopic possibility and in the case at hand probably totally negligeable, but it's not good design. (Hey, this puts it well! Adding this to my answer. :)
Pekka
@Thomas and Pekka: Yes, I would very much like unique IDs, but I meant this question to be more generalized to "What to do with no unique columns". @Thomas, my point is that individually, they are indistinguishable (and thus difficult to work with), however, the fact that there are N number of duplicates of a row means that (assuming that 1 row corresponds to 1 event, and nothing went wrong) N identical queries were made by the same email address on the same day, with the same results.
Austin Hyde
@Pekka, I ended up accepting @Doug, because his solution solves my immediate problem. However, that being said, a unique ID, like you suggest, is ultimately the best solution. Thanks for the enlightening conversation :D
Austin Hyde
A: 

In case you didn't know this, it will affect performance, but you don't need to use a primary key in your WHERE clause when updating a record. You can single out a row by specifying the existing values:

UPDATE queries
SET Query = 'whatever'
WHERE Email = '[email protected]' AND
  Query = 'whatever' AND
  NumRecords = 42 AND
  Date = '1969-01-01'

If there are duplicate rows, why not update them all, since you can't differentiate anyway?

You just can't do it with a GUI interface in MySQL Query Browser.

If you need to start differentiating the rows, then add an autoincrement integer field, and you'll be able to edit them in MySQL Query Browser too.

Marcus Adams
A: 

There are two different issues here. First, is de-duping the table. That is an entirely different question and solution which might involve adding a auto_increment column. However, if you are not going to de-dup the table, then by definition, two rows with the same data represent the same instance of information and both ought to be updated if they match the filtering criteria. So, either add a unique key, de-dup the table (in which case uniqueness is based on the combination of all columns) or update all matching rows.

Thomas
+3  A: 
UPDATE queries 
SET ... 
WHERE Email = value1 
  AND Query = value2 
  AND NumRecords = value3 
  AND Date = value4 
LIMIT 1;
Doug Currie
That will arbitrarily use the first row returned, that is to say - you can not guarantee that it will always update the same row consistently. And adding an ORDER BY won't help either.
OMG Ponies
@OMG Ponies, So? The rows are indistinguishable if all the fields match.
Doug Currie
@Doug Currie - Then why not arbitrarily change all matching records instead of arbitrarily changing one? If there are two rows that are identical on all columns, they are by definition the same instance.
Thomas
@Doug As I said in the comments to my answer, you are right and there is no immediate need for a unique ID here, but I still think not using one is bad practice because of potential concurrency issues. It would be impossible to target a specific record for locking in a situation with two concurrent users/connections for example; also, concurrent updates could go wrong. Still, for the OP's situation, this may be fine. +1
Pekka
@Doug Currie: What do you not comprehend about the OP asking so they can reference a specific **row**, in light of the fact of identical data?
OMG Ponies
@Ponies: what I think he is trying to say, is that once I change one of the duplicate rows, it will no longer be indistinguishable.
Austin Hyde
I accepted you because this solves my immediate problem. A unique ID is ultimately the best solution, as per @Pekka.
Austin Hyde
A: 

Delete the duplicates first. What's the point of having duplicate rows in the table (or any table for that matter)?

Once you've deleted the duplicates you can implement the key and they your problem is solved.

dportas
Because, in this situation, duplicate rows may not be truly duplicates, because N duplicate rows means that the same query executed by someone with the same email returned the same results on the same day, was executed N times.
Austin Hyde