views:

291

answers:

5

This is going to be one of those questions but I need to ask it.

I have a large table which may or may not have one unique row. I therefore need a MySQL query that will just tell me TRUE or FALSE.

With my current knowledge, I see two options (pseudo code):

[id = primary key]

OPTION 1:

SELECT id FROM table WHERE x=1 LIMIT 1
... and then determine in PHP whether a result was returned.

OPTION 2:

SELECT COUNT(id) FROM table WHERE x=1
... and then just use the count.

Is either of these preferable for any reason, or is there perhaps an even better solution?

Thanks.

+1  A: 

AFAIK, if you have an index on your ID column both queries will be more or less equal performance. The second query will need 1 less line of code in your program but that's not going to make any performance impact either.

e4c5
The index on the ID column does not matter much here, the OP wants an index on X.
Thilo
Oh yeah, sorry it's X
e4c5
+3  A: 

If the selection criterion is truly unique (i.e. yields at most one result), you are going to see massive performance improvement by having an index on the column (or columns) involved in that criterion.

create index my_unique_index on table(x)

If you want to enforce the uniqueness, that is not even an option, you must have

create unique index my_unique_index on table(x)

Having this index, querying on the unique criterion will perform very well, regardless of minor SQL tweaks like count(*), count(id), count(x), limit 1 and so on. For clarity, I would write

select count(*) from table where x = ?

I would avoid LIMIT 1 for two other reasons:

  • It is non-standard SQL. I am not religious about that, use the MySQL-specific stuff where necessary (i.e. for paging data), but it is not necessary here.
  • If for some reason, you have more than one row of data, that is probably a serious bug in your application. With LIMIT 1, you are never going to see the problem. This is like counting dinosaurs in Jurassic Park with the assumption that the number can only possibly go down.
Thilo
+1 for the Jurassic Park analogy. I'll probably steal that. Out of upvotes for the rest of the arguments - I fully agree.
Olaf
Thanks for the answer, although I do like using LIMIT 1. Non-standard SQL doesn't bother me (as long as it works) and as for the second argument, seeing such a bug in my application would be almost as miraculous as the dinosaurs coming back to life. :)
Tom
A: 

Typically, you use group by having clause do determine if there are duplicate rows in a table. If you have a table with id and a name. (Assuming id is the primary key, and you want to know if name is unique or repeated). You would use

select name, count(*) as total from mytable group by name having total > 1;

The above will return the number of names which are repeated and the number of times.

If you just want one query to get your answer as true or false, you can use a nested query, e.g.

select if(count(*) >= 1, True, False) from (select name, count(*) as total from mytable group by name having total > 1) a;

The above should return true, if your table has duplicate rows, otherwise false.

nanda
Since he is doing LIMIT 1, I doubt that the OP is looking for duplicates.
Thilo
+1  A: 

Personally I typically do the first one of selecting the id from the row and limiting to 1 row. I like this better from a coding perspective. Instead of having to actually retrieve the data, I just check the number of rows returned.

If I were to compare speeds, I would say not doing a count in MySQL would be faster. I don't have any proof, but my guess would be that MySQL has to get all of the rows and then count how many there are. Altough...on second thought, it would have to do that in the first option as well so the code will know how many rows there are as well. But since you have COUNT(id) vs COUNT(*), I would say it might be slightly slower.

Darryl Hein
+1  A: 

Intuitively, the first one could be faster since it can abort the table(or index) scan when finds the first value. But you should retrieve x not id, since if the engine it's using an index on x, it doesn't need to go to the block where the row actually is.

Another option could be:

select exists(select 1 from mytable where x = ?) from dual

Which already returns a boolean.

Samuel