tags:

views:

346

answers:

8

In the case when I want to check, if a certain entry in the database exists I have two options.

I can create an sql query using COUNT() and then check, if the result is >0...

...or I can just retrieve the record(s) and then count the number of rows in the returned rowset. For example with $result->num_rows;

What's better/faster? in mysql? in general?

+2  A: 

YMMV, but I suspect that if you are only checking for existence, and don't need to use the retrieved data in any way, the COUNT() query will be faster. How much faster will depend on how much data.

Matt McClellan
+2  A: 

The fastest is probably asking the database if something exists:

SELECT EXISTS ([your query here])
rebra
This can be MUCH faster becuase you don't care HOW MANY which saves SQL time.
n8wrl
And it gives the (database server) optimizer a chance to do its thing: No need to count stuff if you just need to know it's there.
rebra
You are both wrong, this method may be faster for small tables, but once you get past a certain amount of data SELECT COUNT(*) is faster. If you have to blindly choose, pick COUNT(*) because it's fast even on large tables and equally fast on small tables.
TravisO
and in my case (mysql) I have no choice since EXISTS can only be used in subqueries it seems
tharkun
rebra
ok, I'm getting there! thank you all!
tharkun
@TravisO: I'm not sure why you think COUNT would ever be faster than EXISTS *especially* on large tables. EXISTS can stop as soon as it finds one row that matches the given criteria. COUNT has to keep going.
Tom H.
A: 

I would definitely do it in the PHP to decrease load on the database.

In order to get a count and get the returned rows in SQL you would have to do two queries.. a COUNT and then a SELECT

The PHP way gives you everything you need in one result object.

adam
I do not think this will reduce the load on the database. The database has to actually get the data, and it must be transferred and transformed to PHP. In contrast, when doing a SELECT EXISTS the database can just consult its index.
rebra
A: 

SELECT COUNT(*) FROM table

is the best choice, this operation is extremely fast both on small tables and large tables. While it's possible that

SELECT id FROM table

is faster on small tables, the difference in speed will be microscopic. But if you have a large table, this operation can be very slow.

Therefore, your best bet is to always choose to COUNT(*) the table (and it's faster to do * than it is to pick a specific column) as overall, it will be the fastest operation.

TravisO
Now we're talking database implementations. COUNT(*) will not always be fast. In Postgresql for example, a COUNT(*) on a very large table could be quite slow. EXISTS on the other hand was made for this purpose.
rebra
it's just that as far as I can see in mysql there is no EXISTS
tharkun
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
rebra
yes, for subqueries but not for the main query, right?
tharkun
I would try. (I cannot myself)
rebra
+1  A: 

If all you are doing is checking for the existance, then

Select count(*) ...

But if you will retrieve the data if it exists, then just get the data and check it in PHP, otherwise you'll have two calls.

John MacIntyre
+1  A: 

For me is in the database.

Making a count(1) is faster than $result->num_rows because in the $result->num_rows you make 2 operations 1 select and a count if the select has a count is faster to get the result. Except if you also want the information from the db.

Luís Tedim
+1  A: 
SELECT 1 
  FROM (SELECT 1) t 
 WHERE EXISTS( SELECT * FROM foo WHERE id = 42 )

Just tested, works fine on MySQL v5

COUNT(*) is generally less efficient if:

  1. you can have duplicates (because the DBMS will have to exhaustively search all of the records/indexes to give you the exact answer) or
  2. have NULL entries (for the same reason)

If you are COUNT'ing based on a WHERE clause that is guaranteed to produce a single record (or 0) and the DBMS knows this (based upon UNIQUE indexes), then it ought to be just as efficient. But, it is unlikely that you will always have this condition. Also, the DBMS may not always pick up on this depending on the version and DBMS.

Counting in the application (when you don't need the row) is almost always guaranteed to be slower/worse because:

  1. You have to send data to the client, the client has to buffer it and do some work
  2. You may bump out things in the DBMS MRU/LRU data cache that are more important
  3. Your DBMS will (generally) have to do more disk I/O to fetch record data that you will never use
  4. You have more network activity

Of course, if you want to DO something with the row if it exists, then it is definitely faster/best to simply try and fetch the row to begin with!

Matt Rogish
that sounds like the best option if I wante to retrieve the data in case of existence! thank you!
tharkun
would you mind adding a small paragraph about the difference of just checking on one hand and checking plus if exists retrieving on the other hand. then I would mark your answer as accepted.
tharkun
A: 

If you want raw speed, benchmark! In addition to the methods others have suggested:

SELECT 1 FROM table_name WHERE ... LIMIT 1

may be faster due to avoiding the subselect. Benchmark it.

derobert