tags:

views:

207

answers:

8

I have some servers. Some of them have ips assigned. I want to figure out how many do not. There are clearly more servers than have ips assigned, but my db tells me there are no servers that have no ips assigned...

I'm at my wit's end here. Is my DB corrupted in some strange way?

SELECT COUNT(*) 
  FROM server

...returns:

+----------+
| count(*) |
+----------+
|    23088 | 
+----------+
1 row in set (0.00 sec)

This:

SELECT COUNT(*) 
  FROM server 
 WHERE server_id IN (SELECT DISTINCT(server_id) 
                       FROM ips)

...returns:

+----------+
| count(*) |
+----------+
|    13811 | 
+----------+
1 row in set (0.01 sec)

This:

SELECT COUNT(*) 
  FROM server 
 WHERE server_id NOT IN (SELECT DISTINCT(server_id) 
                           FROM ips);

...returns:

+----------+
| count(*) |
+----------+
|        0 | 
+----------+
1 row in set (0.02 sec)

Results have been edited to protect the guilty, but you get the idea.

  • All tables are InnoDB.
  • Check table returns ok on both of these tables.

EDIT: thank you for the suggestion of using LEFT JOIN. This definitely confirms that the problem is the MySQL bug.

mysql> SELECT count(s.server_id) FROM server s LEFT JOIN ips i on s.server_id = i.server_id WHERE i.server_id IS NULL;
+--------------------+
| count(s.server_id) |
+--------------------+
|               9277 | 
+--------------------+
1 row in set (0.04 sec)

Since 9277 + 13811 = 23088, this means that all servers without ips + all servers with ips does indeed == all servers.

I've scheduled an upgrade of the mysql server for beginning of next week. Stay tuned.

+6  A: 

What version of MySQL? There seems to be a bug in < 5.0.25 / 5.1.12 that might be the culprit:

Bug #21282: NOT IN, more than 1000 returns incorrect results with INDEX:

Using a SELECT ... WHERE some_field NOT IN (...) and then 1000 or more values in the NOT IN part causes the server to return incorrect results if there is an INDEX/UNIQUE key on some_field. Less than 1000 criteria works correctly.

truppo
Your MySQL connection id is 17297414 to server version: 5.0.22-log. This does indeed look related. I probably shouldn't just go upgrading mysql without telling anyone, but stay tuned - I'll let you know how it turns out.
Igor
Wow, some bug! Good link to know about.
Arjan
I guess one can easily test using `select count(*) from server where server_id not in (select distinct(server_id) from ips limit 999);`
Arjan
+6  A: 

Do you have any Nulls in your columns?

Nescio
yes, ips that are not assigned have server_id IS NULL. that shouldn't affect my counts, but it does give me another possible estimate of the original error:| count(distinct(server_id)) |+----------------------------+| 13811 |this value matches the COUNT IN result i got above, so it's only the NOT IN result that's wrong
Igor
(Sorry, rolled back my edit just while @igor was typing a comment, as my SELECT was not complete at all...)
Arjan
Having nulls appear in the list of an IN clause invariably leads to confusion - both to the users of the DBMS and (often) to the implementors of the DBMS. Avoid them like the plague they are.
Jonathan Leffler
+2  A: 

Assuming the bug truppo found causes this, you could use this workaround:

select count(*)
from server s
left join ips i on i.server_id = s.server_id
where i.server_id is null

Above, i.server_id is null is true if the left join did not find a match (just like all columns from i would yield null for that situation).

Arjan
strangely enough, after an upgrade tonight to version 5.0.77 of MySQL, I am still getting 0 in the NOT IN query. Since nobody had a beef with my query itself, I'm going to assume the bug is not as fixed as the MySQL people think it is. in the meantime, this workaround returns the correct results so I am selecting this answer.
Igor
Hmmm, that might call for re-opening that bug then? (And why not 5.0.86 from that 5.0 branch -- though even for that version supports ends in 2 months?)
Arjan
+4  A: 

The server_id not in (ids) does not match NULL columns, so you only get the servers with a non-NULL server_id that isn't among those in ips. You'll want to use where server_id is null instead.

mmarx
+1  A: 

Do you have any record that has a NULL for server_id ? Because it would be excluded in both case.

Locksfree
+1  A: 

if you have NULL in your columns, they will evaluate to false in both cases. the result you are getting is in + not in - nulls

knittl
A: 
select count(*) 
from server 
where server_id not in (select distinct(server_id) from ips)
or server_id is NULL
Kirk Broadhurst
Somewhere hidden in the comments is *in table server, server_id is PRIMARY KEY AUTO_INCREMENT NOT NULL*. So, `server_id` won't be `null`.
Arjan
A: 

I would assume that there is something strange going on with the IN and NOT IN. Might be a bug or a "known limitation".

I'd suggest to first try to answer your initial question (servers without an ip) and then have a look at the data .. maybe that gives you an indication on what might be going on.

So here are some alternative ideas to give you what you are looking for:

SELECT server_id
FROM server
MINUS
SELECT server_id
FROM ips

Or

SELECT server_id
FROM server s LEFT JOIN ips i on s.server_id = i.server_id
WHERE i.server_id is null

As said above, this may give you an idea on why the data is not "caught" by your original statements.

IronGoofy