Hello,

Given the following tables, how would I go about finding the most common IP address across all tables and, ideally, the number of times each IP occurs across all of them?

bad_guys_1         bad_guys_2
| id | ip      |   | id | ip      |
+----+---------+   +----+---------+
| 1  | 1.2.3.4 |   | 1  | 1.2.3.4 |
| 2  | 2.3.4.5 |   | 2  | 4.5.6.7 |
| 3  | 3.4.5.6 |   | 3  | 1.2.3.4 |

bad_guys_3         bad_guys_4
| id | ip      |   | id | ip      |
+----+---------+   +----+---------+
| 1  | 9.8.7.6 |   | 1  | 1.2.3.4 |
| 2  | 8.7.6.5 |   | 2  | 2.3.4.5 |
| 3  | 2.3.4.5 |   | 3  | 3.4.5.6 |

For example, querying the above tables should result in something like:

| ip      | count |
+---------+-------+
| 1.2.3.4 | 4     |
| 2.3.4.5 | 3     |
| 3.4.5.6 | 2     |
| 4.5.6.7 | 1     |
| 9.8.7.6 | 1     |
| 8.7.6.5 | 1     |

The real tables actually contain many additional fields which don't line up with each other, hence the separate tables. I don't really care about breaking ties between matches; just listing them in descending order by count would be great. My database is PostgreSQL, in case any non-standard functions would help, but for portability I would prefer standard SQL if possible. Thanks, and let me know if you need any more detail.

+1  A: 

Try this...

select ip, count(*) 
from
(
select id, ip from bad_guys_1
union all
select id, ip from bad_guys_2
union all
select id, ip from bad_guys_3
union all
select id, ip from bad_guys_4
) a
group by ip
order by count(*) desc
Fosco
You need UNION ALL, not UNION, or values repeated in tables 2, 3, and 4 will not be counted.
mdma
Updated. The likelihood of both the ID and the IP matching is small, but you are correct.
Fosco
+1  A: 

Andy, you can use "union all" to create one big logical table (in memory) with just the IPs. Then you can run the usual grouped count:

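select count(ip), ip from
(select ip from bad_guys_1 union all select ip from bad_guys_2
 union all select ip from bad_guys_3 union all select ip from bad_guys_4) unionedTable
group by ip
order by count(ip) desc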

[edited to add union all - thanks!]

Jeanne Boyarsky
You need UNION ALL, not UNION, or values repeated in different tables will not be counted.
mdma
Fixed. Thanks mdma.
Jeanne Boyarsky
+1  A: 
select ip, count(*) from
(
select id, ip from bad_guys_1
union all
select id, ip from bad_guys_2
union all
select id, ip from bad_guys_3
union all
select id, ip from bad_guys_4
) as ranking
group by ip
order by count(*) desc
Yves M.
You need UNION ALL, not UNION, or values repeated in tables 2, 3, and 4 will not be counted (assuming they also have the same id, which is possible).
mdma
+2  A: 
 SELECT ip, count(*) c
 FROM 
 (
   SELECT ip
   from bad_guys_1 
   UNION ALL
   SELECT ip
   from bad_guys_2
   UNION ALL
   SELECT ip
   from bad_guys_3
   UNION ALL
   SELECT ip
   from bad_guys_4) AS combined  -- alias added: older PostgreSQL versions require one for a subquery in FROM
 group by ip
 order by 2 desc
Michael Pakhantsov
+6  A: 

Sorry to say, but the other answers that use plain UNION instead of UNION ALL are wrong. UNION removes duplicate rows from the combined result, so a selected row that appears more than once (in the same table or in different tables) is counted only once.

For those queries selecting both the ID and the address, the possibility of a row having the same ID and address in different tables still exists. Using UNION ALL ensures all values are unioned, whether they are duplicates or not - and we want the duplicates so they can be counted. Using UNION ALL is often less work for the database, since it does not need to find duplicates and remove them.

select ip, count(*) from
(
select ip from bad_guys_1
union ALL
select ip from bad_guys_2
union ALL
select ip from bad_guys_3
union ALL
select ip from bad_guys_4
) as ranking
group by ip
order by count(*) DESC
mdma
Yes, you're right. Running it with just union gives me a count of 1 for every result, but union all shows the correct total number of times each given IP shows up across all tables.
Andy
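
To make the UNION versus UNION ALL difference discussed in this thread easy to reproduce, here is a minimal sketch that loads the sample rows from the question into an in-memory database and runs the accepted query both ways. It rests on some assumptions: SQLite (through Python's standard sqlite3 module) stands in for PostgreSQL, the helper name count_ips and the column alias hits are made up for illustration, and a secondary sort on ip is added only to make tie order deterministic.

```python
import sqlite3

# Sample rows copied from the question. SQLite is an assumption here,
# used as a stand-in for the asker's PostgreSQL database.
TABLES = {
    "bad_guys_1": ["1.2.3.4", "2.3.4.5", "3.4.5.6"],
    "bad_guys_2": ["1.2.3.4", "4.5.6.7", "1.2.3.4"],
    "bad_guys_3": ["9.8.7.6", "8.7.6.5", "2.3.4.5"],
    "bad_guys_4": ["1.2.3.4", "2.3.4.5", "3.4.5.6"],
}


def count_ips(set_operator):
    """Count each ip across all tables, combined with UNION or UNION ALL.

    count_ips is a hypothetical helper, not something from the thread.
    """
    conn = sqlite3.connect(":memory:")
    for name, ips in TABLES.items():
        conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, ip TEXT)")
        conn.executemany(
            f"INSERT INTO {name} (ip) VALUES (?)", [(ip,) for ip in ips]
        )
    # Build "SELECT ip FROM bad_guys_1 <op> SELECT ip FROM bad_guys_2 ..."
    combined = f" {set_operator} ".join(
        f"SELECT ip FROM {name}" for name in TABLES
    )
    query = (
        f"SELECT ip, COUNT(*) AS hits FROM ({combined}) AS ranking "
        "GROUP BY ip ORDER BY hits DESC, ip"
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


union_all_counts = count_ips("UNION ALL")
union_counts = count_ips("UNION")

print(union_all_counts)  # per-ip totals across all four tables
print(union_counts)      # duplicates removed before counting
```

With UNION ALL the top row is 1.2.3.4 with a count of 4, matching the expected output in the question. With plain UNION every ip comes back with a count of 1, which is what Andy reported.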