ansaurus

Question

Answer 1

+1 A:

You should take a look at this post:

http://jcole.us/blog/archives/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql-gis/

It has some nice ideas for working with IPs in queries very similar to yours.

Another thing you should try is using a stored function instead of a sub-query. That would simplify your query as follows:

SELECT
event.id,
event.event_name,
GET_PROVIDER_NAME(event.ip_address) AS provider
FROM events AS event;

Ike Walker 2010-06-17 14:43:02

+1 That URL doesn't quite address the problem I have, but it was interesting (and is actually very relevant to what I'm doing, although I didn't go into that in my question as I wanted to keep it simple). Your point about a stored function, however, hit the nail on the head.

Iain Collins 2010-06-21 09:32:20

Answer 2

A:

There seems to be no way to achieve what I wanted with a JOIN or subquery.

To expand on Ike Walker's suggestion of using a stored function, I ended up creating a stored function in MySQL with the following:

DELIMITER //
DROP FUNCTION IF EXISTS get_network_provider //
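-- INT UNSIGNED: numeric IPv4 values run up to 4294967295, beyond signed INT.
CREATE FUNCTION get_network_provider(ip_address_number INT UNSIGNED) RETURNS VARCHAR(255)
READS SQL DATA
BEGIN
DECLARE network_provider VARCHAR(255);
    -- Take the row with the highest ip_start at or below the address, skipping blank names.
    SELECT provider_name INTO network_provider FROM network_providers
    WHERE ip_address_number >= network_providers.ip_start
    AND network_providers.provider_name != ''
    ORDER BY network_providers.ip_start DESC LIMIT 1;
RETURN network_provider;
END //
DELIMITER ;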
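
Calling the function might look something like the sketch below. It assumes that events.ip_address already holds the numeric form of the address (as opposed to dotted-quad text), and the literal address is only a placeholder.

```sql
-- Single lookup: INET_ATON() converts dotted-quad text to its numeric form.
SELECT get_network_provider(INET_ATON('10.0.0.1')) AS provider;

-- Per-row lookup: the function runs once for every row returned.
-- Assumes events.ip_address is stored as a number.
SELECT
    events.id,
    events.event_name,
    get_network_provider(events.ip_address) AS provider
FROM events;
```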

Explanation:

The check that ignores blank names, and the use of >= with ORDER BY on ip_start rather than BETWEEN ip_start AND ip_end, are a fudge specific to the two combined network provider databases I'm using, both of which need to be queried in this way.
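
To make the difference concrete, here is a rough side-by-side of the two lookup styles as standalone queries. The literal address is a placeholder, and the first query is only what I would expect to work against a well-formed table, not something that works with this data.

```sql
-- Conventional range lookup: relies on every row having a trustworthy ip_end.
SELECT provider_name
FROM network_providers
WHERE INET_ATON('10.0.0.1') BETWEEN ip_start AND ip_end;

-- The workaround: ignore ip_end, skip rows with blank names, and take the
-- row whose ip_start is the closest one at or below the address.
SELECT provider_name
FROM network_providers
WHERE ip_start <= INET_ATON('10.0.0.1')
  AND provider_name != ''
ORDER BY ip_start DESC
LIMIT 1;
```

The second form needs the sort and the LIMIT 1 to pick a single row, which is why it is awkward to fold into a JOIN.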

This approach works well when the query calling the function returns only a few hundred results (though even that may take a handful of seconds). Queries that return a few thousand results may take 2 or 3 minutes. For queries with tens of thousands of results (or more) it is too slow to be of practical use.

This was not unexpected for a stored function used like this (every row returned triggers a separate query), but I did hit the drop in performance sooner than I had expected.

Recommendation:

The upshot was that I needed to accept that the data structure is just not suitable for my needs. A friend had already pointed this out to me; it just wasn't something I wanted to hear at the time (I really wanted to use that specific network_provider DB because of other keys in the table that were useful to me, e.g. for things like geolocation).

If you end up trying to use any of the IP provider DBs (or indeed any other database) that follow a similarly dubious data format, I can only suggest that they are just not going to be suitable, and that it's not worth trying to cobble together something that will work with them as they are.

At the very least you need to reformat the data so that it can be used reliably with a simple BETWEEN condition (no sorting and no other comparisons), which lets you use it in subqueries (or JOINs). Even so, data that messed up is probably not all that reliable anyway.
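
For illustration, once the data has been cleaned up, the lookup could be written as a plain JOIN along these lines. This is a sketch under assumptions I have not tested against the real data: every row in network_providers has a valid ip_start and ip_end, the ranges do not overlap, blank-name rows have been removed, and events.ip_address holds the numeric form of the address.

```sql
-- Range JOIN against a cleaned-up provider table (assumptions as stated above).
-- LEFT JOIN keeps events whose address falls in no known range (provider is NULL).
SELECT
    events.id,
    events.event_name,
    network_providers.provider_name AS provider
FROM events
LEFT JOIN network_providers
    ON events.ip_address BETWEEN network_providers.ip_start
                             AND network_providers.ip_end;
```

If the ranges did overlap, an event could match more than one row and appear several times in the result, which is why the clean-up has to come first.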

Iain Collins 2010-06-21 10:03:49


IP address numbers in MySQL subquery
