views:

1517

answers:

7

I have the 4 sources of IP addresses , I want to store them in SQL Server and allow the ranges, that can be categorised by the originating country code, to be maked in an Exclusion list by country.

For this I have 2 tables.

IPAddressRange CountryCode

What I need to know is, if this data was returned to the client then cached for quick querying , what is the best way to store the returned data to query a specific IP address within the ranges. I want to know if the supplied IP address is in the list.

The reason the list is in the db is for easy storage.

The reason I want to cache then use the data on the client is that I have heard that searching IP addresses is faster in a trie structure. So , I am think I need to get the list from the db , store in cache in a structure that is very quick to search.

Any help in the A) The SQL stucture to store the addresses and b) Code to search the IP addresses.

I know of a code project solution which has a code algorithm for searching not sure how to mix this with the storage aspect.

Ideally without the use of a third party library. The code must be on our own server.

A: 

An IPv4 address can be stored as a four-byte unsigned integer (an uint in C#). An IPv6 address can be an eight-byte unsigned integer (an ulong in C#). Create columns of the appropriate width in SQL, then retrieve and store them in variables. You then use simple integer math to check for the ranges you want, assuming that the ranges are actually contiguous.

A more elaborate solution would be to create an IPAddress class that gives you access to the more familiar dotted-quad structure, but under the covers it would do the exact same thing that you have here.

JSBangs
you need 16 bytes for an IPv6 address, not 8.
Alnitak
A: 

I have never attempted this, so take my answer with a grain of salt, but I think a trie isn't actually what you want unless you intend to store every single IP you want to block (as opposed to ranges or subnets/masks). I think a btree would be better suited, in which case, just go ahead and use your regular database (many databases are implemented with btrees or equally good data structures). I'd store each of the 4 bytes of the IP in a separate column to aide in searching by class A/B/C subnets with "don't care" values equal to NULL, but there's no reason why you couldn't store it as a single 32 bit integer column and crunch the numbers to figure out what range it should fall into (storing masked-out values would be marginally more tricky in this case).

rmeador
There's no such thing as A/B/C subnets these days.
Alnitak
+1  A: 

Assuming your IP Addresses are IPV4, you could just store them in an integer field. Create 2 fields, one for the lower bound for the range, and another for the upper bound. Then make sure these to fields are indexed. When searching for values, just search where the value is greater than or equal to the lower bound, and less than or equal to the upper bound. I would experiment with something simple like this before trying to program something more complicated yourself, which doesn't actually give noticeably quicker results.

Kibbee
I am not convinced that a database call for , potentially, every page request (lets say 1-5K hits per sec) is going to perform very well compared to a in cache list of excluded IPs using a more tuned algorithm for finding the IP. I just don't know of the best way to do it.
Coolcoder
I've tried this, it's wayyy too slow.
Mauricio Scheffer
+1  A: 

I've done a filter by country exactly like you describe.

However, after experimenting a while, I found out that it can't be done in a performant way with SQL. That's why IP databases like this one (the one I'm using) offer a binary database, which is much faster because it's optimized for this kind of data.

They even say explicitly:

Note that queries made against the CSV data imported into a SQL database can take up to a few seconds. If performance is an issue, the binary format is much faster, and can handle thousands of lookups per second.

Plus, they even give you the code to query this database.

I'm using this in a production website with medium traffic, filtering every request, with no performance problems.

Mauricio Scheffer
We are kind of restricted to using the official raw data sources.
Coolcoder
Check out what they say at MaxMind: "Note that queries made against the CSV data imported into a SQL database can take up to a few seconds. If performance is an issue, the binary format is much faster, and can handle thousands of lookups per second".
Mauricio Scheffer
I'd try to make an exception to the policies for this one...
Mauricio Scheffer
There is a code project article on using the raw data and an algorithm which produces around 500K searches per second. This would be fine and I wouldnt be reliant on a third party company. However, I need the best way to store the data to pull into cache to use this algorithm.
Coolcoder
And perhaps change the algorithm to use cached data and not load the raw files directly.
Coolcoder
Suit yourself, but IMHO this is a big NIH...
Mauricio Scheffer
BTW, MaxMind's code has an option to cache the data.
Mauricio Scheffer
We just wouldnt be allowed to use MaxMind's solution or even IptoCountry's effort.
Coolcoder
A: 

An IPv6 address can be an eight-byte unsigned integer (an ulong in C#)

IPv6 addresses are 128-bit (16 byte) not 8 as suggested. I am grappling with this very problem right now for IP ranges.

I am looking to try padded or hex strings and just do < and > comparisons

Verdant
A: 

You can efficiently do it provided you store your IPv4 start addresses in the right data type. A varchar (or other string type) is not right - you need to use an int.

For IPv4, store the IP number in an unsigned in which is big enough, then store it as a INET_ATON format (which is easy enough to generate; I'm not sure how in C# but it ain't difficult).

You can then easily and efficiently look up which range an IP address is part of by arranging for the database to do a range scan.

By using LIMIT (or SELECT TOP 1 in MSSQL) you can have it stop once it finds a record.

SELECT TOP 1 networkidorwhatever, IPNumber, IPNumberUpperBoundOrWhateverYouCallIt 
FROM networks 
WHERE IPNumber <= IPNUMBERTOQUERY ORDER BY IPNumber DESC

Should find the highest numbered network number which is <= the IP number, then it's a trivial check to determine whether that IP address is within it.

It should be efficient provided there is a conventional index on IPNumber.

For IPv6 the types are different but the principle is the same.

MarkR
A: 

For IPv4 normally a DBA would recommend 4 tinyint fields but you're doing ranges, which lend itself more to the integer storage solutions previously provided. In that case you would store a beginning IP address and an ending IP address for the range. Then it's a simple matter to do the comparison.

K. Brian Kelley