ansaurus

Question

How to store and search for an IP Address

Answer 1

A:

An IPv4 address can be stored as a four-byte unsigned integer (an uint in C#). An IPv6 address can be an eight-byte unsigned integer (an ulong in C#). Create columns of the appropriate width in SQL, then retrieve and store them in variables. You then use simple integer math to check for the ranges you want, assuming that the ranges are actually contiguous.

A more elaborate solution would be to create an IPAddress class that gives you access to the more familiar dotted-quad structure, but under the covers it would do the exact same thing that you have here.
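As a rough illustration of the integer approach (in Python rather than C#, using the standard `ipaddress` module), the sketch below converts addresses to integers and does an inclusive range check. The helper names `ip_to_int` and `in_range` are invented for this example. An IPv6 address is 128 bits wide, so its integer form needs 16 bytes of storage.

```python
import ipaddress

def ip_to_int(text: str) -> int:
    """Convert an IPv4 dotted-quad or IPv6 string to its integer value."""
    return int(ipaddress.ip_address(text))

def in_range(text: str, low: str, high: str) -> bool:
    """Inclusive check that an address lies in the contiguous range low..high."""
    return ip_to_int(low) <= ip_to_int(text) <= ip_to_int(high)

print(ip_to_int("192.168.1.10"))                       # 3232235786
print(in_range("10.0.0.5", "10.0.0.0", "10.0.0.255"))  # True
print(len(ipaddress.ip_address("2001:db8::1").packed))  # 16 (bytes)
```

The same comparison works for IPv6 as long as the column or variable holding the value is wide enough.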

JSBangs 2009-01-19 20:44:36

You need 16 bytes for an IPv6 address, not 8.

Alnitak 2009-01-19 20:50:39

Answer 2

A:

I have never attempted this, so take my answer with a grain of salt, but I think a trie isn't actually what you want unless you intend to store every single IP you want to block (as opposed to ranges or subnets/masks). I think a B-tree would be better suited, in which case, just go ahead and use your regular database (many databases are implemented with B-trees or equally good data structures). I'd store each of the 4 bytes of the IP in a separate column to aid in searching by class A/B/C subnets, with "don't care" values equal to NULL, but there's no reason why you couldn't store it as a single 32-bit integer column and crunch the numbers to figure out what range it should fall into (storing masked-out values would be marginally more tricky in this case).
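The "crunch the numbers" step for a single 32-bit column might look something like the following sketch, which masks both the address and the network before comparing them. The function name `matches_subnet` and the prefix-length parameter are assumptions made for illustration; they are not part of the answer above.

```python
import ipaddress

def matches_subnet(addr: str, network: str, prefix_len: int) -> bool:
    """True if addr falls inside network/prefix_len (IPv4 only)."""
    # Build a mask with prefix_len leading one-bits, e.g. /24 -> 0xFFFFFF00.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    a = int(ipaddress.IPv4Address(addr))
    n = int(ipaddress.IPv4Address(network))
    # The "don't care" bits are zeroed on both sides before comparing.
    return (a & mask) == (n & mask)

print(matches_subnet("10.1.2.3", "10.0.0.0", 8))             # True
print(matches_subnet("192.168.5.127", "192.168.5.128", 25))  # False
```

Using a prefix length instead of fixed byte columns also covers subnets that do not end on a byte boundary.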

rmeador 2009-01-19 20:45:10

There's no such thing as A/B/C subnets these days.

Alnitak 2009-01-19 20:58:33

Answer 3

+1 A:

Assuming your IP addresses are IPv4, you could just store them in an integer field. Create two fields, one for the lower bound of the range and another for the upper bound. Then make sure these two fields are indexed. When searching for values, just search where the value is greater than or equal to the lower bound, and less than or equal to the upper bound. I would experiment with something simple like this before trying to program something more complicated yourself, which might not actually give noticeably quicker results.
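A minimal sketch of that two-column layout, using Python's built-in `sqlite3` with an in-memory database purely for demonstration. The table name `blocked_ranges`, the column names `ip_from` and `ip_to`, and the helper functions are all hypothetical.

```python
import ipaddress
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blocked_ranges (ip_from INTEGER NOT NULL, ip_to INTEGER NOT NULL)"
)
# Index both bounds, as the answer suggests.
conn.execute("CREATE INDEX idx_ip_from ON blocked_ranges (ip_from)")
conn.execute("CREATE INDEX idx_ip_to ON blocked_ranges (ip_to)")

def to_int(text: str) -> int:
    return int(ipaddress.IPv4Address(text))

def add_range(low: str, high: str) -> None:
    conn.execute("INSERT INTO blocked_ranges VALUES (?, ?)", (to_int(low), to_int(high)))

def is_blocked(addr: str) -> bool:
    """True if some stored range satisfies ip_from <= addr <= ip_to."""
    n = to_int(addr)
    row = conn.execute(
        "SELECT 1 FROM blocked_ranges WHERE ip_from <= ? AND ip_to >= ? LIMIT 1",
        (n, n),
    ).fetchone()
    return row is not None

add_range("10.0.0.0", "10.0.0.255")
add_range("203.0.113.0", "203.0.113.127")
print(is_blocked("10.0.0.42"))      # True
print(is_blocked("203.0.113.128"))  # False
```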

Kibbee 2009-01-19 20:45:31

I am not convinced that a database call for, potentially, every page request (let's say 1-5K hits per second) is going to perform very well compared to an in-cache list of excluded IPs using a more tuned algorithm for finding the IP. I just don't know the best way to do it.
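One possible shape for such an in-cache lookup is a sorted list of range starts searched with binary search. This is only a sketch under the assumption that the ranges do not overlap; the class name `RangeCache` is invented here, and this is not the algorithm from any particular article or vendor.

```python
import bisect
import ipaddress

class RangeCache:
    """In-memory lookup over non-overlapping inclusive IPv4 ranges."""

    def __init__(self, ranges):
        pairs = sorted(
            (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)))
            for lo, hi in ranges
        )
        self._starts = [lo for lo, _ in pairs]
        self._ends = [hi for _, hi in pairs]

    def contains(self, addr: str) -> bool:
        n = int(ipaddress.IPv4Address(addr))
        # Index of the last range whose start is <= n, or -1 if none.
        i = bisect.bisect_right(self._starts, n) - 1
        return i >= 0 and n <= self._ends[i]

cache = RangeCache([("10.0.0.0", "10.0.0.255"), ("203.0.113.0", "203.0.113.127")])
print(cache.contains("10.0.0.42"))    # True
print(cache.contains("192.0.2.1"))    # False
```

Each lookup costs O(log n) comparisons and touches no database, so the table only needs to be read once when the cache is loaded.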

Coolcoder 2009-01-19 21:26:11

I've tried this, it's wayyy too slow.

Mauricio Scheffer 2009-01-19 21:51:36

Answer 4

+1 A:

I've done a filter by country exactly like you describe.

However, after experimenting a while, I found out that it can't be done in a performant way with SQL. That's why IP databases like this one (the one I'm using) offer a binary database, which is much faster because it's optimized for this kind of data.

They even say explicitly:

Note that queries made against the CSV data imported into a SQL database can take up to a few seconds. If performance is an issue, the binary format is much faster, and can handle thousands of lookups per second.

Plus, they even give you the code to query this database.

I'm using this in a production website with medium traffic, filtering every request, with no performance problems.

Mauricio Scheffer 2009-01-19 21:13:36

We are kind of restricted to using the official raw data sources.

Coolcoder 2009-01-19 21:23:52

Check out what they say at MaxMind: "Note that queries made against the CSV data imported into a SQL database can take up to a few seconds. If performance is an issue, the binary format is much faster, and can handle thousands of lookups per second".

Mauricio Scheffer 2009-01-19 21:28:30

I'd try to make an exception to the policies for this one...

Mauricio Scheffer 2009-01-19 21:29:18

There is a Code Project article on using the raw data and an algorithm which produces around 500K searches per second. This would be fine and I wouldn't be reliant on a third-party company. However, I need the best way to store the data to pull into cache to use this algorithm.

Coolcoder 2009-01-19 21:33:35

And perhaps change the algorithm to use cached data and not load the raw files directly.

Coolcoder 2009-01-19 21:34:53

Suit yourself, but IMHO this is a big NIH...

Mauricio Scheffer 2009-01-19 21:35:34

BTW, MaxMind's code has an option to cache the data.

Mauricio Scheffer 2009-01-19 21:36:20

We just wouldn't be allowed to use MaxMind's solution or even IptoCountry's effort.

Coolcoder 2009-01-19 21:47:17

Answer 5

A:

An IPv6 address can be an eight-byte unsigned integer (an ulong in C#)

IPv6 addresses are 128-bit (16 byte) not 8 as suggested. I am grappling with this very problem right now for IP ranges.

I am looking to try zero-padded or hex strings and just do < and > comparisons.
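A small sketch of the hex-string idea: if every address is written as a fixed-width, zero-padded, lowercase hex string, plain string comparison gives the same order as numeric comparison. The helper name `ipv6_key` is made up for this example.

```python
import ipaddress

def ipv6_key(text: str) -> str:
    """Return a 32-character zero-padded lowercase hex string for an IPv6 address."""
    return format(int(ipaddress.IPv6Address(text)), "032x")

low = ipv6_key("2001:db8::1")
mid = ipv6_key("2001:db8::ff")
high = ipv6_key("2001:db9::")

print(low)                # 20010db8000000000000000000000001
print(low < mid < high)   # True, same order as the numeric values
```

The fixed width matters: without the zero padding, a shorter string could sort after a longer one that represents a larger number.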

Verdant 2009-01-19 21:31:21

Answer 6

A:

You can efficiently do it provided you store your IPv4 start addresses in the right data type. A varchar (or other string type) is not right - you need to use an int.

For IPv4, store the IP number in an unsigned int, which is big enough, using the INET_ATON format (which is easy enough to generate; I'm not sure how in C# but it ain't difficult).

You can then easily and efficiently look up which range an IP address is part of by arranging for the database to do a range scan.

By using LIMIT (or SELECT TOP 1 in MSSQL) you can have it stop once it finds a record.

SELECT TOP 1 networkidorwhatever, IPNumber, IPNumberUpperBoundOrWhateverYouCallIt 
FROM networks 
WHERE IPNumber <= IPNUMBERTOQUERY ORDER BY IPNumber DESC

This should find the highest-numbered network number which is <= the IP number; then it's a trivial check to determine whether that IP address is within it.

It should be efficient provided there is a conventional index on IPNumber.
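The query above can be tried out with a sketch like the following, which uses Python's built-in `sqlite3` (so `LIMIT 1` stands in for `SELECT TOP 1`). The shortened column name `IPNumberUpper`, the `network_id` column, and the helpers `aton` and `find_network` are assumptions made for the example.

```python
import ipaddress
import sqlite3

def aton(text: str) -> int:
    """Integer form of a dotted-quad address, like MySQL's INET_ATON."""
    return int(ipaddress.IPv4Address(text))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE networks ("
    "network_id INTEGER PRIMARY KEY, "
    "IPNumber INTEGER NOT NULL, "
    "IPNumberUpper INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_ipnumber ON networks (IPNumber)")
conn.executemany(
    "INSERT INTO networks (IPNumber, IPNumberUpper) VALUES (?, ?)",
    [
        (aton("10.0.0.0"), aton("10.0.0.255")),
        (aton("172.16.0.0"), aton("172.16.15.255")),
    ],
)

def find_network(addr: str):
    """Return the network_id containing addr, or None if no range matches."""
    n = aton(addr)
    # Highest start address that is <= n; the index makes this a short scan.
    row = conn.execute(
        "SELECT network_id, IPNumber, IPNumberUpper FROM networks "
        "WHERE IPNumber <= ? ORDER BY IPNumber DESC LIMIT 1",
        (n,),
    ).fetchone()
    # The candidate only matches if n is also within its upper bound.
    if row is not None and n <= row[2]:
        return row[0]
    return None

print(find_network("172.16.3.1"))  # 2
print(find_network("10.0.1.0"))    # None
```

This relies on the stored ranges not overlapping; with overlapping ranges, the row with the highest start address is not necessarily the only candidate.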

For IPv6 the types are different but the principle is the same.

MarkR 2009-01-19 21:49:38

Answer 7

A:

For IPv4, a DBA would normally recommend 4 tinyint fields, but you're doing ranges, which lend themselves more to the integer storage solutions previously provided. In that case you would store a beginning IP address and an ending IP address for the range. Then it's a simple matter to do the comparison.

K. Brian Kelley 2009-01-20 02:36:18
