I would like to resolve IP(v4) addresses to owner organizations, from the registry of IP address allocations. To do it, I don't want to become an expert in whois protocols and templates or the structure of the registries themselves. I just want a function that takes an IP address (allocated anywhere in the world) and returns a short string like "IBM Corporation". The same thing I would find by typing "whois n.n.n.n" and eyeballing the result. Reverse DNS is not what I want. Should be free software and run on Linux.

Incredibly to me, I can't find this. The whois program (on Debian) and other user-oriented front-ends give me a result for any IP address, but in all sorts of raw formats. I've found whois libraries that parse results, but they seem to assume I'm a whois expert and know which registry has the records for my query. I think the pieces just need to be put together, but nobody seems to have done it. Have I missed something, or is it easier than I think?

As a bonus, I would like to maintain a cache of these lookups. The cache should store the network range for whois results so that it returns a hit for another IP address in the same network. Ideally, the cache should perform better than a linear search as it grows.
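A rough sketch of what such a cache could look like, using only the Python standard library. The class name `NetworkCache` and its methods are hypothetical, made up for illustration. It assumes the cached ranges do not overlap (real whois data can nest allocations, which this does not handle) and that something else has already extracted the first and last address of the range from the whois result. Lookups use binary search over sorted range starts, so they are O(log n) rather than linear; inserts still shift list elements.

```python
import bisect
import ipaddress


class NetworkCache:
    """Hypothetical cache mapping IPv4 ranges to owner names.

    Assumes ranges do not overlap. Lookups use binary search.
    """

    def __init__(self):
        self._starts = []   # sorted integer start addresses
        self._entries = []  # parallel list of (end_address, owner)

    def add(self, first_ip, last_ip, owner):
        """Store the owner for the inclusive range first_ip..last_ip."""
        start = int(ipaddress.IPv4Address(first_ip))
        end = int(ipaddress.IPv4Address(last_ip))
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._entries.insert(i, (end, owner))

    def lookup(self, ip):
        """Return the cached owner for ip, or None on a cache miss."""
        addr = int(ipaddress.IPv4Address(ip))
        # Find the last range whose start is <= addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            end, owner = self._entries[i]
            if addr <= end:
                return owner
        return None
```

On a miss, the caller would run the real whois query, parse out the range and owner, and call `add` so that later addresses in the same network hit the cache.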

The purpose? I would find this incredibly helpful for analyzing server logs. Reverse DNS is mostly useless these days, but I would still like some idea of who's responsible for requests.

+1  A: 

There is no real set format for whois information. You will have to parse through the data and make guesses. I suggest looking for OrgName:, Organisation:, Organization:, and there are probably plenty of others.
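As a minimal sketch of that kind of guessing, here is one way it might look in Python. The function name `extract_org` is hypothetical, and the field list contains only the three names mentioned above; other registries use other labels, so treat the list as a starting point to extend rather than a complete set.

```python
import re

# Field names suggested above; extend this tuple as new formats turn up.
ORG_FIELDS = ("OrgName", "Organisation", "Organization")


def extract_org(whois_text, fields=ORG_FIELDS):
    """Return the value of the first matching field in raw whois output, or None."""
    names = "|".join(re.escape(name) for name in fields)
    # Match "Field:   value" on a single line, ignoring case and padding.
    pattern = re.compile(
        r"^[ \t]*(?:%s)[ \t]*:[ \t]*(\S.*?)[ \t]*$" % names,
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(whois_text)
    return match.group(1).strip() if match else None
```

The input would be whatever text the `whois` command printed for the address. A result of `None` means none of the known labels appeared, which is the cue to look at the raw output and add another field name.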

If you are just doing this for your own sites, I recommend using an Analytics package to do this work for you. Google Analytics is great but does not analyze your web server's logs. You would have to use something like Web Trends.

Brad
It's a shame to me that there isn't a community effort to write and collect the parsers for the different formats. :-( My need is more complex than simple web logs, and I don't even know how something like Web Trends would integrate with what I'm doing (not to mention I have neither the inclination nor the budget for a commercial package.) Thanks for the ideas.
Andrew
Then I recommend simply starting by parsing your own. I doubt the list of possible formats is *that* extensive. Knock 'em out one by one until you cover all of the formats that you can find.
Brad
A: 

As Brad correctly pointed out in his answer, there is no standard format, so there is no single way to extract the same information from every response.

You need to create one parser for each response format, and it requires a really huge effort.

One year ago I started the project of creating a pure-ruby WHOIS client and parser. The library is open-source, so feel free to fork it and contribute back.

Currently it provides more than 150 different parsers. Not all parsers support the Organization information, but the library has a very flexible DSL so you can easily add it.

Simone Carletti
Great, I will check this out!
Andrew