views:

85

answers:

5

Okay, this is quite an interesting challenge I have got myself into.

My RegEx takes as input lines like the following:

147.63.23.156/159
94.182.23.55/56
134.56.33.11/12

I need it to output a regular expression that matches the range represented. Let me explain.

For example, if the RegEx receives 147.63.23.156/159, then it needs to output a RegEx that matches the following:

147.63.23.156
147.63.23.157
147.63.23.158
147.63.23.159

How can I do this?

Currently I have:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})
  • $1 contains the first xxx.xxx.xxx. part
  • $2 contains the lower range for the number
  • $3 contains the upper range for the number
+1  A: 

If you just need to build them one at at time, this website will do the trick.

If you need code, and don't mind python, this code does it for any arbitrary numeric range.

Robert Harvey
A: 

I absolutely agree with the commenters, a pure-regex solution would be the wrong tool for the job here. Just use the regular expression you already have to extract the prefix, minimum, and maximum values,

$prefix, $minimum, $maximum = match('(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})', $line).groups()

then test your IP address against ${prefix}(\d+),

$lastgroup = match($prefix + '(\d+)', $addr).groups()[0]

and compare that last group to see if it falls within the proper range,

return int($minimum) <= int($lastgroup) <= int($maximum)

Code examples are pseudocode, of course - convert to your language of choice.

David Zaslavsky
Maybe I should have explained the problem a little more - I am creating regular expressions for an Apache `RewriteCond`. They need to match certain IP address ranges - but they must be supplied as a regular expression.
George Edison
@George: that would have been good information to include in the question.
David Zaslavsky
+1  A: 

Regexes are really not a great way to validate IP addresses, I want to make that clear right up front. It is far, far easier to parse the addresses and do some simple arithmetic to compare them. A couple of less than's and greater than's and you're there.

That said, it seemed like it would be a really fun exercise to write this regex generator. And you know what? It was! I came up with a big mess of Python code to generate these regexes. Before I show the code, here's a sample of the regexes it produces for a couple of IP ranges:

1.2.3.4 to 1.2.3.4              1\.2\.3\.4

147.63.23.156 to 147.63.23.159  147\.63\.23\.15[6-9]

10.7.7.10 to 10.7.7.88          10\.7\.7\.([1-7]\d|8[0-8])

127.0.0.0 to 127.0.1.255        127\.0\.[0-1]\.(\d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5]))

I'll show the code in two parts. First, the part that generates regexes for simple integer ranges. Second, the part that handles full IP addresses.

Matching number ranges

The first step is to figure out how to generate a regex that matches an arbitrary integer range, say 12-28 or 0-255.

@Robert Harvey already posted a link to some code that does this, but feh to that! I was well on my way writing code before I saw his link. Besides, it's about the journey not the destination, right?

So I wrote my own implementation of the number range regex generator. Here's an example of the regexes it comes up with:

156 to 159   15[6-9]

1 to 100     [1-9]|[1-9]\d|100

0 to 255     \d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5])

Here's the code. There are numerous comments inline explaining the logic behind it. Overall it relies on a lot of recursion and special casing to try to keep the regexes lean and mean.

import sys, re

def range_regex(lower, upper):
    lower, upper = str(lower), str(upper)

    # Different lengths, for instance 1-100. Combine regex(1-9) and
    # regex(10-100).
    if len(lower) != len(upper):
        return '%s|%s' % (
            range_regex(lower, '9' * len(lower)),
            range_regex(10 ** (len(lower)), upper)
        )

    ll, lr = lower[0], lower[1:]
    ul, ur = upper[0], upper[1:]

    # One digit numbers.
    if lr == '':
        if ll == '0' and ul == '9':
            return '\\d'
        else:
            return '[%s-%s]' % (ll, ul)

    # Same first digit, for instance 12-14. Concatenate "1" and regex(2-4).
    elif ll == ul:
        return ll + sub_range_regex(lr, ur)

    # All zeros to all nines, for instance 100-399. Concatenate regex(1-3)
    # and the appropriate number of \d's.
    elif lr == '0' * len(lr) and ur == '9' * len(ur):
        return range_regex(ll, ul) + '\\d' * len(lr)

    # All zeros on left, for instance 200-649. Combine regex(200-599) and
    # regex(600-649).
    elif lr == '0' * len(lr):
        return '%s|%s' % (
            range_regex(lower, str(int(ul[0]) - 1) + '9' * len(ur)),
            range_regex(ul + '0' * len(ur), upper)
        )

    # All nines on right, for instance 167-499. Combine regex(167-199) and
    # regex(200-499).
    elif ur == '9' * len(ur):
        return '%s|%s' % (
            range_regex(lower, ll + '9' * len(lr)),
            range_regex(str(int(ll[0]) + 1) + '0' * len(lr), upper)
        )

    # First digits are one apart, for instance 12-24. Combine regex(12-19)
    # and regex(20-24).
    elif ord(ul[0]) - ord(ll[0]) == 1:
        return '%s%s|%s%s' % (
            ll, sub_range_regex(lr, '9' * len(lr)),
            ul, sub_range_regex('0' * len(ur), ur)
        )

    # Far apart, uneven numbers, for instance 15-73. Combine regex(15-19),
    # regex(20-69), and regex(70-73).
    else:
        return '%s|%s|%s' % (
            range_regex(lower, ll + '9' * len(lr)),
            range_regex(str(int(ll[0]) + 1) + '0' * len(lr),
                        str(int(ul[0]) - 1) + '9' * len(ur)),
            range_regex(ul + '0' * len(ur), upper)
        )

# Helper function which adds parentheses when needed to sub-regexes.
# Sub-regexes need parentheses if they have pipes that aren't already
# contained within parentheses. For example, "6|8" needs parentheses
# but "1(6|8)" doesn't.
def sub_range_regex(lower, upper):
    orig_regex = range_regex(lower, upper)
    old_regex  = orig_regex

    while True:
        new_regex = re.sub(r'\([^()]*\)', '', old_regex)

        if new_regex == old_regex:
            break
        else:
            old_regex = new_regex
            continue

    if '|' in new_regex:
        return '(' + orig_regex + ')'
    else:
        return orig_regex

Matching IP address ranges

With that capability in place, I then wrpte a very similar-looking IP range function to work with full IP addresses. The code is very similar to the code above except that we're working in base 256 instead of base 10, and the code throws around lists instead of strings.

import sys, re, socket

def ip_range_regex(lower, upper):
    lower = [ord(c) for c in socket.inet_aton(lower)]
    upper = [ord(c) for c in socket.inet_aton(upper)]

    return ip_array_regex(lower, upper)

def ip_array_regex(lower, upper):
    # One octet left.
    if len(lower) == 1:
        return range_regex(lower[0], upper[0])

    # Same first octet.
    if lower[0] == upper[0]:
        return '%s\.%s' % (lower[0], sub_regex(ip_array_regex(lower[1:], upper[1:])))

    # Full subnet.
    elif lower[1:] == [0] * len(lower[1:]) and upper[1:] == [255] * len(upper[1:]):
        return '%s\.%s' % (
            range_regex(lower[0], upper[0]),
            sub_regex(ip_array_regex(lower[1:], upper[1:]))
        )

    # Partial lower subnet.
    elif lower[1:] == [0] * len(lower[1:]):
        return '%s|%s' % (
            ip_array_regex(lower, [upper[0] - 1] + [255] * len(upper[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )

    # Partial upper subnet.
    elif upper[1:] == [255] * len(upper[1:]):
        return '%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([lower[0] + 1] + [0] * len(lower[1:]), upper)
        )

    # First octets just 1 apart.
    elif upper[0] - lower[0] == 1:
        return '%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )

    # First octets more than 1 apart.
    else:
        return '%s|%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([lower[0] + 1] + [0]   * len(lower[1:]),
                           [upper[0] - 1] + [255] * len(upper[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )
John Kugelman
Awesome... sorry it took so long for me to get back to this question.
George Edison
A: 

To my knowledge, this can't be done with straight up regex, but would also need some code behind it. For instance, in PHP you could use the following:

function make_range($ip){
    $regex = '#(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})#';
    if ( preg_match($regex, $ip, $matches) ){
        while($matches[1] <= $matches[2]){
            print "{$matches[0]}.{$matches[1]}";
            $matches[1]++;
        }
    } else {
        exit('not a supported IP range');
    } 
}

For this to work with a RewriteCond, I think some black magic would be in order...

How is this going to be used with RewriteCond, anyways? Do you have several servers and want to just quickly make a .htaccess file easily? If so, then just add that function to a bigger script that takes some arguments and burps out a .htaccess file.

Tim
+1  A: 

If it's for Apache... I haven't tried it, but it might work:

RewriteCond %{REMOTE_ADDR} !<147.63.23.156
RewriteCond %{REMOTE_ADDR} !>147.63.23.159

(Two consecutive RewriteConds are joined by a default logical AND)

Just have to be careful with ranges with differing number of digits (e.g. 95-105 should be broken into 95-99 and 100-105, since it is lexicographic ordering).

Amadan