tags:

views:

36

answers:

2

Hi there!

Is there a nice way to evaluate a regular expression range, say, for a url such as

http://example.com/[a-z]/[0-9].htm

This would be converted into:

http://example.com/a/0.htm
http://example.com/a/1.htm
http://example.com/a/2.htm
...
http://example.com/a/9.htm
...
http://example.com/z/0.htm
http://example.com/z/1.htm
http://example.com/z/2.htm
...
http://example.com/z/9.htm

I've been scratching my head about this, and there's no pretty way of doing it without going through the alphabet and looping through numbers.

Thanks in advance!

+2  A: 

I guess there is no way to expand regular expressions in general. Your example

http://foo.com/[a-z]/[0-9].htm

is a very easy regex without * or + for instance. How would you expand such a regex?

In your case you might get away with some loops, but as I said - this is a untypical (easy) regex.

tanascius
No, I would say that the regex ranges I'm looking to create are reasonably simple alphanumeric ones but, yes, I agree that other ranges would be *quite* hard to do.
Dan Atkinson
What for do you need it? Some kind of tests?
tanascius
I suppose that simple url ranges would probably be my primary use. There is no real *need* for it. If anything, it's more of a thought experiment. :)
Dan Atkinson
Just an idea: You could **brute-force** the regex. For every `[]` you find you assume a character set (like a-z, A-z, 0-9, and a few special chars) and use all combinations as an input. However the performance would be poor and the miss rate could be high (if you forget any valid characters).
tanascius
Yeah, that's what I'm thinking. I've started by regexing the er... regex and then parse it, determining if it's a string or int range. That's about as simple as I can get it right now. :)
Dan Atkinson
@Dan, If you tried to use the approach of "regexing the regex" with more complex regexes, you might see that you cannot do it fully: The language of regular expressions is not regular itself, e.g. because it has nested groups (think about expressions like `(a(bc)?)+`). But simple regexes of the form `([^][]*\\[[^]]-[^]]\\])*` you could split by regexes and handle with loops and recursion.
Christian Semrau
+2  A: 

If you really need to do this, it's not that hard to generate the strings using recursion. Here's a snippet to do just that in Java:

public class Explode {
    static void dfs(String prefix, String suffix) {
        final int k = suffix.indexOf('[');
        if (k == -1) {
            System.out.println(prefix + suffix);
        } else {
            prefix += suffix.substring(0, k);
            char from = suffix.charAt(k+1);
            char to = suffix.charAt(k+3);
            suffix = suffix.substring(k+5);
            for (char ch = from; ch <= to; ch++) {
                dfs(prefix + ch, suffix);               
            }
        }
    }
    public static void main(String[] args) {
        String template = "http://example.com/[a-c]/[0-2][x-z].htm";
        dfs("", template);
    }
}

(see full output)

This is a standard recursive tuple generator, but with some string infixing in between. It's trivial to port to C#. You'd want to use a mutable StringBuilder-like class for better performance.

polygenelubricants