tags:

views:

66

answers:

4

This is a generalization of a specific problem I am trying to solve.

Given a string of uniquely identifiable elements separated by a single space:

0 1 2 3 4

Without modifying the string, what regular expression will match this expression when any of the numbers are optional except one and the numbers are in order?

Examples of other valid strings:

2
0 4
1 4
0 1 2 3
0 1 3 4
0 1 2 3 4
A: 

There is no regular expression that will be able to verify the order of a language unless you specify an expression for every combination of the ordered set acceptable. You should look at a pushdown automata solution.

linuxuser27
But the requirement is that the numbers appear in sequence (ascending order, in fact)...
Jonathan Leffler
Good call. Corrected to 'combination' not 'permutation'.
linuxuser27
+3  A: 

Regular expressions might not be the best tool for this. It's just about manageable for the example you gave if you can use lookaheads:

^(?=0? ?1? ?2? ?3? ?4?$)(?:\d )*\d$

Rubular

Explanation:

^                       Match the start of line/string.
(?=0? ?1? ?2? ?3? ?4?$) Look ahead to restrict to 0 to 4 in that order.
(?:\d )*\d              Ensure that the string is alternating digit then space.
$                       Match the end of line/string.

To generalize it to fixed length strings all of equal length makes it slightly more difficult to read. For example for ab ba cd de ef it will look like this:

^(?=(?:ab)? ?(?:ba)? ?(?:cd)? ?(?:de)? ?(?:ef)?$)(?:\w{2} )*\w{2}$

Rubular


But generalizing it to words of varying length gets more messy, for example for zero one two three four you can use this regular expression:

^(?=(?:zero)? ?(?:one)? ?(?:two)? ?(?:three)? ?(?:four)?$)(?! )(?: ?(?:zero|one|two|three|four)(?= |$))+$

Rubular

Mark Byers
I should have explained that it was part of the generalization. The digits are actually just placeholders for unique strings. They could be replaced with whole words:ab ba cd de ef
Sarabjot
@sarabjot: Then you probably don't want to use regular expressions. Do you want me to update my post to include the regular expression so that you can see why it's a bad idea to use a regex for this?
Mark Byers
A: 

Regular expressions aren't powerful enough to solve this for arbitrary input strings. For any set input string, you could obvious build a regular expression by listing all possible matching strings, but that's silly. This problem should be almost trivial in any modern high-level language, so let us know what language you're using.

baddox
+1  A: 

I'm a strong advocate of programmatic generation of regex patterns from meta-patterns (see also Martin Fowler's argument for using composed regex). The technique is very applicable in this situation.

Here's a solution in Java:

static String orderedOptional(String sep, String... items) {
    StringBuilder sb = new StringBuilder();
    for (String item : items) {
        sb.append(
            "(?:item(?:sep(?!$)|$))?"
                .replace("item", item)
                .replace("sep", sep)
        );
    }
    return sb.toString();
}
static String wholeLineNonEmpty(String pattern) {
    return "^(?!$)pattern$".replace("pattern", pattern);
}

Now we have (as seen on ideone.com):

    String PATTERN =
        wholeLineNonEmpty(
            orderedOptional(" ",
                "one", "two", "three")
        );

    String[] tests = {
        "",                 // false
        "onetwo",           // false
        "one two",          // true
        "one two ",         // false
        "two three",        // true
        "three",            // true
        "one three",        // true
        "one two three",    // true
        "three two one"     // false
    };
    for (String test : tests) {
        System.out.println(test.matches(PATTERN));
    }

One can also easily use the orderedOptional meta-pattern with other separator and items, and may also use it without the wholeLineNonEmpty meta-pattern. Here's an example:

    String INCREASING_DIGITS = 
        wholeLineNonEmpty(
            orderedOptional("[,;]",
                "1", "2", "3", "4", "5", "6", "7", "8", "9")
        );
    System.out.println(INCREASING_DIGITS);
    // ^(?!$)(?:1(?:[,;](?!$)|$))?(?:2(?:[,;](?!$)|$))?(?:3(?:[,;](?!$)|$))?
    // (?:4(?:[,;](?!$)|$))?(?:5(?:[,;](?!$)|$))?(?:6(?:[,;](?!$)|$))?
    // (?:7(?:[,;](?!$)|$))?(?:8(?:[,;](?!$)|$))?(?:9(?:[,;](?!$)|$))?$

This pattern accepts e.g. "9", "1;2,4", "2,3" and rejects e.g. "", "4321", "4;3;2;1", "1;" (see on rubular.com). Without a doubt the resulting pattern looks ugly and hard to comprehend, but that's why it was generated programmatically in the first place, using simpler patterns that are much easier to understand and a process that is naturally automated.

polygenelubricants