I can't see a reason why the Matcher would return a match on the pattern, but split will return a zero length array on the same regex pattern. It should return something -- in this example I'm looking for a return of 2 separate strings containing "param/value".

public class MyClass {

    protected Pattern regEx = "(([a-z])+/{1}([a-z0-9])+/?)*";

    public void someMethod() {
        String qs = "param/value/param/value";
        Matcher matcherParamsRegEx = this.regEx.matcher(qs);
        if (matcherParamsRegEx.matches()) { // This finds a match.
            String[] parameterValues = qs.split(this.regEx.pattern()); // No matches... zero length array.
        }
    }
}
+3  A: 

The pattern can match the entire string. split() doesn't return the match, only what's in between. Since the pattern matches the whole string that only leaves an empty string to return. I think you might be under a misconception as to what split() does.

For example:

String qs = "param/value/param/value";
String[] pieces = qs.split("/");

will return an array of 4 elements: param, value, param, value.

Notice that what you search on ("/") isn't returned.
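Here is a runnable version of that example, as a small sketch. The class and method names (SplitOnSlash, splitOnSlash) are made up for illustration.

```java
import java.util.Arrays;

public class SplitOnSlash {
    // Splits on the literal "/" delimiter. The delimiter itself is discarded;
    // only the text between delimiters is returned.
    static String[] splitOnSlash(String qs) {
        return qs.split("/");
    }

    public static void main(String[] args) {
        String[] pieces = splitOnSlash("param/value/param/value");
        System.out.println(pieces.length);           // 4
        System.out.println(Arrays.toString(pieces)); // [param, value, param, value]
    }
}
```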

Your regex is somewhat over-complicated. For one thing, you're using {1}, which is unnecessary. Second, when you do ([a-z])+ you will capture exactly one letter (the last one encountered). Compare that to ([a-z]+), which will capture the entire match. Also, you don't even need to capture for this. The pattern can be simplified to:

protected Pattern regEx = Pattern.compile("(?:[a-z]+/[a-z0-9]+/?)*");
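To see the difference between ([a-z])+ and ([a-z]+) for yourself, something like the following should do. It's only a sketch; CaptureDemo and firstGroup are hypothetical names, not anything from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureDemo {
    // Returns group 1 of the first match of regex in input, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The group is repeated, so it keeps only its last iteration.
        System.out.println(firstGroup("([a-z])+", "param")); // m
        // The repetition is inside the group, so the whole run is captured.
        System.out.println(firstGroup("([a-z]+)", "param")); // param
    }
}
```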

Technically this:

protected Pattern regEx = "(([a-z])+/{1}([a-z0-9])+/?)*";

is a compiler error (a String literal can't be assigned to a Pattern field; it has to go through Pattern.compile()), so what you actually ran versus what you posted could be anything.

cletus
+1  A: 

The problem here is that split splits around matches of your regex. You have two consecutive matches with nothing else in between, so there is nothing left for split to return.

I can't see any way for you to get what you want from that string using split, but if you can use a different delimiter to separate pairs than you do to separate name and value, that will help a lot.

Otherwise, you might split on slashes and take alternating results as names and values, but this is error-prone.
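A rough sketch of that last approach, assuming names and values never contain a slash themselves. PairUp and pairs are made-up names, and the odd-count check is just one way to guard against the kind of error mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class PairUp {
    // Splits on "/" and treats the tokens as alternating name, value, name, value...
    // Rejects input with an odd number of tokens rather than guessing.
    static List<String[]> pairs(String qs) {
        String[] tokens = qs.split("/");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of tokens in: " + qs);
        }
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i += 2) {
            result.add(new String[] { tokens[i], tokens[i + 1] });
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] pair : pairs("param/value/param/value")) {
            System.out.println(pair[0] + " -> " + pair[1]); // prints "param -> value" twice
        }
    }
}
```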

danben
+1  A: 

The regex is matching; if it weren't, you would get a one-element array, that element being the whole original string. You just have the wrong idea about how split() works. On the first match attempt the regex consumes the whole string (the outer * lets the group match "param/value/" and then "param/value"), and split() stores everything preceding that match as the first token: an empty string. The second attempt finds a zero-length match at the end of the string and stores whatever lay between it and the first match as the next token: another empty string. The third match attempt fails, so whatever was between the second match and the end of the string becomes the final token: yet another empty string.

Having stored all the tokens, split() iterates through them in reverse, checking for trailing empty tokens. The third token is indeed empty, so it deletes that one. The second one is also empty, so it deletes that one. You see where this is going? You can force split() to preserve trailing empty tokens by passing a negative integer as the second argument (the limit), but that obviously doesn't do you any good. You need to rethink your problem (whatever it is) in terms of how the regex package actually works.
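If you want to watch this happen, the sketch below runs the question's regex through split() with and without a negative limit. It also shows one possible way to get the "param/value" strings with Matcher.find() instead. That second part is an assumption about what the asker is after; the PAIR pattern and the names SplitLimitDemo and findPairs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitLimitDemo {
    // The pattern from the question, compiled properly.
    static final Pattern QUESTION_REGEX =
            Pattern.compile("(([a-z])+/{1}([a-z0-9])+/?)*");

    // Assumed alternative: a pattern for a single name/value pair, used with find().
    static final Pattern PAIR = Pattern.compile("[a-z]+/[a-z0-9]+");

    // Collects every substring that matches PAIR, in order.
    static List<String> findPairs(String qs) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(qs);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String qs = "param/value/param/value";
        // Default limit (0): trailing empty tokens are dropped, leaving nothing.
        System.out.println(QUESTION_REGEX.split(qs).length);     // 0
        // Negative limit: the three empty tokens are kept.
        System.out.println(QUESTION_REGEX.split(qs, -1).length); // 3
        // Matching the pairs directly instead of splitting around them.
        System.out.println(findPairs(qs)); // [param/value, param/value]
    }
}
```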

Alan Moore