views:

214

answers:

3

What I need is to check whether a given string partially matches a given regex. For example, for the regex ab[0-9]c, the strings "a", "ab", "ab3", and "b3c" would "match", but not the strings "d", "abc", or "a3c". What I've been doing is the clunky a(?:b(?:[0-9](?:c)?)?)? (which only works for some of the partial matches, specifically those which "begin" to match), but since this is part of an API, I'd rather give the users a more intuitive way of entering their matching regexps.
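For concreteness, here is a minimal Java sketch of the nested-optional approach described above. The class and method names (`PrefixOnlyDemo`, `isPrefixOfMatch`) are made up for illustration and are not part of any real API. It shows that the hand-written form accepts strings that begin a match but rejects tail fragments such as "b3c".

```java
import java.util.regex.Pattern;

class PrefixOnlyDemo {
    // Hand-written "optional tail" form of ab[0-9]c, taken from the question.
    static final Pattern PREFIX = Pattern.compile("a(?:b(?:[0-9](?:c)?)?)?");

    // Hypothetical helper: true if input is a non-empty prefix of some full match.
    static boolean isPrefixOfMatch(String input) {
        return PREFIX.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"a", "ab", "ab3", "ab3c", "b3c", "d", "abc", "a3c"};
        for (String s : samples) {
            System.out.println(s + " -> " + isPrefixOfMatch(s));
        }
    }
}
```

With this pattern, "a", "ab", "ab3" and "ab3c" are accepted, while "b3c", "d", "abc" and "a3c" are rejected.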

In case the description's not very clear (and I realize it might not be!), this will be used for validating text input on text boxes. I want to prevent any editing that would result in an invalid string, but I can't just match the string against a regular regex, since until it's fully entered, it would not match. For example, using the regex above (ab[0-9]c), when I attempt to enter 'a', it's disallowed, since the string "a" does not match the regex.

Basically, it's a sort of reverse startsWith() which works on regexps. (new Pattern("ab[0-9]c").startsWith("ab3") should return true.)

Any ideas?

+4  A: 

Although there may be some trickery available, your way is probably the best semantically. It accurately describes what you're looking for.

However, the bigger issue is whether you really need to validate every single time a character is typed into the text box. Why can't you just validate it once at the end and save yourself some headaches?

Pesto
"Why", indeed... :( Because The Client (tm) decrees it so. The problem with the way I'm doing it, apart from the fact that it's pretty unfriendly (I'd have to explain to the users of the API about the weird partially-matching regexps, etc.), is that it won't match strings that match the END of the regex. In the example above, it won't match "3c", which should be valid, since you can always go back and add "ab" at the beginning.
Tonio
Did the client really specify that it had to be done by a regex? Or did that particular part of the design come from the technical side?
Yishai
Client did not specify that it had to be done by a regex, that was our design decision. Originally it was done by simple string matching, but we eventually decided to use regexps since some patterns could get pretty complex.
Tonio
+2  A: 

Here is a regex that can solve your particular example:

^(?:a|b|[0-9]|c|ab|b[0-9]|[0-9]c|ab[0-9]|b[0-9]c|ab[0-9]c)?$

Generally speaking, if you can break the regex down into atomic parts, you can OR together all possible contiguous groupings of them, but it is big and ugly. In this case, there are 4 parts (a, b, [0-9], and c), so you have to OR together 4+3+2+1=10 possibilities. (For n parts, that is n×(n+1)/2 possibilities.) You might be able to generate this algorithmically, but it would be a huge pain to test. And anything complex (like a subgroup) would be very difficult to get right.
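As a rough sketch of generating that alternation, assuming the regex has already been split into atomic parts by hand (the splitting is the hard part and is not attempted here): the names `ContiguousRuns` and `buildPartialRegex` are hypothetical. Each part is wrapped in a non-capturing group so that a part containing `|` does not leak into its neighbours, which makes the output more verbose than the hand-written version above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ContiguousRuns {
    // Builds ^(?:run1|run2|...)?$ where each run is a contiguous slice of parts.
    // Assumption: every element of parts is a self-contained regex fragment.
    static String buildPartialRegex(List<String> parts) {
        List<String> runs = new ArrayList<>();
        for (int start = 0; start < parts.size(); start++) {
            StringBuilder run = new StringBuilder();
            for (int end = start; end < parts.size(); end++) {
                run.append("(?:").append(parts.get(end)).append(")");
                runs.add(run.toString()); // slice parts[start..end]
            }
        }
        // n parts yield n*(n+1)/2 runs; the trailing ? also allows the empty string.
        return "^(?:" + String.join("|", runs) + ")?$";
    }

    public static void main(String[] args) {
        String regex = buildPartialRegex(Arrays.asList("a", "b", "[0-9]", "c"));
        Pattern p = Pattern.compile(regex);
        for (String s : new String[] {"a", "ab3", "b3c", "3c", "abc", "a3c", "d"}) {
            System.out.println(s + " -> " + p.matcher(s).matches());
        }
    }
}
```

This only handles a flat sequence of parts. Quantifiers that span several parts, alternations at the top level, or nested groups would need a real regex parser.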

A better solution is probably just to have a message beside the input field telling the user "not enough info" or something, and when they have it right change it to a green checkbox or something. Here's a recent article from A List Apart that weighs the pros and cons of different approaches to this problem: Inline Validation in Web Forms.

Kip
This is precisely what I'm trying to avoid... ungainly regexps (OR'ed in your example, containing optional subgroups at the end in my example). :( Unfortunately, your proposed better solution is unacceptable to the client. What they require is blocking text entry of illegal characters (illegal meaning characters which don't lead to a valid string), and visual feedback of a fully valid string (meaning, as implemented, the background of the text box changes color when the string matches the regex entirely).
Tonio
Well, maybe you could have two steps? One which just runs `ab[0-9]c` and tells whether the full string is valid, and one which runs the big regex to tell if what they have entered *could* be valid. You can run the big regex on the keyPressed event, and if that fails, you return false (i.e. don't let the user enter that character).
Kip
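A minimal sketch of the two-step idea from the comment above, assuming both regexes are supplied by the caller. `TwoStepValidator`, `State` and `check` are illustrative names, and wiring this into an actual key event handler is left out.

```java
import java.util.regex.Pattern;

class TwoStepValidator {
    enum State { INVALID, PARTIAL, COMPLETE }

    private final Pattern full;    // e.g. ab[0-9]c
    private final Pattern partial; // the big OR'ed regex of allowed fragments

    TwoStepValidator(String fullRegex, String partialRegex) {
        this.full = Pattern.compile(fullRegex);
        this.partial = Pattern.compile(partialRegex);
    }

    // COMPLETE: colour the text box as valid.
    // PARTIAL: allow the edit, no "valid" feedback yet.
    // INVALID: reject the edit.
    State check(String candidate) {
        if (full.matcher(candidate).matches()) {
            return State.COMPLETE;
        }
        if (partial.matcher(candidate).matches()) {
            return State.PARTIAL;
        }
        return State.INVALID;
    }
}
```

A key handler would call `check` on the text as it would look after the keystroke and veto the edit when the result is `INVALID`.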
If I could generate these regexps algorithmically, it would be a valid solution. Generating them for simple regexps shouldn't be too much of a problem, but unfortunately not all regexps being used are simple.
Tonio
+4  A: 

Is Matcher.hitEnd() what you're looking for?

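import java.util.regex.Matcher;
import java.util.regex.Pattern;

// True if the string matches fully, or if the match attempt ran out of input
// (so appending more characters might still produce a match).
static boolean matchesOrCouldMatch(String theRegexString, String theStringToTest) {
    Pattern thePattern = Pattern.compile(theRegexString);
    Matcher m = thePattern.matcher(theStringToTest);
    if (m.matches()) {
        return true;
    }
    return m.hitEnd();
}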
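To see how this behaves on the strings from the question, here is a small self-contained sketch. `HitEndDemo` and `couldStillMatch` are made-up names. Note that `hitEnd()` only reports that the matcher reached the end of the input during the last match attempt, so a true result means more input might change the outcome, not that it definitely will.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HitEndDemo {
    // Full match, or a failed match that ran out of input before it could decide.
    static boolean couldStillMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("ab[0-9]c");
        String[] samples = {"", "a", "ab", "ab3", "ab3c", "b3c", "d", "abc", "a3c"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + couldStillMatch(p, s));
        }
    }
}
```

Prefixes such as "a", "ab" and "ab3" are accepted, and "d", "abc" and "a3c" are rejected. The tail fragment "b3c" is rejected as well, because `matches()` anchors the attempt at the start of the input.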
Éric Malenfant
Nice! It almost works. It certainly works as a replacement of what I'm currently doing. It still won't work for partial tail matches (for example, "b3c" being a partial tail match for `ab[0-9]c`), but my current solution doesn't handle those either.
Tonio
It's true that `hitEnd()` serves the same purpose as the OP's own solution of breaking the regex into successive optional groups. But there's still no way to detect a partial match that doesn't align with the beginning of the regex.
Alan Moore