ansaurus

Question

Is there an elegant way to do partial regex matches in Java?

Answer 1

+4 A:

Although there may be some trickery available, your way is probably the best semantically. It accurately describes what you're looking for.

However, the bigger issue is whether you really need to validate every single time a character is typed into the text box. Why can't you just validate it once at the end and save yourself some headaches?

Pesto 2009-09-29 17:33:26

"Why", indeed... :( Because The Client (tm) decrees it so. The problem with the way I'm doing it, apart from the fact that it's pretty unfriendly (I'd have to explain the weird partially-matching regexps to the users of the API, etc.), is that it won't match strings that match the END of the regex. In the example above, it won't match "3c", which should be valid, since you can always go back and add "ab" at the beginning.

Tonio 2009-09-29 17:37:09

Did the client really specify that it had to be done by a regex? Or did that particular part of the design come from the technical side?

Yishai 2009-09-29 17:51:08

Client did not specify that it had to be done by a regex, that was our design decision. Originally it was done by simple string matching, but we eventually decided to use regexps since some patterns could get pretty complex.

Tonio 2009-09-29 18:43:16

Answer 2

+2 A:

Here is a regex that can solve your particular example:

^(?:a|b|[0-9]|c|ab|b[0-9]|[0-9]c|ab[0-9]|b[0-9]c|ab[0-9]c)?$

Generally speaking, if you can break the regex down into atomic parts, you can OR together all possible contiguous groupings of them, but the result is big and ugly. In this case, there are 4 parts (a, b, [0-9], and c), so you have to OR together 4+3+2+1=10 possibilities. (For n parts, it is (n×(n+1))/2 possibilities.) You might be able to generate this algorithmically, but it would be a huge pain to test. And anything complex (like a subgroup) would be very difficult to get right.

A better solution is probably just to have a message beside the input field telling the user "not enough info" or something, and when they have it right change it to a green checkbox or something. Here's a recent article from A List Apart that weighs the pros and cons of different approaches to this problem: Inline Validation in Web Forms.

Kip 2009-09-29 17:45:02
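
For illustration, the "OR together every contiguous grouping" idea from this answer could be generated along these lines. This is only a sketch under assumptions that are not in the thread: the regex has already been split by hand into atomic parts, each part matches exactly one character, and `PartialRegexBuilder` and `buildPartialRegex` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PartialRegexBuilder {
    // Builds ^(?:alt1|alt2|...)?$ where every alternative is a contiguous run
    // of the given parts. For n parts this yields n*(n+1)/2 alternatives.
    // Assumption: each part is a self-contained single-character atom; a part
    // containing a top-level '|' would have to be wrapped in a group first.
    public static String buildPartialRegex(List<String> parts) {
        List<String> alternatives = new ArrayList<>();
        for (int start = 0; start < parts.size(); start++) {
            StringBuilder run = new StringBuilder();
            for (int end = start; end < parts.size(); end++) {
                run.append(parts.get(end));
                alternatives.add(run.toString());
            }
        }
        return "^(?:" + String.join("|", alternatives) + ")?$";
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "[0-9]", "c");
        String regex = buildPartialRegex(parts);
        System.out.println(regex);
        Pattern partial = Pattern.compile(regex);
        // Try a few fragments, including tail fragments such as "3c".
        for (String input : new String[] {"", "ab", "b3c", "3c", "a3", "abc"}) {
            System.out.println("\"" + input + "\" -> " + partial.matcher(input).matches());
        }
    }
}
```

The alternatives come out in a different order than in the hand-written regex above, which does not matter because the whole group is anchored. As the answer warns, this does not extend easily to quantifiers or nested groups.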

This is precisely what I'm trying to avoid... ungainly regexps (OR'ed in your example, containing optional subgroups at the end in my example). :( Unfortunately, your proposed better solution is unacceptable to the client. What they require is blocking text entry of illegal characters (illegal meaning characters which don't lead to a valid string), and visual feedback of a fully valid string (meaning, as implemented, the background of the text box changes color when the string matches the regex entirely).

Tonio 2009-09-29 18:26:01

Well, maybe you could have two steps? One which just runs `ab[0-9]c` and tells whether the full string is valid, and one which runs the big regex to tell if what they have entered *could* be valid. You can run the big regex on the keyPressed event, and if that fails, you return false (i.e. don't let the user enter that character).

Kip 2009-09-29 19:41:27
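
A rough sketch of that two-step idea, using the example pattern and the expanded regex from the answer above. The class and method names (`TwoStepValidator`, `isComplete`, `isAllowedSoFar`) are invented here, and the loop in `main` only simulates keystrokes; it stands in for whatever keyPressed handler the real UI uses.

```java
import java.util.regex.Pattern;

class TwoStepValidator {
    // Step 1: the real rule. A full match means the input is complete.
    private static final Pattern COMPLETE = Pattern.compile("ab[0-9]c");

    // Step 2: the expanded regex from the answer. A match means the text
    // entered so far is a fragment that could still become valid.
    private static final Pattern FRAGMENT = Pattern.compile(
            "^(?:a|b|[0-9]|c|ab|b[0-9]|[0-9]c|ab[0-9]|b[0-9]c|ab[0-9]c)?$");

    public static boolean isComplete(String text) {
        return COMPLETE.matcher(text).matches();
    }

    public static boolean isAllowedSoFar(String text) {
        return FRAGMENT.matcher(text).matches();
    }

    public static void main(String[] args) {
        StringBuilder field = new StringBuilder();
        // Simulate the user typing a, b, x, 3, c one character at a time.
        for (char typed : "abx3c".toCharArray()) {
            String candidate = field.toString() + typed;
            if (isAllowedSoFar(candidate)) {
                field.append(typed); // accept the keystroke
            }
            // Otherwise the keystroke is dropped, as a handler returning false would do.
            System.out.println("typed '" + typed + "' -> field=\"" + field
                    + "\" complete=" + isComplete(field.toString()));
        }
    }
}
```

In this simulation the "x" is rejected and the field ends up holding "ab3c", at which point `isComplete` reports true and the UI could switch the background color.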

If I could generate these regexps algorithmically, it would be a valid solution. Generating them for simple regexps shouldn't be too much of a problem, but unfortunately not all regexps being used are simple.

Tonio 2009-09-29 20:38:54

Answer 3

+4 A:

Is Matcher.hitEnd() what you're looking for?

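import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Method name is illustrative; the logic is the snippet as originally posted.
static boolean matchesOrCouldMatch(String theRegexString, String theStringToTest) {
    Pattern thePattern = Pattern.compile(theRegexString);
    Matcher m = thePattern.matcher(theStringToTest);
    if (m.matches()) {
        return true; // the whole string already matches
    }
    return m.hitEnd(); // true if the failed attempt ran out of input
}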

Éric Malenfant 2009-09-29 17:46:57

Nice! It almost works. It certainly works as a replacement for what I'm currently doing. It still won't work for partial tail matches (for example, "b3c" being a partial tail match for `ab[0-9]c`), but my current solution doesn't handle those either.

Tonio 2009-09-29 18:40:30

It's true that `hitEnd()` serves the same purpose as the OP's own solution of breaking the regex into successive optional groups. But there's still no way to detect a partial match that doesn't align with the beginning of the regex.

Alan Moore 2009-09-29 19:04:24
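
To make the behaviour discussed in this answer and its comments concrete, here is a small self-contained sketch. The class name `HitEndDemo` and the helper `matchesOrCouldMatch` are made up for the example; only `Pattern`, `Matcher.matches()` and `Matcher.hitEnd()` come from `java.util.regex`. With `ab[0-9]c`, prefixes such as "a" or "ab3" should be accepted, while "ax" and the tail fragment "b3c" should be rejected, which is the limitation pointed out in the comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HitEndDemo {
    // True if the input matches completely, or if the match attempt failed
    // only because it ran out of input (so more characters might fix it).
    public static boolean matchesOrCouldMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("ab[0-9]c");
        // Prefixes, a full match, a wrong character, and a tail fragment.
        for (String input : new String[] {"", "a", "ab3", "ab3c", "ax", "b3c"}) {
            System.out.println("\"" + input + "\" -> " + matchesOrCouldMatch(pattern, input));
        }
    }
}
```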
