I am trying to match inputs like
<foo>
<bar>
#####<foo>
#####<bar>
I tried #{5}?<\w+>, but it does not match <foo> and <bar>.
What's wrong with this pattern, and how can it be fixed?
? for optional vs reluctant

The ? metacharacter in Java regex (and some other flavors) can have two very different meanings, depending on where it appears. Immediately following a repetition specifier, ? makes that quantifier reluctant; it is not the "zero-or-one"/"optional" repetition specifier.
Thus, #{5}? does not mean "optionally match 5 #". It in fact says "match 5 # reluctantly". It may not make much sense to try to match "exactly 5, but as few as possible", but this is in fact what the pattern means.
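To see this behaviour directly, here is a minimal sketch that runs the pattern from the question against a few inputs. The class and method names (ReluctantFixedCount, matchesOriginal) are made up for illustration.

```java
import java.util.regex.Pattern;

class ReluctantFixedCount {
    // The pattern from the question. "#{5}?" is a reluctant "exactly 5",
    // so five '#' characters are still required.
    static final Pattern ORIGINAL = Pattern.compile("#{5}?<\\w+>");

    // Hypothetical helper: true if the whole input matches the original pattern.
    static boolean matchesOriginal(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesOriginal("#####<foo>")); // true
        System.out.println(matchesOriginal("<foo>"));      // false
        System.out.println(matchesOriginal("####<foo>"));  // false
    }
}
```

The reluctant modifier changes nothing here because {5} leaves no choice about how many repetitions to take.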
One way to fix this problem is to group the optional pattern as (…)?. Something like this should work for this problem:
(#{5})?<\w+>
Now the ? does not immediately follow a repetition specifier (i.e. *, +, ?, or {…}); it follows the closing parenthesis of a group.
Alternatively, you can use a non-capturing group (?:…) in this case:
(?:#{5})?<\w+>
This achieves the same grouping effect, but doesn't capture into \1.
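As a rough check of both grouped forms, the sketch below matches a few inputs and inspects group 1 of the capturing variant. The class and helper names (OptionalPrefix, matchesBoth, capturedPrefix) are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPrefix {
    static final Pattern CAPTURING = Pattern.compile("(#{5})?<\\w+>");
    static final Pattern NON_CAPTURING = Pattern.compile("(?:#{5})?<\\w+>");

    // Hypothetical helper: true if both variants accept the whole input.
    static boolean matchesBoth(String input) {
        return CAPTURING.matcher(input).matches()
            && NON_CAPTURING.matcher(input).matches();
    }

    // Hypothetical helper: the text captured by group 1, or null if the
    // input does not match or the optional group did not participate.
    static String capturedPrefix(String input) {
        Matcher m = CAPTURING.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"<foo>", "#####<bar>", "###<foo>"}) {
            System.out.println(input + " -> matches: " + matchesBoth(input)
                + ", group 1: " + capturedPrefix(input));
        }
    }
}
```

Note that "###<foo>" is rejected only because matches() requires the whole input to match; a find() call would still locate "<foo>" inside it.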
java.util.regex.Pattern: X{n}? : X, exactly n times
Related: regex{n,}? == regex{n}? (absolutely NOT!)
Related: .*? and .* for regex

??
It's worth noting that you can use ?? to match an optional item reluctantly!
System.out.println("NOMZ".matches("NOMZ??"));
// "true"
System.out.println(
"NOM NOMZ NOMZZ".replaceAll("NOMZ??", "YUM")
); // "YUM YUMZ YUMZZ"
Note that Z?? is an optional Z, but it's matched reluctantly. "NOMZ" in its entirety still matches the pattern NOMZ??, but in replaceAll, NOMZ?? can match only "NOM" and doesn't have to take the optional Z even if it's there.
By contrast, NOMZ? will match the optional Z greedily: if it's there, it'll take it.
System.out.println(
"NOM NOMZ NOMZZ".replaceAll("NOMZ?", "YUM")
); // "YUM YUM YUMZ"
String.matches tests a pattern against the entire String.
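The difference between whole-string matching and searching can be sketched as follows. The class and the firstMatch helper are hypothetical names introduced for this example; they use Matcher.find(), which looks for the next matching substring and does not require the whole input to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // Hypothetical helper: the first substring that the regex finds, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Whole-string match: the reluctant Z?? is forced to take the Z.
        System.out.println("NOMZ".matches("NOMZ??"));    // true
        // Search: the reluctant Z?? stops as early as it can.
        System.out.println(firstMatch("NOMZ??", "NOMZ")); // NOM
        // Search: the greedy Z? takes the Z when it is present.
        System.out.println(firstMatch("NOMZ?", "NOMZ"));  // NOMZ
    }
}
```

replaceAll behaves like the search case, which is why the reluctant and greedy patterns produce different replacements above.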