A java.util.regex.Matcher tries to find matches in a region, which defaults to the entire input but can be explicitly set to a specific subrange.
From the documentation:
A matcher finds matches in a subset of its input called the region. By default, the region contains all of the matcher's input. The region can be modified via the region(int start, int end) method and queried via the regionStart and regionEnd methods. The way that the region boundaries interact with some pattern constructs can be changed. See useAnchoringBounds and useTransparentBounds for more details.
Remember that, as with many methods in the Java library classes, the start index is inclusive but the end index is exclusive.
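As a small sketch of the inclusive/exclusive convention, the following sets a region and queries it back. The class name RegionQueryDemo and the helper firstMatchInRegion are made up for illustration; only region, regionStart, regionEnd, find and group come from Matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionQueryDemo {
    // Hypothetical helper: returns the first match of regex inside
    // [start, end) of text, or null if there is none.
    static String firstMatchInRegion(String regex, String text, int start, int end) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.region(start, end);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "abcdef";
        Matcher m = Pattern.compile("[a-z]+").matcher(text);

        // Before region() is called, the region spans the whole input.
        System.out.println(m.regionStart() + " " + m.regionEnd()); // 0 6

        // Index 2 is included, index 5 is not: the region is "cde".
        m.region(2, 5);
        System.out.println(m.regionStart() + " " + m.regionEnd()); // 2 5

        System.out.println(firstMatchInRegion("[a-z]+", text, 2, 5)); // cde
    }
}
```

Even though the pattern is greedy, the match stops at the region's end rather than running on to the end of the input.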
Snippet
Here's an example usage:
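import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "012 456 890 234";
Pattern ddd = Pattern.compile("\\d{3}");
// Restrict matching to indices 3 (inclusive) through 12 (exclusive).
Matcher m = ddd.matcher(text).region(3, 12);
while (m.find()) {
    System.out.printf("[%s] [%d,%d)%n",
        m.group(),
        m.start(),
        m.end()
    );
}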
The above prints (as seen on ideone.com):
[456] [4,7)
[890] [8,11)
On anchoring bounds and transparent bounds
As previously mentioned, when you specify a region, you can change the behavior of some pattern constructs depending on what you need.
An anchoring bound makes the boundary of the region match various boundary matchers (^, $, etc.).
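To illustrate anchoring bounds, here is a minimal sketch that runs the same anchored pattern over the same region with anchoring switched on and off. The class AnchoringBoundsDemo and the helper findAnchored are illustrative names, not part of the library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoringBoundsDemo {
    // Hypothetical helper: does ^\d{3}$ match somewhere in [start, end)?
    static boolean findAnchored(String text, int start, int end, boolean anchoring) {
        Matcher m = Pattern.compile("^\\d{3}$").matcher(text);
        m.region(start, end);
        m.useAnchoringBounds(anchoring);
        return m.find();
    }

    public static void main(String[] args) {
        String text = "012 456 890";

        // The region [4,7) covers exactly "456".
        // With anchoring bounds, ^ and $ match at the region's edges.
        System.out.println(findAnchored(text, 4, 7, true));  // true

        // Without them, ^ and $ only match at the real start and end
        // of the input, which lie outside this region.
        System.out.println(findAnchored(text, 4, 7, false)); // false
    }
}
```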
An opaque bound essentially cuts off the rest of the input from lookaheads, lookbehinds, and certain boundary matching constructs. On the other hand, in transparent mode, they are allowed to see characters outside of the region as necessary.
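The difference between opaque and transparent bounds can be sketched with a lookbehind and a word boundary. The class TransparentBoundsDemo and the helper findInRegion are assumed names for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TransparentBoundsDemo {
    // Hypothetical helper: searches [start, end) of text for regex,
    // using transparent bounds if requested and opaque bounds otherwise.
    static boolean findInRegion(String regex, String text,
                                int start, int end, boolean transparent) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.region(start, end).useTransparentBounds(transparent);
        return m.find();
    }

    public static void main(String[] args) {
        String text = "foobar"; // the region [3,6) covers "bar"

        // Opaque: the lookbehind cannot see "foo" before the region.
        System.out.println(findInRegion("(?<=foo)bar", text, 3, 6, false)); // false
        // Transparent: the lookbehind may look outside the region.
        System.out.println(findInRegion("(?<=foo)bar", text, 3, 6, true));  // true

        // Opaque: the region start looks like a word boundary.
        System.out.println(findInRegion("\\bbar", text, 3, 6, false)); // true
        // Transparent: the "o" before "bar" is visible, so no boundary.
        System.out.println(findInRegion("\\bbar", text, 3, 6, true));  // false
    }
}
```

Note that in both modes the match itself must still lie inside the region; transparency only affects what lookarounds and boundary constructs are allowed to inspect.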
By default, a Matcher uses both anchoring and opaque bounds. This suits most subregion matching scenarios, but you can set your own combination as needed.
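The defaults can be checked with hasAnchoringBounds and hasTransparentBounds, and a different combination selected by chaining the two setters. A short sketch follows; the class DefaultBoundsDemo and its helper methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DefaultBoundsDemo {
    // Hypothetical helper: reports {anchoring, transparent} for a fresh matcher.
    static boolean[] defaultBounds() {
        Matcher m = Pattern.compile("x").matcher("x");
        return new boolean[] { m.hasAnchoringBounds(), m.hasTransparentBounds() };
    }

    // Hypothetical helper: same report after choosing the opposite combination.
    static boolean[] customBounds() {
        Matcher m = Pattern.compile("x").matcher("x");
        // Both setters return the matcher, so the calls can be chained.
        m.useAnchoringBounds(false).useTransparentBounds(true);
        return new boolean[] { m.hasAnchoringBounds(), m.hasTransparentBounds() };
    }

    public static void main(String[] args) {
        boolean[] d = defaultBounds();
        System.out.println(d[0] + " " + d[1]); // true false

        boolean[] c = customBounds();
        System.out.println(c[0] + " " + c[1]); // false true
    }
}
```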