You seem to expect `(at \w+)+` to match both `at Boston` and `at Downtown` in the first string. That doesn't work because you don't allow for the space before the second `at`. You would need to change it to `( at \w+)+`, or better, change that to a non-capturing group and use a capturing group for the part that really interests you:
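```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lazy prefix, then one or more " at <word>" runs; only the word is captured
Pattern p = Pattern.compile(".*?(?: at (\\w+))+.*");
String s1 = "I am at Boston at Downtown";
Matcher m = p.matcher(s1);
if (m.matches()) {           // matches() must consume the whole string
    System.out.println(m.group(1));
}
```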
But now it only prints `Downtown`. That's because you're trying to use one capturing group to capture two substrings. On the first iteration, `(?: at (\w+))+` captures `Boston`; on the second iteration, it discards `Boston` and captures `Downtown` instead.
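A minimal standalone sketch of that overwrite behaviour, in case it helps to see it in isolation. The class name `LastIterationDemo` and the helper `lastCapture` are hypothetical, just for illustration; the second pattern shows the same effect with single digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastIterationDemo {
    // Hypothetical helper: returns what group 1 holds after a full match, or null if there is no match.
    static String lastCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The group repeats twice, but only the final iteration's text survives.
        System.out.println(lastCapture("(?: at (\\w+))+", " at Boston at Downtown")); // Downtown
        // Same effect with digits: group 1 ends up holding only the last one.
        System.out.println(lastCapture("(\\d)+", "2024")); // 4
    }
}
```

The number of groups is fixed by the pattern, so there is simply no slot for the earlier `Boston`.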
Some regex flavors let you retrieve intermediate captures (`Boston` in this example), but Java is not one of them. Your best option is probably to use `find()` instead of `matches()`, as @arclight suggested. That makes the regex simpler, too:
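```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Whole word "at", one or more whitespace chars, then capture the next word
Pattern p = Pattern.compile("\\bat\\s+(\\w+)");
String s1 = "I am at Boston at Downtown";
Matcher m = p.matcher(s1);
while (m.find()) {           // each find() resumes after the previous match
    System.out.println(m.group(1));   // prints Boston, then Downtown
}
```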
You don't have to match the space before `at` any more, but you probably want to use `\b` (word boundary) to avoid partial-word matches (e.g., the `at` inside `cat` in `My cat is at Boston at Downtown`). And it's usually a good idea to use `\s+` instead of a literal space, in case there are multiple spaces or the space is really a TAB.
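To see both points in action, here is a small comparison sketch. The class `WordBoundaryDemo` and helper `places` are hypothetical names; the helper collects the matches with `Matcher.results()`, which assumes Java 9 or later (on older versions, use the `while (m.find())` loop from the previous snippet).

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordBoundaryDemo {
    // Hypothetical helper: group 1 of every find() hit, in order (Matcher.results() is Java 9+).
    static List<String> places(String regex, String input) {
        return Pattern.compile(regex).matcher(input)
                .results()
                .map(r -> r.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "My cat is at Boston at Downtown";
        // Without \b, the "at" inside "cat" also matches, so "is" gets captured.
        System.out.println(places("at\\s+(\\w+)", s));    // [is, Boston, Downtown]
        // With \b, only the standalone word "at" matches.
        System.out.println(places("\\bat\\s+(\\w+)", s)); // [Boston, Downtown]

        String t = "I am at\tBoston at   Downtown";
        // \s+ accepts the TAB and the run of spaces.
        System.out.println(places("\\bat\\s+(\\w+)", t)); // [Boston, Downtown]
        // A literal single space matches neither.
        System.out.println(places("\\bat (\\w+)", t));    // []
    }
}
```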