I'm trying to solve wordEnds
from codingbat.com using regex.
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i" wordEnds("XY123XY", "XY") → "13" wordEnds("XY1XY", "XY") → "11" wordEnds("XYXY", "XY") → "XY"
This is the simplest as I can make it with my current knowledge of regex:
public String wordEnds(String str, String word) {
return str.replaceAll(
".*?(?=word)(?<=(.|^))word(?=(.|$))|.+"
.replace("word", java.util.regex.Pattern.quote(word)),
"$1$2"
);
}
replace
is used to place in the actual word
string into the pattern for readability. Pattern.quote
isn't necessary to pass their tests, but I think it's required for a proper regex-based solution.
The regex has two major parts:
- If after matching as few characters as possible "
.*?
",word
can still be found "(?=word)
", then lookbehind to capture any character immediately preceding it "(?<=(.|^))
", match "word
", and lookforward to capture any character following it "(?=(.|$))
".- The initial "if" test ensures that the atomic lookbehind captures only if there's a
word
- Using lookahead to capture the following character doesn't consume it, so it can be used as part of further matching
- The initial "if" test ensures that the atomic lookbehind captures only if there's a
- Otherwise match what's left "
|.+
"- Groups 1 and 2 would capture empty strings
I think this works in all cases, but it's obviously quite complex. I'm just wondering if others can suggest a simpler regex to do this.
Note: I'm not looking for a solution using indexOf
and a loop. I want a regex-based replaceAll
solution. I also need a working regex that passes all codingbat tests.
I managed to reduce the occurrence of word
within the pattern to just one.
".+?(?<=(^|.)word)(?=(.?))|.+"
I'm still looking if it's possible to simplify this further, but I also have another question:
- With this latest pattern, I simplified
.|$
to just.?
successfully, but if I similarly tried to simplify^|.
to.?
it doesn't work. Why is that?