ansaurus

Question

Java - Regex problem

Answer 1

+2 A:

If you do want to use a regex (even though it's doable without one):

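import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = /* ... your string here ... */;
String parenReplacement = "!!!"; // whatever the replacement is
// group 1: everything before the last ')'; group 2: everything after it
Pattern p = Pattern.compile("^(.*)\\)([^\\)]*)$");
Matcher m = p.matcher(s);
if (m.find())
{
   s = m.group(1) + parenReplacement + m.group(2);
}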
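A self-contained sketch of the same idea, wrapped in a method so it can be tried directly. The class and method names are hypothetical, and it assumes single-line input, because `.` does not match line terminators by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceLastParen {
    // Group 1: everything before the last ')'.
    // Group 2: the rest of the string, which contains no ')'.
    private static final Pattern LAST_PAREN = Pattern.compile("^(.*)\\)([^\\)]*)$");

    // Hypothetical helper; returns s unchanged if it contains no ')'.
    static String replaceLastParen(String s, String replacement) {
        Matcher m = LAST_PAREN.matcher(s);
        if (m.find()) {
            // Plain concatenation, so '$' or '\' in the replacement are not special.
            return m.group(1) + replacement + m.group(2);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(replaceLastParen("a)b)c", "!!!"));     // a)b!!!c
        System.out.println(replaceLastParen("no parens", "!!!")); // no parens
    }
}
```

Because the groups are concatenated by hand rather than passed through `Matcher.replaceFirst`, the replacement string is used literally.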

Jason S 2010-04-15 15:45:01

Why do you instantiate a `StringBuffer` and not use it?

Ben S 2010-04-15 15:58:44

See my answer for a simpler solution using `replaceFirst`.

polygenelubricants 2010-04-15 16:46:50

@Ben S: thanks, I had been using it and forgot to delete it.

Jason S 2010-04-15 18:09:34

Answer 2

+1 A:

Why would you use a regex for that? Just use String.charAt(...) and substring(...)!
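A minimal sketch of the `charAt(...)`/`substring(...)` approach this answer suggests, assuming the goal is to delete the last `)`. The class and method names are hypothetical.

```java
class CharAtApproach {
    // Scans backwards with charAt(...) and cuts the ')' out with substring(...).
    // Hypothetical helper; returns s unchanged if there is no ')'.
    static String removeLastParen(String s) {
        for (int i = s.length() - 1; i >= 0; i--) {
            if (s.charAt(i) == ')') {
                return s.substring(0, i) + s.substring(i + 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(removeLastParen("Your String with) multiple).")); // Your String with) multiple.
    }
}
```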

Guillaume 2010-04-15 15:46:38

Answer 3

A:

You don't really need a regex for this. The String class has a lastIndexOf() method that you can use to find the index of the last ) in the String.
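A short sketch of the `lastIndexOf()` route, assuming the task is to delete the last `)`. The helper name is hypothetical; the `-1` check matters because `lastIndexOf` returns `-1` when the character is absent.

```java
class LastIndexOfApproach {
    // Hypothetical helper: deletes the last ')' if there is one.
    static String removeLastParen(String s) {
        int idx = s.lastIndexOf(')'); // -1 if s contains no ')'
        if (idx < 0) {
            return s;
        }
        return s.substring(0, idx) + s.substring(idx + 1);
    }

    public static void main(String[] args) {
        System.out.println(removeLastParen("Your String with) multiple).")); // Your String with) multiple.
        System.out.println(removeLastParen("no parens"));                    // no parens
    }
}
```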

Syntactic 2010-04-15 15:47:57

Answer 4

+7 A:

Please don't use a regex for this simple task.

// If the last ) might not be the last character of the String
String s = "Your String with) multiple).";
if (s.contains(")")) { // otherwise lastIndexOf returns -1 and deleteCharAt throws
    StringBuilder sb = new StringBuilder(s);
    sb.deleteCharAt(s.lastIndexOf(')'));
    s = sb.toString(); // s = "Your String with) multiple."
}

// If the last ) will always be the last character of the String
s = "Your String with))";
if (s.endsWith(")")) 
    s = s.substring(0, s.length() - 1);
// s = "Your String with)"

Ben S 2010-04-15 15:49:44

What do you have against regex for this task? See my answer for a readable and understandable regex (if you have basic knowledge of it, at least).

polygenelubricants 2010-04-15 16:43:51

@polygenelubricants: I have no regex experience whatsoever, but isn't using a regex costly, and shouldn't it be avoided? I was under the impression that if you can go a non-regex route, it's suggested to do so.

Anthony Forloney 2010-04-15 18:12:11

@polygenelubricants: I'm just against using the wrong tool for the job. Regular expressions are overkill for this problem and are much less legible than simple string manipulation. Even to someone that knows regular expressions well, `str.replaceFirst("(.*)\\)", "$1");` is much less intuitive than `sb.deleteCharAt(s.lastIndexOf(')'));` Calling a `replaceFirst` method that's been hacked to actually replace last is just confusing.

Ben S 2010-04-15 18:36:00

@Anthony: It's true that a regex solution can never be quite as efficient as a well-written non-regex solution, but that point tends to get blown way out of proportion. Performance-wise, regexes are more than adequate for most applications.

Alan Moore 2010-04-15 19:25:31

@Ben S: "Calling a replaceFirst method that's been hacked to actually replace last is just confusing." -- ... and you know what? My regex actually works if you call `replaceAll` instead! With no modification to the pattern whatsoever! I deliberately used `replaceFirst` because of my twisted sense of humor, but really either regex-based replace method works! I'd like to think that this is a strong argument for the claim that the pattern really is _that_ simple, but of course, yes, you do have to have a basic understanding of regex, and most people simply can't be bothered to learn.

polygenelubricants 2010-04-15 22:45:13

To *this* person who knows regexes well, @poly's `replaceFirst` solution seems perfectly straightforward, and your `deleteCharAt`/`lastIndexOf` solution looks like a hideous waste of space. Ultimately, it comes down to personal preference, but the OP *did* ask for a regex solution.

Alan Moore 2010-04-16 03:56:27

Answer 5

+2 A:

If only a ) at the end of the string is to be removed, then this works:

str = str.replaceFirst("\\)$", ""); // Strings are immutable: keep the returned result

This matches exactly what it says: a literal ) (escaped because it's also a regex metacharacter, written \\) inside a Java string literal) followed by $, the end-of-string boundary anchor. The match is replaced with the empty string, effectively deleting any terminating ).

If there is no match, there is no ) at the end of the string (even though there may be occurrences elsewhere), so no replacement is made and the string is unchanged.
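A runnable sketch of both cases described above, a trailing `)` that gets removed and a string whose `)` is not at the end and stays untouched. The class and method names are hypothetical.

```java
class TrailingParen {
    // Hypothetical helper: removes a ')' only when it is the final character.
    static String removeTrailingParen(String str) {
        // replaceFirst returns a new String; the argument itself is never modified.
        return str.replaceFirst("\\)$", "");
    }

    public static void main(String[] args) {
        System.out.println(removeTrailingParen("Your String with))")); // Your String with)
        System.out.println(removeTrailingParen("a) b"));               // a) b (no match, unchanged)
    }
}
```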

If you generally want to remove the last occurrence of ) which may not be at the end of the string, you can use greedy .* matching:

str = str.replaceFirst("(.*)\\)", "$1");

Here the greedy .* captures into group 1 (\1, referenced as $1 in the replacement). If the whole pattern matches, group 1 is as long as it possibly can be, which means the literal ) following it must be the last occurrence (if there were another ) to its right, group 1 could have captured a longer string instead, which is a contradiction).
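A sketch that exercises the greedy pattern and shows one caveat worth knowing: by default `.` does not match line terminators, so on multi-line input the greedy group stops at the first newline and the pattern removes the last `)` of the first line that contains one. Adding the embedded `(?s)` (DOTALL) flag lets `.` cross newlines. The class and method names are hypothetical.

```java
class GreedyLastParen {
    // Greedy (.*) makes the following ')' the last one, but only within
    // a single line, since '.' does not match line terminators by default.
    static String removeLastParen(String str) {
        return str.replaceFirst("(.*)\\)", "$1");
    }

    // (?s) enables DOTALL so '.' also matches line terminators, and the
    // pattern finds the last ')' across the whole string.
    static String removeLastParenDotAll(String str) {
        return str.replaceFirst("(?s)(.*)\\)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(removeLastParen("a)b)c"));                         // a)bc
        System.out.println(removeLastParen("a)\nb)").equals("a\nb)"));        // true
        System.out.println(removeLastParenDotAll("a)\nb)").equals("a)\nb"));  // true
    }
}
```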

Performance

Matching the first regex should be optimizable to an O(1) operation, thanks to the end-of-string $ anchor. The actual replacement will be O(N), because if there is a match, the new string has to be copied into a new buffer. If there is no match, it should be optimizable to return the original string, making the whole operation O(1). This is as optimal as it gets.

The second regex needs O(N) to match because of the repetition. This is no worse than a linear search for the last ) using lastIndexOf, which is also O(N).

If you're doing this a lot, you should know the equivalent form of replaceFirst that uses a compiled Pattern. From the API:

An invocation of this method of the form
str.replaceFirst(regex, repl)
yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceFirst(repl)
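Following that equivalence, a sketch that compiles the pattern once and reuses it. The class, constant, and method names are hypothetical; the `Pattern` Javadoc states that `Pattern` instances are immutable and safe for use by multiple threads, while a fresh `Matcher` is created per call.

```java
import java.util.regex.Pattern;

class CompiledLastParen {
    // Compiled once and reused across calls.
    private static final Pattern LAST_PAREN = Pattern.compile("(.*)\\)");

    // Same result as str.replaceFirst("(.*)\\)", "$1"), without recompiling the regex.
    static String removeLastParen(String str) {
        return LAST_PAREN.matcher(str).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(removeLastParen("Your String with) multiple).")); // Your String with) multiple.
    }
}
```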

Readability

"Calling a replaceFirst method that's been hacked to actually replace last is just confusing."

It should be pointed out here that you can in fact use replaceAll with these exact patterns and the solution still works! Really, you just need a regex replace, and whether it's replaceAll or replaceFirst doesn't matter; the pattern is really that simple!
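A small check of that claim, as a sketch: it compares `replaceFirst` and `replaceAll` for both patterns on a few single-line inputs. The helper name is hypothetical. The equivalence assumes no line terminators: on `"a)\nb)"`, `replaceAll("(.*)\\)", "$1")` removes the last `)` of each line, so the two methods differ there.

```java
class FirstVsAll {
    // Hypothetical helper: true if replaceFirst and replaceAll agree on this input.
    static boolean sameResult(String s, String regex, String replacement) {
        return s.replaceFirst(regex, replacement).equals(s.replaceAll(regex, replacement));
    }

    public static void main(String[] args) {
        String[] inputs = { "Your String with) multiple).", "Your String with))", "no parens" };
        for (String s : inputs) {
            // Prints "true true" for each single-line input.
            System.out.println(sameResult(s, "(.*)\\)", "$1") + " " + sameResult(s, "\\)$", ""));
        }
    }
}
```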

The needle$ idiom to match at the end of the string and the greedy (.*)needle idiom to match the last occurrence are basic idioms that are very readable and understandable to those with a basic understanding of regex. Neither really qualifies as a "hack".

Using a method called replaceFirst to replace the last occurrence of something may seem misleading at first, but this is shortsighted: it is the first match of the pattern that is replaced; what that pattern matches can be anything, be it the sixth "Sense", or the last "Mohican"!
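To make the "the pattern can match anything" point concrete, a sketch that deletes the third occurrence of a character with a single `replaceFirst`. The letter `X` and the method name are arbitrary choices for illustration; group 1 captures everything up to, but not including, the third `X`.

```java
class NthOccurrence {
    // Hypothetical helper: deletes the third "X". [^X] also matches line terminators.
    static String removeThirdX(String str) {
        return str.replaceFirst("((?:[^X]*X){2}[^X]*)X", "$1");
    }

    public static void main(String[] args) {
        System.out.println(removeThirdX("aXbXcXdX")); // aXbXcdX
        System.out.println(removeThirdX("aXbX"));     // aXbX (only two X's, unchanged)
    }
}
```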

As an analogy, let's take another simple string manipulation example: deleting all "spam" substrings from a string. I would argue that the most readable solution is to use replace:

str = str.replace("spam", "");

"But wait! The name replace is misleading! You're not replacing it with something else! You should call a method called delete or something!"

That's silly-talk, of course! You are indeed replacing it with something else -- the empty string! Its effect is deletion, but the operation is still string replace-ment!

Just like the replaceFirst in my solution: you may want to replace the last occurrence of something, but it's still the first match of the overall pattern!

Now, it's true that a regex pattern out of nowhere can be confusing, but it can be made clear from context, e.g.:

public static String removeLastCloseParenthesis(String str) {
   return str.replaceFirst("(.*)\\)", "$1");
}

And you can always just name the thing. And you can always add comments as/if necessary. These are just general code readability techniques, and therefore they apply to regex just as they do to everything else.

polygenelubricants 2010-04-15 16:21:05

`replaceAll` isn't much better than `replaceFirst`, in both instances it's misleading. No matter how simple the regex is (and I agree, it doesn't get much simpler), it's just not the correct tool for this problem.

Ben S 2010-04-15 23:37:49

`replaceAll` and `replaceFirst` are the _only_ options available from `String` class for regex replacement. I don't think they're misleading: it replaces all/first match of a _pattern_. What that pattern matches can be anything (e.g. "third `X`").

polygenelubricants 2010-04-16 00:14:36
