views: 195

answers: 5

I want to remove the ) character from the end of a string using a regex.

E.g., if the string is UK(Great Britain), then I want to replace the last ) symbol.

Note:

1. The regex should remove only the last ) symbol, no matter how many ) symbols are present in the string.

+2  A: 

If you do want to use a regex (even though it's doable without one):

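import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = /* ... your string here ... */;
String parenReplacement = "!!!"; // whatever the replacement is
// group(1): everything before the last ')' (greedy); group(2): the tail, which has no ')'
Pattern p = Pattern.compile("^(.*)\\)([^\\)]*)$");
Matcher m = p.matcher(s);
if (m.find())
{
   s = m.group(1) + parenReplacement + m.group(2);
}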
Jason S
Why do you instantiate a `StringBuffer` and not use it?
Ben S
See my answer for a simpler solution using `replaceFirst`.
polygenelubricants
@Ben S: thanks, I had used it and forgot to delete it.
Jason S
+1  A: 

Why would you use a regex for that? Just use String.charAt(...) and substring(...)!
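A minimal sketch of the charAt(...)/substring(...) route, assuming the goal is to drop the last ) wherever it appears. The class and method names are hypothetical:

```java
class CharAtSubstring {
    // Hypothetical helper: removes the last ')' in s; returns s unchanged if there is none.
    static String removeLastParen(String s) {
        // Scan from the end, so the first ')' we meet is the last one in the string.
        for (int i = s.length() - 1; i >= 0; i--) {
            if (s.charAt(i) == ')') {
                // Keep everything before and after position i.
                return s.substring(0, i) + s.substring(i + 1);
            }
        }
        return s; // no ')' found
    }

    public static void main(String[] args) {
        System.out.println(removeLastParen("UK(Great Britain)")); // prints UK(Great Britain
        System.out.println(removeLastParen("a)b)c"));             // prints a)bc
    }
}
```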

Guillaume
A: 

You don't really need a regex for this. The String class has a lastIndexOf() method that you can use to find the index of the last ) in the String.
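For comparison, a sketch of the lastIndexOf() approach (the names are hypothetical). The explicit -1 check matters, since lastIndexOf() returns -1 when the string contains no ):

```java
class LastIndexOfSubstring {
    // Hypothetical helper: deletes the last ')' found via String.lastIndexOf.
    static String removeLastParen(String s) {
        int idx = s.lastIndexOf(')'); // -1 when s contains no ')'
        if (idx < 0) {
            return s;
        }
        // Join the parts on either side of the ')'.
        return s.substring(0, idx) + s.substring(idx + 1);
    }

    public static void main(String[] args) {
        System.out.println(removeLastParen("Your String with) multiple).")); // prints Your String with) multiple.
    }
}
```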

Syntactic
+7  A: 

Please don't use a regex for this simple task.

// If the last ) might not be the last character of the String
String s = "Your String with) multiple).";
int idx = s.lastIndexOf(')'); // -1 if there is no ')' at all
if (idx >= 0) {
    StringBuilder sb = new StringBuilder(s);
    sb.deleteCharAt(idx);
    s = sb.toString(); // s = "Your String with) multiple."
}

// If the last ) will always be the last character of the String
s = "Your String with))";
if (s.endsWith(")"))
    s = s.substring(0, s.length() - 1);
// s = "Your String with)"
Ben S
What do you have against regex for this task? See my answer for a readable and understandable regex (if you have basic knowledge of it, at least).
polygenelubricants
@polygenelubricants: I have no regex experience whatsoever, but isn't using a regex costly, and shouldn't it be avoided? I was under the impression that if you can go a non-regex route, it's suggested to do so.
Anthony Forloney
@polygenelubricants: I'm just against using the wrong tool for the job. Regular expressions are overkill for this problem and are much less legible than simple string manipulation. Even to someone that knows regular expressions well, `str.replaceFirst("(.*)\\)", "$1");` is much less intuitive than `sb.deleteCharAt(s.lastIndexOf(')'));` Calling a `replaceFirst` method that's been hacked to actually replace last is just confusing.
Ben S
@Anthony: It's true that a regex solution can never be quite as efficient as a well-written non-regex solution, but that point tends to get blown way out of proportion. Performance-wise, regexes are more than adequate for most applications.
Alan Moore
@Ben S: "Calling a replaceFirst method that's been hacked to actually replace last is just confusing." -- ... and you know what? My regex actually works if you call `replaceAll` instead! With no modification to the pattern whatsoever! I deliberately used `replaceFirst` because of my twisted sense of humor, but really, either regex-based replace method works! I'd like to think that this is a strong argument for the claim that the pattern really is _that_ simple, but of course, yes, you do need a basic understanding of regex, and most people simply can't be bothered to learn.
polygenelubricants
To *this* person who knows regexes well, @poly's `replaceFirst` solution seems perfectly straightforward, and your `deleteCharAt`/`lastIndexOf` solution looks like a hideous waste of space. Ultimately, it comes down to personal preference, but the OP *did* ask for a regex solution.
Alan Moore
+2  A: 

If only a ) at the very end of the string is to be removed, then this works:

str = str.replaceFirst("\\)$", "");

This matches exactly what it says: a literal ) (escaped because it's also a regex metacharacter) followed by $, the end-of-string anchor. The match is replaced with the empty string, effectively deleting a single terminating ).

If there is no match, there is no ) at the end of the string (although there may be occurrences elsewhere), so no replacement is made and the string is unchanged.
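A quick sketch to check that behavior, with the call wrapped in a hypothetical helper:

```java
class TrailingParen {
    // Removes one ')' only when it is the last character of the string.
    static String removeTrailingParen(String str) {
        // "\\)" is a literal ')', and "$" anchors the match at the end of the input.
        return str.replaceFirst("\\)$", "");
    }

    public static void main(String[] args) {
        System.out.println(removeTrailingParen("UK(Great Britain)")); // UK(Great Britain
        System.out.println(removeTrailingParen("a)b"));               // a)b (unchanged)
        System.out.println(removeTrailingParen("x))"));               // x)
    }
}
```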


If you want to remove the last occurrence of ), which may not be at the end of the string, you can use greedy .* matching:

str = str.replaceFirst("(.*)\\)", "$1");

Here the greedy .* captures into group 1 (\1 in the pattern, $1 in the replacement). If the whole pattern matches, group 1 will have been as long as it possibly can be, which means that the literal ) following it must be the last occurrence (if there were another occurrence to its right, group 1 could have captured a longer string instead, which is a contradiction).
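A sketch showing the greedy pattern at work (the names are hypothetical). It also shows one caveat not covered above: by default . does not match line terminators, so on a multi-line string the pattern removes the last ) on the first line that contains one. Adding the (?s) (DOTALL) flag, as in the second method, lets .* span lines:

```java
class LastParenRegex {
    // Greedy (.*) runs as far right as it can, so the ')' after it is the last one on that line.
    static String removeLastParen(String str) {
        return str.replaceFirst("(.*)\\)", "$1");
    }

    // Same idea with (?s) (DOTALL): '.' also matches line terminators,
    // so the last ')' of the whole string is removed.
    static String removeLastParenMultiline(String str) {
        return str.replaceFirst("(?s)(.*)\\)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(removeLastParen("a)b)c"));                      // a)bc
        System.out.println(removeLastParen("a)\nb)c").replace("\n", "|"));  // a|b)c
        System.out.println(removeLastParenMultiline("a)\nb)c").replace("\n", "|")); // a)|bc
    }
}
```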


Performance

Matching the first regex should be optimizable to an O(1) operation, thanks to the end-of-string $ anchor. The actual replacement will be O(N), because the new string has to be copied to a new buffer if there is a match. If there is no match, it should be optimizable to return the original string, which would make the whole call O(1). This is as optimal as it gets.

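The second regex needs O(N) to match because of the repetition, at least when the string contains a ). This is no worse than a linear search for the last ) using lastIndexOf, which is also O(N). (If the string contains no ) at all, a backtracking engine such as java.util.regex may retry the match from every starting position, which is quadratic in the worst case.)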

If you're doing this a lot, then you should know the equivalent form of replaceFirst using a compiled Pattern, so that you can compile the pattern once and reuse it. From the API:

An invocation of this method of the form

str.replaceFirst(regex, repl)

yields exactly the same result as the expression

Pattern.compile(regex).matcher(str).replaceFirst(repl)
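A minimal sketch of that equivalence put to use, assuming the same pattern is applied many times: compile it once into a static final field (the names are hypothetical) and reuse it. Pattern instances are immutable and safe to share between threads; Matcher instances are not, so one is created per call.

```java
import java.util.regex.Pattern;

class PrecompiledLastParen {
    // Compiled once, reused on every call.
    private static final Pattern LAST_CLOSE_PAREN = Pattern.compile("(.*)\\)");

    static String removeLastParen(String str) {
        // Same result as str.replaceFirst("(.*)\\)", "$1"), without recompiling the regex.
        return LAST_CLOSE_PAREN.matcher(str).replaceFirst("$1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"UK(Great Britain)", "f(g(x))", "plain"}) {
            System.out.println(removeLastParen(s));
        }
    }
}
```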

Readability

"Calling a replaceFirst method that's been hacked to actually replace last is just confusing."

It should be pointed out here that you can in fact use replaceAll with these exact patterns and the solution still works! Really, you just need a regex replace; whether it's replaceAll or replaceFirst doesn't matter. The pattern really is that simple!
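A small sketch (hypothetical names) that checks this claim for a few single-line inputs. On strings containing line terminators the greedy pattern can behave differently under replaceAll, because . stops at line ends and the match can repeat on a later line:

```java
class FirstVersusAll {
    // True when replaceFirst and replaceAll agree for both patterns from the answer.
    static boolean sameResult(String s) {
        boolean endAnchored = s.replaceFirst("\\)$", "").equals(s.replaceAll("\\)$", ""));
        boolean greedy = s.replaceFirst("(.*)\\)", "$1").equals(s.replaceAll("(.*)\\)", "$1"));
        return endAnchored && greedy;
    }

    public static void main(String[] args) {
        String[] samples = {"UK(Great Britain)", "a)b)c", "x))", "no parens"};
        for (String s : samples) {
            System.out.println(s + " -> " + sameResult(s)); // true for each sample
        }
    }
}
```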

The needle$ to match at the end of the string and the greedy (.*)needle to match the last occurrence are basic idioms that are very readable and understandable to anyone with a basic understanding of regex. Neither really qualifies as a "hack".

Using a method called replaceFirst to replace the last occurrence of something may seem misleading at first, but this is shortsighted: it is the first match of the pattern that is replaced; what that pattern matches can be anything, be it the sixth "Sense", or the last "Mohican"!

As an analogy, let's take another simple string manipulation example: delete every "spam" substring from a string. I would argue that the most readable solution is to use replace:

str = str.replace("spam", "");

"But wait! The name replace is misleading! You're not replacing it with something else! You should call a method called delete or something!"

That's silly-talk, of course! You are indeed replacing it with something else -- the empty string! Its effect is deletion, but the operation is still string replace-ment!

Just like the replaceFirst in my solution: you may want to replace the last occurrence of something, but it's still the first match of the overall pattern!

Now, it's true that a regex pattern out of nowhere will be confusing, but it can be made clear from context, e.g.:

public static String removeLastCloseParenthesis(String str) {
   return str.replaceFirst("(.*)\\)", "$1");
}

And you can always just name the thing. And you can always put comments as/if necessary. These are just general code readability techniques, and therefore applicable to regex just as they do to everything else.
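One way to put comments inside the regex itself, sketched here as an assumption rather than part of the answer: the Pattern.COMMENTS flag makes the compiler ignore whitespace and treat # up to the end of a line as a comment. The field name is hypothetical; the method name is the one used above.

```java
import java.util.regex.Pattern;

class CommentedRegex {
    // With Pattern.COMMENTS, whitespace is ignored and '#' to end of line is a comment,
    // so this compiles to the same regex as "(.*)\\)".
    private static final Pattern LAST_CLOSE_PAREN = Pattern.compile(
            "(.*)   # group 1: greedy, so it extends as far right as possible\n"
          + "\\)    # a literal close paren, which must therefore be the last one\n",
            Pattern.COMMENTS);

    /** Removes the last ')' in str; returns str unchanged if it contains none. */
    static String removeLastCloseParenthesis(String str) {
        return LAST_CLOSE_PAREN.matcher(str).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(removeLastCloseParenthesis("UK(Great Britain)")); // UK(Great Britain
    }
}
```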

polygenelubricants
`replaceAll` isn't much better than `replaceFirst`, in both instances it's misleading. No matter how simple the regex is (and I agree, it doesn't get much simpler), it's just not the correct tool for this problem.
Ben S
`replaceAll` and `replaceFirst` are the _only_ options available from the `String` class for regex replacement. I don't think they're misleading: they replace all/the first match of a _pattern_. What that pattern matches can be anything (e.g. "third `X`").
polygenelubricants