I have a variable v whose value may appear several times in a row in a string. I want to collapse every run of consecutive *v*s into a single v. For example:

String s = "Hello, world!";
String v = "l";

The regex would turn "Hello, world!" into "Helo, world!"

So I want to do something like

s = s.replaceAll(vv+, v)

But obviously that won't work. Thoughts?

+2  A: 

You need to concatenate the two "v" Strings.

Try s = s.replaceAll(v + v + "+", v)

Noel M
Oh, duh. Thanks.
Earl Bellinger
This will only work for characters that aren't special characters in a regex context.
Jason S
+2  A: 
s = s.replaceAll (v + "+", v)
Roman
Why downvote? Be kind to explain.
Roman
Good question - to me it fulfills the requirement...
Andreas_D
I didn't downvote but this will only work for characters that aren't special characters in a regex context.
Jason S
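To illustrate the concern raised in the comment above, here is a minimal sketch comparing the unquoted pattern with one built via Pattern.quote when v is the regex metacharacter ".". The class and method names (UnquotedDemo, collapseUnquoted, collapseQuoted) are made up for this illustration and do not come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; all names here are illustrative only.
class UnquotedDemo {
    // Unquoted: v is interpreted as regex syntax.
    static String collapseUnquoted(String s, String v) {
        return s.replaceAll(v + "+", v);
    }

    // Quoted: v is matched literally and inserted literally.
    static String collapseQuoted(String s, String v) {
        return s.replaceAll(Pattern.quote(v) + "+", Matcher.quoteReplacement(v));
    }

    public static void main(String[] args) {
        String s = "wait... what";
        // "." matches any character, so ".+" swallows the whole string.
        System.out.println(collapseUnquoted(s, ".")); // "."
        // With quoting, only the run of literal dots is collapsed.
        System.out.println(collapseQuoted(s, "."));   // "wait. what"
    }
}
```

Note that the unquoted version does not throw here; it silently returns the wrong result, which can be harder to spot than the exception a character like "?" would cause.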
+4  A: 

Use x{2,} to match x at least twice.

To be able to replace characters with special meanings for regexps, you'd use Pattern.quote:

String part = Pattern.quote(v);
s = s.replaceAll(part + "{2,}", v);

To replace things longer than one character, use non-capturing groups:

String part = "(?:" + Pattern.quote(v) + ")";
s = s.replaceAll(part + "{2,}", v);
gustafc
+1; I incorporated the need for grouping into my answer as well.
polygenelubricants
+8  A: 

Let's develop the solution iteratively; at each step we point out the problems and fix them, until we arrive at the final answer.

We can start with something like this:

String s = "What???? Impo$$ible!!!";
String v = "!";

s = s.replaceAll(v + "{2,}", v);
System.out.println(s);
// "What???? Impo$$ible!"

{2,} is the regex syntax for finite repetition, meaning "at least 2 of" in this case.

It just so happens that the above works because ! is not a regex metacharacter. Let's see what happens if we try the following:

String v = "?";

s = s.replaceAll(v + "{2,}", v);
// Exception in thread "main" java.util.regex.PatternSyntaxException:       
// Dangling meta character '?'

One way to fix the problem is to use Pattern.quote so that v is taken literally:

s = s.replaceAll(Pattern.quote(v) + "{2,}", v);
System.out.println(s);
// "What? Impo$$ible!!!"

It turns out that this isn't the only thing we need to worry about: in replacement strings, \ and $ are also special metacharacters. That explains why we get the following problem:

String v = "$";
s = s.replaceAll(Pattern.quote(v) + "{2,}", v);
// Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
// String index out of range: 1

Since we want v to be taken literally as a replacement string, we use Matcher.quoteReplacement as follows:

s = s.replaceAll(Pattern.quote(v) + "{2,}", Matcher.quoteReplacement(v));
System.out.println(s);
// "What???? Impo$ible!!!"

Lastly, repetition has higher precedence than concatenation. This means the following:

System.out.println(  "hahaha".matches("ha{3}")    ); // false
System.out.println(  "haaa".matches("ha{3}")      ); // true
System.out.println(  "hahaha".matches("(ha){3}")  ); // true

So if v can contain multiple characters, you'd want to group it before applying the repetition. You can use a non-capturing group in this case, since you don't need to create a backreference.

String s = "well, well, well, look who's here...";
String v = "well, ";
s = s.replaceAll("(?:" + Pattern.quote(v) + "){2,}", Matcher.quoteReplacement(v));
System.out.println(s);
// "well, look who's here..."

Summary

  • To match an arbitrary literal string that may contain regex metacharacters, use Pattern.quote
  • To replace with an arbitrary literal string that may contain replacement metacharacters, use Matcher.quoteReplacement
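The two points in the summary, together with the grouping step, can be folded into one small helper. This is only a sketch: the class name CollapseRepeats, the method name collapse, and the early return for an empty v are assumptions added for illustration, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names and the empty-string guard are assumptions.
class CollapseRepeats {
    static String collapse(String s, String v) {
        if (v.isEmpty()) {
            return s; // assumption: an empty v means there is nothing to collapse
        }
        // Quote v so it is matched literally, and group it so that {2,}
        // applies to the whole of v rather than to its last character.
        String regex = "(?:" + Pattern.quote(v) + "){2,}";
        // Quote the replacement so that \ and $ in v are inserted literally.
        return s.replaceAll(regex, Matcher.quoteReplacement(v));
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello, world!", "l"));
        // "Helo, world!"
        System.out.println(collapse("What???? Impo$$ible!!!", "$"));
        // "What???? Impo$ible!!!"
        System.out.println(collapse("well, well, well, look who's here...", "well, "));
        // "well, look who's here..."
    }
}
```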


Bonus material

The following example combines reluctant repetition, a capturing group, and a backreference with case-insensitive matching:

    System.out.println(
        "omgomgOMGOMG???? Yes we can! YES WE CAN! GOAAALLLL!!!!"
            .replaceAll("(?i)(.+?)\\1+", "$1")
    );
    // "omg? Yes we can! GOAL!"

polygenelubricants
This is a way better solution, even down to the `"{2,}"` being better regex form than concatenating. Both aren't functionally necessary since just a `Pattern.quote(v) + "+"` would work (a single match being replaced with itself results in no change).
Mark Peters
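The claim in the comment above can be checked with a short sketch. It assumes the non-capturing group is kept around the quoted v so that multi-character values work too; the class and method names (PlusVersusTwoOrMore, withPlus, withTwoOrMore) are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class; names are illustrative only.
class PlusVersusTwoOrMore {
    // "+" also matches a single occurrence, which is then replaced by itself.
    static String withPlus(String s, String v) {
        return s.replaceAll("(?:" + Pattern.quote(v) + ")+", Matcher.quoteReplacement(v));
    }

    // "{2,}" only matches runs that actually need collapsing.
    static String withTwoOrMore(String s, String v) {
        return s.replaceAll("(?:" + Pattern.quote(v) + "){2,}", Matcher.quoteReplacement(v));
    }

    public static void main(String[] args) {
        String s = "What???? Impo$$ible!!!";
        System.out.println(withPlus(s, "?"));      // "What? Impo$$ible!!!"
        System.out.println(withTwoOrMore(s, "?")); // "What? Impo$$ible!!!"
    }
}
```

Both variants produce the same string; the "{2,}" form simply avoids replacing single occurrences with themselves.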
don't you need to add noncapturing parentheses e.g. "(?:"+Pattern.quote(v)+"){2,}" for multiple characters in the string? (as per my answer and gustafc's)
Jason S
+1 for `quoteReplacement`
gustafc
+3  A: 

With regexes in Java, make sure to use Pattern.quote and Matcher.quoteReplacement:

package com.example.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex2 {
    public static void main(String[] args)
    {
        String s = "Hello, world!";
        String v = "l";

        System.out.println(doit(s,v));

        s = "Test: ??r??r Solo ??r Frankenstein!";
        v = "??r";

        System.out.println(doit(s,v));

    }

    private static String doit(String s, String v) 
    {
        Pattern p = Pattern.compile("(?:"+Pattern.quote(v)+"){2,}");

        Matcher m = p.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find())
        {
            m.appendReplacement(sb, Matcher.quoteReplacement(v));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
Jason S
Can't wait until `Matcher` takes any `Appendable` instead of `StringBuffer`... It's an RFE somewhere in the bugdb...
polygenelubricants
agreed. (would much rather use StringBuilder)
Jason S
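For readers on a newer JDK: Java 9 and later provide appendReplacement and appendTail overloads that take a StringBuilder, so the doit method above can be written without StringBuffer. The following is a sketch under that assumption (it will not compile on older runtimes); the class name Regex2Builder is made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the answer's class; assumes Java 9 or later.
class Regex2Builder {
    static String doit(String s, String v) {
        // Quote and group v so that {2,} applies to the whole literal.
        Pattern p = Pattern.compile("(?:" + Pattern.quote(v) + "){2,}");
        Matcher m = p.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Replace each run of two or more copies of v with a single literal v.
            m.appendReplacement(sb, Matcher.quoteReplacement(v));
        }
        // Copy whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doit("Hello, world!", "l"));
        // "Helo, world!"
        System.out.println(doit("Test: ??r??r Solo ??r Frankenstein!", "??r"));
        // "Test: ??r Solo ??r Frankenstein!"
    }
}
```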