views:

434

answers:

5

What is the best way to interleave a java String with a given character sequence. The interleave interval should be changeable.

Example:

String s = " .... 0000000000000 ..."; // length random
String b = interleave(s, 3, "-");

Result:

... 000-000-000-000-000 ...

another example:

String s = " .... we all we all we all ...";
String b = interleave(s, 7, "rock ");

Result:

... we all rock we all rock we all rock ...

The function should also work if the string length isn't a multiple of the the interleave distance. Any suggestions? Is there (again) a 'commons' way to do it?

+10  A: 

Here's quite simple and fairly readable implementation (I call it StringBuilder in the benchmark below):

public static String interleave(String s, int interval, String separator)
{
 StringBuilder sb = new StringBuilder(s);
 for (int pos = (s.length()-1) / interval; pos > 0; pos--)
 {
  sb.insert(pos * interval, separator);
 }
 return sb.toString();
}

If you're concerned with the efficiency of the simple StringBuilder implementation then maybe this implementation would suit your needs better (I call it Arrays in the benchmark below):

public static String interleave(String string, int interval, String separator)
{
 char[] src = string.toCharArray();
 char[] sep = separator.toCharArray();
 int count = (src.length-1)/interval;
 char[] dst = new char[src.length + count * sep.length];
 int srcpos = 0, dstpos = 0;
 for (int i = 0; i < count; i++)
 {
  System.arraycopy(src, srcpos, dst, dstpos, interval);
  srcpos += interval;
  dstpos += interval;
  System.arraycopy(sep, 0, dst, dstpos, sep.length);
  dstpos += sep.length;
 }
 if (dstpos < dst.length)
 {
  System.arraycopy(src, srcpos, dst, dstpos, dst.length - dstpos);
 }
 return String.valueOf(dst);
}

Note: I would probably use this kind of implementation only when under J2ME environment but it should be seriously faster on huge strings. The readability is quite poor though...

There of course always is the RegExp way of doing things which is surprisingly quite fast after you climb past the length where compiling the RegExp itself stops being a problem (you can't pre-compile a RegExp, because it's generated on the fly depending on interval, thanks to Rubens Farias for pointing this out, somehow missed it myself). So here it goes (I call this RegExp in the benchmark below):

public static String interleave(String string, int interval, String separator)
{
 return string.replaceAll("(.{"+interval+"})", "$1"+Matcher.quoteReplacement(separator));
}

Note: This implementation inserts separator at the end if the length of string is at a multiple of interval (whereas the other implementations don't). I don't like RegExps because they are un-readable and not too fast either. Oh and you can easily forget the "quoteReplacement" part and put yourself into a big trouble if the separator contains "$1" or even worse - if it comes from the user.

Benchmark

At this point I've done some benchmarking, so the first implementation at string length 100000 takes 0.002643 seconds, the second one - 0.000010, the third one - 0.000071, but everything depends on string length.

Length    StringBuilder   Arrays       RegExp
  10000     0.000012     0.000001     0.000054
 100000     0.002643     0.000010     0.000071
1000000     0.315413     0.000026     0.000199

It's by no means a serious benchmarking, but it still shows the trends and complexities of the algorithms involved.

Note: Although it is fun to play with these ideas, we're still talking about sub-second improvements when working with strings that are less than 1M in size. So it doesn't matter too much which way you go if you're only working with strings that are up to 1K in size (it will be 0ms vs 0ms). The most important thing is that it should be readable, straightforward and not take too much time to write as I'm sure you have more important problems to solve unless you're writing an universal library for everyone to use in the weirdest cases. Remember - your time is much more valuable than CPU time.

Left- and right-aligned interleaving

I'll take the Arrays implementation for that as it seems easiest to change for this:

public static String interleave(String string, int interval, String separator, boolean fromRight)
{
 char[] src = string.toCharArray();
 char[] sep = separator.toCharArray();
 int count = (src.length-1)/interval;
 char[] dst = new char[src.length + count * sep.length];
 int srcpos = 0, dstpos = 0;
 if (fromRight)
 {
  srcpos = dstpos = src.length - count * interval;
  if (srcpos > 0) System.arraycopy(src, 0, dst, 0, srcpos);
  if (count > 0)
  {
   System.arraycopy(sep, 0, dst, dstpos, sep.length);
   dstpos += sep.length;
   count--;
  }
 }
 for (int i = 0; i < count; i++)
 {
  System.arraycopy(src, srcpos, dst, dstpos, interval);
  srcpos += interval;
  dstpos += interval;
  System.arraycopy(sep, 0, dst, dstpos, sep.length);
  dstpos += sep.length;
 }
 if (dstpos < dst.length)
 {
  System.arraycopy(src, srcpos, dst, dstpos, dst.length - dstpos);
 }
 return String.valueOf(dst);
}
inkredibl
I don't think that interleaving an existing StringBuilder with a String "separator" as you called it is an efficient way to solve this problem
Yaneeve
That's why I said "pretty efficient" ;) it still is better than String "arithmetics" and it's readable too. And you only need more efficient code when you really need it (i.e. for strings that are seriously longer than 20 chars).
inkredibl
would it be difficult to change the implementation so that the start can be defined? eg.: start from right or left so that a string '0000' can be interleaved with (3,'-',RIGHT) to 0-000 or (3,'-', LEFT) to 000-0
Chris
"or you could compile a RegExp in first invocation": you can't since you're build regex string based on interval parameter
Rubens Farias
Thanks Rubens, I've edited that part.Chris - I've added the kind of implementation you want at the end.
inkredibl
+1  A: 

I think this solution is very efficient. Doesn't involve array copy or StringBuilder expansion:

public static String interleave(String input, int interval, String sep) {
 StringBuilder sb = new StringBuilder(input.length() + (((input.length() -1) / interval) * sep.length()));
 char[] array = input.toCharArray();
 for (int i = 0; i < array.length; i += interval) {
  int span = i + interval;
  for (int j = i; j < Math.min(span, array.length); j++) {
   sb.append(array[j]);
  }
  if (span < array.length)
   sb.append(sep);
 }
 return sb.toString();
}
bruno conde
+1  A: 

This is C#, but I'm sure Java have a similar approach:

public static string interleave(string input, int interval, string separator)
{
    if (String.IsNullOrEmpty(input)     || 
        String.IsNullOrEmpty(separator) || interval <= 0)
        return input;

    int length = input.Length + // original length + added chars - last occur
        ((input.Length / interval) * separator.Length) -
         (input.Length % interval == 0 ? separator.Length : 0);
    return Regex.Replace(      // magic happens here
        input, String.Format("(.{{{0}}})", interval),
        "$1" + separator.Replace("$", "$$")).Substring(0, length);
}
Rubens Farias
Neat, although it needs to be fixed to be able to handle separators like "$1" :)
gustafc
fair enough, fixed =)
Rubens Farias
also you'd add separator at the end of input if input.length is a multiple of interval...
inkredibl
@inkredibl, fixed too;
Rubens Farias
A: 

This is efficient and clear:

public static String interleave(String s, int interval, String separator) {
        StringBuffer b = new StringBuffer();
        int length = s.length();

        for (int start = 0; start < length - 1; start += interval) {
            int end = Math.min(length, start + interval);
            b.append(s.substring(start, end));
            b.append(separator);
        }

        if (length % interval > 0) {
            b.append(s.substring(length - (length % interval)));
        }

        return b.toString();
    }
Bruno Rothgiesser
A: 

Using the prerelease Google Guava libraries:

Joiner.on("-").join(Splitter.fixedLength(3).split(inputString));

Short, clear, and expressive. Love it!

Cowan