views:

397

answers:

5

Given the following string:

"foo bar-baz-zzz"

I want to split it at the characters " " and "-", preserving their value, but get all combinations of inputs.

i want to get a two-dimensional array containing

{{"foo", "bar", "baz", "zzz"}
,{"foo bar", "baz", "zzz"}
,{"foo", "bar-baz", "zzz"}
,{"foo bar-baz", "zzz"}
,{"foo", "bar", "baz-zzz"}
,{"foo bar", "baz-zzz"}
,{"foo", "bar-baz-zzz"}
,{"foo bar-baz-zzz"}}

Is there any built-in method in Java to split the string this way? Maybe in a library like Apache Commons? Or do I have to write a wall of for-loops?

A: 

There is no library method.

To accomplish that, you should tokenize the string (in your case using " -") by preserving the separators, and then you should think of separators as associated to binary flags and build all combination based on the value of the flags.

In your case, you have 3 separators: " ", "-" and "-", so you have 3 binary flags. You will end up with 2^3 = 8 values in the string.

Marian
+6  A: 

Here is a recursive solution that works. I used a List<List<String>> rather than a 2-dimensional array to make things easier. The code is a bit ugly and could probably be tidied up a little.

Sample output:

$ java Main foo bar-baz-zzz
Processing: foo bar-baz-zzz
[foo, bar, baz, zzz]
[foo, bar, baz-zzz]
[foo, bar-baz, zzz]
[foo, bar-baz-zzz]
[foo bar, baz, zzz]
[foo bar, baz-zzz]
[foo bar-baz, zzz]
[foo bar-baz-zzz]

Code:

import java.util.*;

public class Main {
  public static void main(String[] args) {
    // First build a single string from the command line args.
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = Arrays.asList(args).iterator();
    while (it.hasNext()) {
      sb.append(it.next());

      if (it.hasNext()) {
        sb.append(' ');
      }
    }

    process(sb.toString());
  }

  protected static void process(String str) {
    System.err.println("Processing: " + str);
    List<List<String>> results = new LinkedList<List<String>>();

    // Invoke the recursive method that does the magic.
    process(str, 0, results, new LinkedList<String>(), new StringBuilder());

    for (List<String> result : results) {
      System.err.println(result);
    }
  }

  protected static void process(String str, int pos, List<List<String>> resultsSoFar, List<String> currentResult, StringBuilder sb) {
    if (pos == str.length()) {
      // Base case: Reached end of string so add buffer contents to current result
      // and add current result to resultsSoFar.
      currentResult.add(sb.toString());
      resultsSoFar.add(currentResult);
    } else {
      // Step case: Inspect character at pos and then make recursive call.
      char c = str.charAt(pos);

      if (c == ' ' || c == '-') {
        // When we encounter a ' ' or '-' we recurse twice; once where we treat
        // the character as a delimiter and once where we treat it as a 'normal'
        // character.
        List<String> copy = new LinkedList<String>(currentResult);
        copy.add(sb.toString());
        process(str, pos + 1, resultsSoFar, copy, new StringBuilder());

        sb.append(c);
        process(str, pos + 1, resultsSoFar, currentResult, sb);
      } else {
        sb.append(c);
        process(str, pos + 1, resultsSoFar, currentResult, sb);
      }
    }
  }
}
Adamski
This is the best answer, just splitting using " -" will not work.
yx
you can avoid some nasty corner cases by changing the first lines to:if (pos == str.length()) { if (sb.length() > 0) { currentResult.add(sb.toString()); resultsSoFar.add(currentResult); }
Andreas Petersson
@Andreas: Is this a corner case? If the String ends with a delimiter I wasn't sure whether the result should include the empty string as a possible token or not.
Adamski
A: 

Why do you need that?

Notice that for a given string of N tokens you want to get an array of ca N*2^N strings. This (can) consume tons of memory if not done in a safe way...

I guess that probably you will need to iterate trough it all, right? If so than its better to create some class that will keep the original string and just give you different ways of splitting a row each time you ask it. This way you will save tons of memory and get better scalability.

ppow
+3  A: 

Here is a class that will lazily return lists of split values:

public class Split implements Iterator<List<String>> {
  private Split kid;                 private final Pattern pattern;
  private String subsequence;        private final Matcher matcher;
  private boolean done = false;      private final String sequence;
  public Split(Pattern pattern, String sequence) {
    this.pattern = pattern;          matcher = pattern.matcher(sequence);
    this.sequence = sequence;
  }

  @Override public List<String> next() {
    if (done) { throw new IllegalStateException(); }
    while (true) {
      if (kid == null) {
        if (matcher.find()) {
          subsequence = sequence.substring(matcher.end());
          kid = new Split(pattern, sequence.substring(0, matcher.start()));
        } else { break; }
      } else {
        if (kid.hasNext()) {
          List<String> next = kid.next();
          next.add(subsequence);
          return next;
        } else { kid = null; }
      }
    }
    done = true;
    List<String> list = new ArrayList<String>();
    list.add(sequence);
    return list;
  }
  @Override public boolean hasNext() { return !done; }
  @Override public void remove() { throw new UnsupportedOperationException(); }
}

(Forgive the code formatting - it is to avoid nested scrollbars).

For the sample invocation:

Pattern pattern = Pattern.compile(" |-");
String str = "foo bar-baz-zzz";
Split split = new Split(pattern, str);
while (split.hasNext()) {
  System.out.println(split.next());
}

...it will emit:

[foo, bar-baz-zzz]
[foo, bar, baz-zzz]
[foo bar, baz-zzz]
[foo, bar-baz, zzz]
[foo, bar, baz, zzz]
[foo bar, baz, zzz]
[foo bar-baz, zzz]
[foo bar-baz-zzz]

I imagine the implementation could be improved upon.

McDowell
+4  A: 

Here's a much shorter version, written in a recursive style. I apologize for only being able to write it in Python. I like how concise it is; surely someone here will be able to make a Java version.

def rec(h,t):
  if len(t)<2: return [[h+t]]
  if (t[0]!=' ' and t[0]!='-'): return rec(h+t[0], t[1:])
  return rec(h+t[0], t[1:]) + [ [h]+x for x in rec('',t[1:])]

and the result:

>>> rec('',"foo bar-baz-zzz")
[['foo bar-baz-zzz'], ['foo bar-baz', 'zzz'], ['foo bar', 'baz-zzz'], ['foo bar'
, 'baz', 'zzz'], ['foo', 'bar-baz-zzz'], ['foo', 'bar-baz', 'zzz'], ['foo', 'bar
', 'baz-zzz'], ['foo', 'bar', 'baz', 'zzz']]
redtuna
it's interesting to see how much shorter the python code is from its Java counterparts
yairchu