I have to port some C# code to Java and I am having some trouble converting a string splitting command.

The regex itself still works, but the results differ: in C# the matched delimiters are included in the resulting string[], whereas in Java they are dropped.

What is the easiest way to keep the split-on tokens?

Here is an example of C# code that works the way I want it:

using System;

using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        String[] values = Regex.Split("5+10", @"([\+\-\*\(\)\^\\/])");

        foreach (String value in values)
            Console.WriteLine(value);
    }
}

Produces:
5
+
10
+1  A: 

This is because the pattern wraps the delimiter in a capturing group. C#'s Regex.Split includes the text of any captured group in the resulting array. Java's String.split and Pattern.split never return the matched delimiter, whether or not it is captured.
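One possible workaround in Java, offered as a sketch rather than an exact equivalent of Regex.Split, is to split on zero-width lookarounds so the delimiter is never consumed. The class name, the `OPS` constant and the `splitKeeping` helper are made up for illustration. The character class is a translation of the C# pattern from the question, and the behaviour described assumes Java 8 or later.

```java
public class SplitKeepDelimiters {
    // Hypothetical constant: the same operator set as the C# pattern: + - * ( ) ^ \ /
    private static final String OPS = "[+\\-*()^\\\\/]";

    // Split at every position that is directly after or directly before an
    // operator. Both alternatives are zero-width, so no character is removed.
    static String[] splitKeeping(String text) {
        return text.split("(?<=" + OPS + ")|(?=" + OPS + ")");
    }

    public static void main(String[] args) {
        // Prints 5, +, 10 on separate lines
        for (String value : splitKeeping("5+10")) {
            System.out.println(value);
        }
    }
}
```

Note that this does not produce empty strings around leading, trailing or adjacent operators, so "(5+10)" comes back as (, 5, +, 10, ) with nothing before or after.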

Andrew Hare
+1  A: 

I don't know how C# does it, but Java has no built-in option for this, so you'll have to do the splitting yourself with a Matcher. Look at how this code does it:

public String[] split(String text) {
    if (text == null) {
        text = "";
    }

    int last_match = 0;
    LinkedList<String> splitted = new LinkedList<String>();

    Matcher m = this.pattern.matcher(text);

    // Iterate through each match
    while (m.find()) {
        // Text since last match
        splitted.add(text.substring(last_match,m.start()));

        // The delimiter itself
        if (this.keep_delimiters) {
            splitted.add(m.group());
        }

        last_match = m.end();
    }
    // Trailing text
    splitted.add(text.substring(last_match));

    return splitted.toArray(new String[splitted.size()]);
}
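
The method above relies on two fields, pattern and keep_delimiters, that are not shown. A minimal self-contained sketch of the same approach might look like the following. The class name `DelimiterSplitter`, its constructor and the camel-case field names are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterSplitter {
    private final Pattern pattern;        // delimiter regex (assumed field)
    private final boolean keepDelimiters; // whether to emit the delimiters (assumed field)

    public DelimiterSplitter(String regex, boolean keepDelimiters) {
        this.pattern = Pattern.compile(regex);
        this.keepDelimiters = keepDelimiters;
    }

    public String[] split(String text) {
        if (text == null) {
            text = "";
        }
        int lastMatch = 0;
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            // Text between the previous delimiter and this one
            parts.add(text.substring(lastMatch, m.start()));
            // The delimiter itself, if requested
            if (keepDelimiters) {
                parts.add(m.group());
            }
            lastMatch = m.end();
        }
        // Text after the final delimiter
        parts.add(text.substring(lastMatch));
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        DelimiterSplitter splitter = new DelimiterSplitter("[+\\-*()^\\\\/]", true);
        // Prints 5, +, 10 on separate lines
        for (String value : splitter.split("5+10")) {
            System.out.println(value);
        }
    }
}
```

Unlike String.split, this keeps empty strings, so an input that starts or ends with a delimiter yields an empty first or last element.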
Pesto