I know, I know, now I have two problems 'n all that, but regex here means I don't have to write two complicated loops. Instead, I have a regex that only I understand, and I'll be employed for yonks.

I have a string, say stack.overflow.questions[0].answer[1].postDate, and I need to get the [0] and the [1], preferably in an array. "Easy!" my neurons exclaimed, just use regex and the split method on your input string; so I came up with this:

String[] tokens = input.split("[^\\[\\d\\]]");

which produced the following:

[, , , , , , , , , , , , , , , , [0], , , , , , , [1]]

Oh dear. So, I thought, "what would replaceAll do in this instance?":

String onlyArrayIndexes = input.replaceAll("[^\\[\\d\\]]", "");

which produced:

[0][1]

Hmm. Why so? I'm looking for a two-element string array that contains "[0]" as the first element and "[1]" as the second. Why does split not work here, when the Javadocs say both methods use the Pattern class?

To summarise, I have two questions: why does the split() call produce that large array with seemingly random space characters, and am I right in thinking the replaceAll works because the regex replaces every character that is not "[", a digit or "]"? And what am I missing that makes me expect them to produce similar output? (OK, that's three, and please don't answer "a clue?" to the last one!)

+2  A: 

This is not a direct answer to your question, however I want to show you a great API that will suit your need.

Check out Splitter from Google Guava.

So for your example, you would use it like this:

import com.google.common.base.Splitter;

Iterable<String> tokens = Splitter.onPattern("[^\\[\\d\\]]").omitEmptyStrings().trimResults().split(input);

// Now you get back an Iterable which you can iterate over. Much better than an array.
for (String s : tokens) {
    System.out.println(s);
}

This prints (the brackets are kept, because the pattern does not treat them as delimiters):
[0]
[1]

Shervin
A great suggestion, thanks. Right now I only have use for regex in this particular instance, but I'll go to Guava should I need it further.
atc
Google Guava supports regex, as I have shown in the example.
Shervin
+4  A: 

Well, from what I can see the split does work: it cuts the string at every match of your regex, and your regex matches each single character that is not a bracket or a digit. Two adjacent matches have nothing between them, so every such pair contributes an empty string to the array. Those are empty strings, not spaces.

As for the replaceAll, I think your assumption is right: it removes (replaces with "") everything that is not what you want.
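
A minimal sketch of that difference, assuming the input string from the question. The class name IndexExtractor and the method extractIndexes are hypothetical, and the Matcher loop is a swapped-in alternative to split (match what you want instead of splitting on what you don't), not something from the original posts:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexExtractor {
    // Matches one bracketed run of digits, e.g. "[0]" or "[12]".
    private static final Pattern INDEX = Pattern.compile("\\[\\d+\\]");

    // Hypothetical helper: collects every bracketed index in order of appearance.
    static String[] extractIndexes(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = INDEX.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "stack.overflow.questions[0].answer[1].postDate";

        // Every unwanted character is its own delimiter, so most
        // elements of this array are empty strings.
        String[] tokens = input.split("[^\\[\\d\\]]");
        System.out.println(tokens.length);

        // replaceAll deletes the same characters and leaves one string.
        System.out.println(input.replaceAll("[^\\[\\d\\]]", "")); // [0][1]

        // Matching the wanted pieces directly gives the two-element array.
        System.out.println(String.join(",", extractIndexes(input))); // [0],[1]
    }
}
```

Staying with split is also possible if the delimiter regex matches whole runs of unwanted characters, but a leading empty element would still need to be filtered out, which is why the sketch uses find() instead.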

From the API documentation:

Splits this string around matches of the given regular expression.

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

The string "boo:and:foo", for example, yields the following results with these expressions:

Regex     Result
:         { "boo", "and", "foo" }
o         { "b", "", ":and:f" }
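
To see the trailing-empty-string rule from the quoted documentation in action, here is a small sketch. The class and method names are made up for illustration. It relies on the two-argument split, where a negative limit keeps the trailing empty strings that the one-argument form drops:

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: same as split(regex, 0), trailing empty strings are dropped.
    static String[] splitDefault(String s, String regex) {
        return s.split(regex);
    }

    // Hypothetical helper: a negative limit keeps trailing empty strings.
    static String[] splitKeepingTrailing(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        // prints [b, , :and:f]
        System.out.println(Arrays.toString(splitDefault("boo:and:foo", "o")));
        // prints [b, , :and:f, , ]
        System.out.println(Arrays.toString(splitKeepingTrailing("boo:and:foo", "o")));
    }
}
```

The empty element between "b" and ":and:f" comes from the two adjacent "o" characters, which is the same effect that fills the array in the question.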
posdef
Thank you, it was the fact that split gives me an element in the array for each match of my regex; this is what I was failing to understand!
atc
A: 
T.J. Crowder