Hi guys, I am noticing strange behaviour when using the split() method in Java.

I have a string as follows: 0|1|2|3|4|5|6|7|8|9|10

String currentString[] = br.readLine().split("\\|");
System.out.println("Length:"+currentString.length);
for(int i=0;i < currentString.length;i++){
     System.out.println(currentString[i]);
}

This will produce the desired results:

Length: 11
0
1
2
3
4
5
6
7
8
9
10

However if I receive the string: 0|1|2|3|4|5|6|7|8||

I get the following results:

Length: 9
0
1
2
3
4
5
6
7
8

The final two empty strings are omitted, but I need them to be kept. I'm not sure what I am doing wrong. I have also tried calling split with a limit, like this: ...split("\\|",-1);

but that returns the entire string with a length of 1.

Any help would be greatly appreciated!

A: 

You need to use indexOf() and then substring() for this to work. I don't think you can get the empty strings by using split() alone.

fastcodejava
hmm not sure what you mean, can you give me an example?
astro
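A minimal sketch of what the indexOf()/substring() approach could look like, since an example was requested. The class name ManualSplit and the helper splitKeepingEmpties are hypothetical, not taken from the answer above, and the sketch assumes a single-character delimiter:

```java
import java.util.ArrayList;
import java.util.List;

class ManualSplit {
    // Hypothetical helper: walks the string with indexOf() and cuts out each
    // field with substring(), keeping every empty field, trailing ones included.
    static List<String> splitKeepingEmpties(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int pos;
        while ((pos = line.indexOf(delimiter, start)) != -1) {
            fields.add(line.substring(start, pos)); // may be the empty string
            start = pos + 1;
        }
        // Whatever follows the last delimiter (possibly nothing) is the final field.
        fields.add(line.substring(start));
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = splitKeepingEmpties("0|1|2|3|4|5|6|7|8||", '|');
        System.out.println("Length:" + fields.size());
        for (String field : fields) {
            System.out.println("[" + field + "]"); // brackets make empty fields visible
        }
    }
}
```

For the string in the question this yields 11 fields, the last two of them empty.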
A: 

I think this is the default behavior of split. Anyway, please try this:

String currentString[] = br.readLine().replace("||","| |").split("\\|");
System.out.println("Length:"+currentString.length);
for(int i=0;i < currentString.length;i++){
     System.out.println(currentString[i]);
}

This has not been tested yet, but I think it should do the trick.

jerjer
This took me a step further: it increased the length by one, but it still missed the last index. That is because the final field ends as |(value) rather than ||, so the empty space is never added. Should I be using StringTokenizer instead? I know its use is discouraged, but I think it might be more effective, as I'm not sure why this isn't functioning as expected.
astro
+1  A: 

My Java is a little bit rusty, but shouldn't it be:

String currentString[] = "0|1|2|3|4|5|6|7|8||".split("\\|");
System.out.println("Length:"+currentString.length); 
for(int i = 0; i < currentString.length; i++)
{
  System.out.println(currentString[i]); 
} 
Paulo Santos
Note the double backslash. This is because the argument of split() is interpreted as a regular expression. You want to match the character `|` which is done with the regular expression `\|` which is represented by the String literal `"\\|"`
MatrixFrog
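To illustrate the escaping point, here is a small sketch. It relies on the fact that an unescaped | is the regex alternation operator rather than a literal pipe, so it has to be escaped or quoted. Pattern.quote is a standard-library alternative to writing the backslashes by hand. The class name EscapedDelimiter and the helper splitOnPipe are made up for this example:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapedDelimiter {
    // Hypothetical helper: lets Pattern.quote build a literal pattern for "|"
    // instead of escaping it manually.
    static String[] splitOnPipe(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "0|1|2";
        // Regex \| written as the Java string literal "\\|".
        System.out.println(Arrays.toString(line.split("\\|")));
        // Same result, without hand-written backslashes.
        System.out.println(Arrays.toString(splitOnPipe(line)));
    }
}
```

Both calls split the string into the three fields 0, 1 and 2.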
My code has the double backslash; I'm not sure why it doesn't show up properly in the forum.
astro
Oh okay. You should be able to edit your question to reflect that.
MatrixFrog
@astro - it is because you used <code>...</code> rather than the StackOverflow wiki syntax.
Stephen C
+2  A: 

The default behavior of split is to not return empty tokens (because of a zero limit). Using the two-parameter split method with a limit of -1 will give you all empty tokens in the result.

UPDATE:

Test code as follows:

public class Test {
    public static void main(String[] args) {
        String currentString[] = "0|1|2|3|4|5|6|7|8||".split("\\|", -1);
        System.out.println("Length:"+currentString.length);
        for(int i=0;i < currentString.length;i++){
            System.out.println(currentString[i]);
        }
    }
}

Output as follows:

Length:11
0
1
2
3
4
5
6
7
8
--- BLANK LINE --    
--- BLANK LINE --

The "--- BLANK LINE --" is put in by me to show that the return is blank. It is blank once for the empty token after 8| and once for the empty trailing token after the last |.

Hope this clears things up.

Gennadiy
I have tried using the -1 as mentioned in my post, but that returns the entire string with a length of 1, as follows: 0|1|2|3|4|5|6|7|8|| Not sure why?
astro
@astro - please see my edit
Gennadiy
Thank you Gennadiy, not sure what I was doing wrong but this worked.
astro
Actually, by default only *trailing* empties are removed.
Kevin Bourrillion
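A short sketch of the distinction made in the comment above: with the default limit of 0, split keeps empty strings in the middle and drops only the trailing ones, while a negative limit keeps them all. The class name TrailingEmpties, the helper countFields and the sample input are chosen just for this illustration:

```java
import java.util.Arrays;

class TrailingEmpties {
    // Hypothetical helper: number of fields split() returns for a given limit.
    static int countFields(String line, int limit) {
        return line.split("\\|", limit).length;
    }

    public static void main(String[] args) {
        String line = "a||b||";
        // Limit 0 (same as the one-argument split): the empty field between
        // "a" and "b" survives, the two trailing empty fields are dropped.
        System.out.println(Arrays.toString(line.split("\\|")));      // [a, , b]
        // Negative limit: nothing is dropped.
        System.out.println(Arrays.toString(line.split("\\|", -1)));  // [a, , b, , ]
        System.out.println(countFields(line, 0) + " vs " + countFields(line, -1)); // 3 vs 5
    }
}
```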
A: 
Steve Zhang
+2  A: 

String.split() is weird.

Its extreme weirdness, in this and other ways, is one of the reasons why we made Splitter.

It has less surprising behavior and lots of flexibility.

Kevin Bourrillion
I wouldn't call it weird that empty strings between delimiters are removed. The weirdness is that it ONLY does this with empty strings at the end, not in the middle!
Kees Kist
My point exactly.
Kevin Bourrillion