The + is a regex metacharacter meaning "one or more" repetition, so the pattern -+ matches "one or more dashes". This would allow you to use str.split("-+") instead, but you may get an empty string as the first element (when the string starts with a dash).
If you just want to remove all - characters, then you can do str = str.replace("-", ""). This uses the replace(CharSequence, CharSequence) method, which performs literal String replacement, i.e. it does not interpret its arguments as regex patterns.
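To make the literal-versus-regex difference concrete, here is a small sketch contrasting replace with replaceAll on a dot, which is a regex metacharacter. The class LiteralVsRegexReplace and its method names are made up for illustration.

```java
public class LiteralVsRegexReplace {
    // replace(CharSequence, CharSequence): "." is matched literally, as a dot
    static String removeDotsLiterally(String s) {
        return s.replace(".", "");
    }

    // replaceAll(String, String): "." is a regex matching ANY character,
    // so every character gets removed
    static String removeWithUnescapedRegex(String s) {
        return s.replaceAll(".", "");
    }

    public static void main(String[] args) {
        System.out.println(removeDotsLiterally("a.b.c"));                  // abc
        System.out.println("[" + removeWithUnescapedRegex("a.b.c") + "]"); // []
    }
}
```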
If you want a String[] with each digit in its own element, then it's easiest to do it in two steps: first remove all non-digits, then use the zero-length assertion (?!^) to split everywhere except at the beginning of the string (this prevents an empty string as the first element on Java 7 and earlier; since Java 8, a zero-width match at the beginning never produces a leading empty string). If you want a char[], then you can just call String.toCharArray().
Lastly, if the string can be very long, it's better to use a java.util.regex.Matcher in a find() loop looking for a digit \d, or a java.util.Scanner with the delimiter \D*, i.e. a (possibly empty) sequence of non-digits. Neither approach gives you an array, but you can use the loop to populate a List (see Effective Java 2nd Edition, Item 25: Prefer lists to arrays).
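Here is a minimal sketch of the Matcher find() loop approach, collecting each digit into a List<String>. The class DigitFinder and the method digitsOf are hypothetical names; also note that \d matches only ASCII 0-9 unless the UNICODE_CHARACTER_CLASS flag is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitFinder {
    // Compiled once and reused; \d matches a single digit character
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Collects every digit in text, in order, without building an intermediate array
    static List<String> digitsOf(CharSequence text) {
        List<String> digits = new ArrayList<>();
        Matcher m = DIGIT.matcher(text);
        while (m.find()) {
            digits.add(m.group()); // the text of the current match, i.e. one digit
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(digitsOf("(&*!@#123ask45{P:L6")); // [1, 2, 3, 4, 5, 6]
    }
}
```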
Snippets
Here are some examples to illustrate the above ideas:
System.out.println(java.util.Arrays.toString(
"---4--5-67--8-9---".split("-+")
));
// [, 4, 5, 67, 8, 9]
// note the empty string as first element
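If the leading empty string is unwanted, one option (a sketch, not the only way) is to strip the leading dashes with replaceFirst("^-+", "") before splitting; trailing dashes are already handled because split discards trailing empty strings by default. The class and helper names are hypothetical.

```java
import java.util.Arrays;

public class SplitWithoutLeadingEmpty {
    // Removes the leading run of dashes first, so split("-+") has no
    // match at index 0 and therefore produces no leading empty string.
    static String[] splitOnDashes(String s) {
        return s.replaceFirst("^-+", "").split("-+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDashes("---4--5-67--8-9---")));
        // [4, 5, 67, 8, 9]
    }
}
```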
System.out.println(
"---4--5-67--8-9---".replace("-", "")
);
// 456789
System.out.println(java.util.Arrays.toString(
"abcdefg".toCharArray()
));
// [a, b, c, d, e, f, g]
The next example first deletes all non-digits (\D), then splits with (?!^), i.e. everywhere except at the beginning of the string, to get a String[] whose elements each contain a single digit:
System.out.println(java.util.Arrays.toString(
"@*#^$4@!#5ajs67>?<{8_(9SKJDH"
.replaceAll("\\D", "")
.split("(?!^)")
));
// [4, 5, 6, 7, 8, 9]
This next one uses a Scanner with \D* as the delimiter to get each digit as its own token, and uses those tokens to populate a List<String>:
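import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

List<String> digits = new ArrayList<>();
String text = "(&*!@#123ask45{P:L6";
// \D* (zero or more non-digits) is the delimiter, so each digit becomes its own token
try (Scanner sc = new Scanner(text).useDelimiter("\\D*")) {
    while (sc.hasNext()) {
        digits.add(sc.next());
    }
}
System.out.println(digits);
// [1, 2, 3, 4, 5, 6]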
Common problems with split()
Here are some common beginner problems when dealing with String.split:
Lesson #1: split takes a regular expression pattern
This is probably the most common beginner mistake:
System.out.println(java.util.Arrays.toString(
"one|two|three".split("|")
));
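// [, o, n, e, |, t, w, o, |, t, h, r, e, e] on Java 7 and earlier
// Java 8+ omits the leading empty string: [o, n, e, |, t, w, o, |, t, h, r, e, e]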
System.out.println(java.util.Arrays.toString(
"not.like.this".split(".")
));
// []
The problem here is that | and . are regex metacharacters. Since they are meant to be matched literally, they need to be escaped with a preceding backslash, which in a Java string literal is written "\\".
System.out.println(java.util.Arrays.toString(
"one|two|three".split("\\|")
));
// [one, two, three]
System.out.println(java.util.Arrays.toString(
"not.like.this".split("\\.")
));
// [not, like, this]
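Instead of escaping metacharacters by hand, you can let java.util.regex.Pattern.quote do it; this is handy when the delimiter comes from user input or configuration. A minimal sketch, with a hypothetical helper splitLiterally:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedSplit {
    // Pattern.quote wraps the delimiter in \Q...\E, so every character
    // in it is matched literally, whatever metacharacters it contains.
    static String[] splitLiterally(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiterally("one|two|three", "|"))); // [one, two, three]
        System.out.println(Arrays.toString(splitLiterally("not.like.this", ".")));  // [not, like, this]
    }
}
```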
Lesson #2: split discards trailing empty strings by default
Sometimes you want to keep the trailing empty strings that split discards by default:
System.out.println(java.util.Arrays.toString(
"a;b;;d;;;g;;".split(";")
));
// [a, b, , d, , , g]
Note that there are slots for the "missing" values c, e and f, but not for h and i. To fix this, you can pass a negative limit argument to String.split(String regex, int limit).
System.out.println(java.util.Arrays.toString(
"a;b;;d;;;g;;".split(";", -1)
));
// [a, b, , d, , , g, , ]
You can also use a positive limit of n to apply the pattern at most n - 1 times (i.e. resulting in no more than n elements in the array).
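Here is a sketch of positive limits, reusing the string from above plus a typical "key=value" case; the helper splitKeyValue is a hypothetical name. Unlike the default limit of 0, a positive limit also keeps trailing empty strings.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: split only at the first '=', so the value may itself contain '='.
    // limit 2 means the pattern is applied at most once, giving at most 2 elements.
    static String[] splitKeyValue(String entry) {
        return entry.split("=", 2);
    }

    public static void main(String[] args) {
        String s = "a;b;;d;;;g;;";
        // limit 3: applied at most 2 times; the last element holds everything after the 2nd match
        System.out.println(Arrays.toString(s.split(";", 3)));                 // [a, b, ;d;;;g;;]
        System.out.println(Arrays.toString(splitKeyValue("key=value=more"))); // [key, value=more]
        // with a positive limit, a trailing empty string is kept
        System.out.println(Arrays.toString(splitKeyValue("key=")));           // [key, ]
    }
}
```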
Zero-width matching split examples
Here are more examples of splitting on zero-width matching constructs; this lets you split a string while also keeping the "delimiters".
Simple sentence splitting, keeping punctuation marks:
String str = "Really?Wow!This.Is.Awesome!";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[.!?])")
)); // prints "[Really?, Wow!, This., Is., Awesome!]"
Splitting a long string into fixed-length parts, using \G:
String str = "012345678901234567890";
System.out.println(java.util.Arrays.toString(
str.split("(?<=\\G.{4})")
)); // prints "[0123, 4567, 8901, 2345, 6789, 0]"
Split before capital letters (except the first!):
System.out.println(java.util.Arrays.toString(
"OhMyGod".split("(?=(?!^)[A-Z])")
)); // prints "[Oh, My, God]"
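One more sketch in the same spirit: splitting wherever a run of digits starts or ends, by combining lookbehind and lookahead alternatives with |. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class BoundarySplit {
    // Zero-width split at every boundary between a non-digit and a digit (in either order).
    // Nothing is consumed, so every character ends up in some token.
    static String[] splitDigitRuns(String s) {
        return s.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDigitRuns("abc123def45"))); // [abc, 123, def, 45]
    }
}
```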
A variety of examples is provided in related questions below.
References
Related questions