How to split a string on numbers and substrings?

Input: 34AG34A
Expected output: {"34","AG","34","A"}

I have tried the Regex.Split() function, but I cannot figure out which pattern would work.

Any ideas?

+2  A: 

First, you ask for "numbers" but don't specify what you mean by that.

If you mean "digits in 0-9" then you need the character class [0-9]. There is also the character class \d which in addition to 0-9 matches some other characters.

\d matches any decimal digit. It is equivalent to the \p{Nd} regular expression pattern, which includes the standard decimal digits 0-9 as well as the decimal digits of a number of other character sets.
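To make the difference concrete, here is a minimal sketch. The class and helper names (DigitClassDemo, FullMatch) are made up for illustration. It tests a pair of Arabic-Indic digits against both character classes, and also against \d under RegexOptions.ECMAScript, which in .NET restricts \d to [0-9]:

```csharp
using System;
using System.Text.RegularExpressions;

static class DigitClassDemo
{
    // Hypothetical helper: thin wrapper around Regex.IsMatch.
    public static bool FullMatch(string input, string pattern,
                                 RegexOptions options = RegexOptions.None)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    static void Main()
    {
        string arabicIndic = "\u0663\u0664"; // Arabic-Indic digits three and four

        Console.WriteLine(FullMatch(arabicIndic, @"^\d+$"));                          // True
        Console.WriteLine(FullMatch(arabicIndic, "^[0-9]+$"));                        // False
        Console.WriteLine(FullMatch(arabicIndic, @"^\d+$", RegexOptions.ECMAScript)); // False
        Console.WriteLine(FullMatch("34", @"^\d+$"));                                 // True
    }
}
```

So if the input can contain non-ASCII digits and you only want 0-9, [0-9] (or the ECMAScript option) is the safer choice.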

I assume that you are not interested in negative numbers, numbers containing a decimal point, foreign numerals such as 五, etc.

Split is not the right solution here. What you appear to want to do is tokenize the string, not split it. You can do this by using Matches instead of Split:

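// Requires: using System.Linq; using System.Text.RegularExpressions;
// Each match is either a run of digits or a run of non-digits.
string[] output = Regex.Matches(s, "[0-9]+|[^0-9]+")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();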
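For comparison, if you do want to stay with Split, one possible sketch is to split on the zero-width boundary between a digit and a non-digit using lookarounds. The names SplitDemo and SplitOnBoundaries are hypothetical, and this is only one way to do it:

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: splits at every boundary between a digit and a non-digit.
    public static string[] SplitOnBoundaries(string s)
    {
        // Zero-width pattern: matches the empty position between a digit and a
        // non-digit (in either order), so the split consumes no characters.
        return Regex.Split(s, "(?<=[0-9])(?=[^0-9])|(?<=[^0-9])(?=[0-9])");
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", SplitOnBoundaries("34AG34A"))); // 34,AG,34,A
    }
}
```

One difference to be aware of: for an empty input string, Split returns an array containing one empty string, whereas the Matches approach returns an empty array.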
Mark Byers
+5  A: 

The regular expression (\d+|[A-Za-z]+) will match the tokens you require.

robyaw
Surely you don't mean A-z.
Kobi
Note, however, that the set `[A-z]` is the same as ``[A-Z\[\\\]^_`a-z]``. The set `[A-Za-z]` might be more appropriate.
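A quick way to check this is to list which ASCII characters each class accepts. This is only a sketch, and the names RangeDemo and AsciiMatches are invented for the example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RangeDemo
{
    // Hypothetical helper: returns the ASCII characters (0-127) that a
    // character class matches, in code-point order.
    public static string AsciiMatches(string charClass)
    {
        var regex = new Regex("^" + charClass + "$");
        return new string(Enumerable.Range(0, 128)
            .Select(i => (char)i)
            .Where(c => regex.IsMatch(c.ToString()))
            .ToArray());
    }

    static void Main()
    {
        string wide = AsciiMatches("[A-z]");
        string narrow = AsciiMatches("[A-Za-z]");

        // Characters matched by [A-z] but not by [A-Za-z].
        Console.WriteLine(new string(wide.Except(narrow).ToArray())); // [\]^_`
    }
}
```

The six extra characters are the ones that sit between Z and a in ASCII.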
Guffa
@Guffa - that's something I did not know - I always assumed [A-z] was shorthand for [A-Za-z]. I must admit I usually use [A-Za-z] because it seems clearer to me. Guess this has bitten me on the ass this time :-)
robyaw
+3  A: 

I think you have to look for two patterns:

  • a sequence of digits
  • a sequence of letters

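Hence, I'd use ([a-z]+)|([0-9]+). For upper-case input such as the one in the question, use [A-Za-z] instead or pass RegexOptions.IgnoreCase.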

For instance, System.Text.RegularExpressions.Regex.Matches("asdf1234be56qq78", "([a-z]+)|([0-9]+)") returns 6 matches, with the values "asdf", "1234", "be", "56", "qq", "78".
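Because the two alternatives sit in separate capture groups, each match can also tell you which kind of token it is. A rough sketch, with GroupDemo and Describe being names I made up:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Hypothetical helper: labels each token by the capture group that matched it.
    public static List<string> Describe(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, "([a-z]+)|([0-9]+)"))
        {
            // Group 1 succeeds for a run of letters, group 2 for a run of digits.
            string kind = m.Groups[1].Success ? "letters" : "digits";
            result.Add(kind + ":" + m.Value);
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", Describe("asdf1234be56qq78")));
        // letters:asdf digits:1234 letters:be digits:56 letters:qq digits:78
    }
}
```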

naivists
+1  A: 

Don't use Regex.Split, use Regex.Match:

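// Requires: using System; using System.Text.RegularExpressions;
var m = Regex.Match("34AG34A", "([0-9]+|[A-Z]+)");
while (m.Success) {
    Console.WriteLine(m.Value); // prints 34, AG, 34, A on separate lines
    m = m.NextMatch();
}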

Converting this to an array is left as an exercise to the reader. :-)
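For anyone who would rather skip the exercise, one possible sketch collects the match values in a list and converts it at the end. TokenizeDemo and Tokenize are hypothetical names, not anything from the framework:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TokenizeDemo
{
    // Hypothetical helper: walks the matches with NextMatch and returns their values.
    public static string[] Tokenize(string s)
    {
        var tokens = new List<string>();
        for (var m = Regex.Match(s, "[0-9]+|[A-Z]+"); m.Success; m = m.NextMatch())
        {
            tokens.Add(m.Value);
        }
        return tokens.ToArray();
    }

    static void Main()
    {
        string[] output = Tokenize("34AG34A");
        Console.WriteLine(string.Join(",", output)); // 34,AG,34,A
    }
}
```

Note that with this pattern anything that is neither a digit nor an upper-case letter (lower-case letters, spaces, punctuation) is silently skipped.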

Heinzi