This sort of example for extracting integers from a string is common:

string input = "10 blah 20 30 nonsense 40 50";

string[] numbers = System.Text.RegularExpressions.Regex.Split(input, @"^[\d]");

But how can I account for numbers with a decimal point?

e.g.

string input = "10 blah 20 30 nonsense 40.5 50";

Used with the above regular expression, unsurprisingly, the 40 and the 5 after the decimal point are split into different elements of the numbers array.

In my naivety, I thought the below would work:

string[] numbers = System.Text.RegularExpressions.Regex.Split(input, @"^[\d\.]");

But this causes the decimal point to be split into its own element of the array.

This seems like it should be so easy but I've tried all sorts of regular expressions without any success. I'm tearing my hair out - any help is greatly appreciated!

A: 
System.Text.RegularExpressions.Regex.Split(input, @"[^\d.]+");

The split delimiter is one or more characters that are neither digits nor periods. Note that the example in your question is wrong even for the first input: with the ^ outside the brackets, the pattern matches only a digit at the very start of the string, so that first digit is used as the delimiter and does not appear in the array.
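To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the question's input. The class and method names (SplitDemo, SplitAnchored, SplitNegated) are made up for illustration; only Regex.Split and the two patterns come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class SplitDemo
{
    // Pattern as written in the question: ^ outside the brackets anchors
    // the match to the start of the input, so only a leading digit matches.
    public static string[] SplitAnchored(string input)
    {
        return Regex.Split(input, @"^[\d]");
    }

    // Pattern from this answer: ^ inside the brackets negates the class,
    // so any run of characters that are not digits or periods is a delimiter.
    public static string[] SplitNegated(string input)
    {
        return Regex.Split(input, @"[^\d.]+");
    }

    public static void Main()
    {
        string input = "10 blah 20 30 nonsense 40.5 50";

        // Prints: |0 blah 20 30 nonsense 40.5 50
        // (an empty first element, then the rest of the string minus the leading "1")
        Console.WriteLine(string.Join("|", SplitAnchored(input)));

        // Prints: 10|20|30|40.5|50
        Console.WriteLine(string.Join("|", SplitNegated(input)));
    }
}
```

The position of the ^ is the whole difference: outside the brackets it is an anchor, inside them (as the first character) it negates the class.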

EDIT: I tested it with both inputs, and it works:

string input = "10 blah 20 30 nonsense 40 50";
System.Text.RegularExpressions.Regex.Split(input, @"[^\d.]+");
{ "10", "20", "30", "40", "50" }

input = "10 blah 20 30 nonsense 40.5 50";
System.Text.RegularExpressions.Regex.Split(input, @"[^\d.]+");
{ "10", "20", "30", "40.5", "50" }
Matthew Flaschen
Thank you so much! I copied the example down wrong, which may explain my stupid error! I've got a long way to go with regular expressions, evidently...
zithery
Err, this didn't actually work, Matthew. I tried @"[^\d\.]+" as well for good measure. This problem still persists!
zithery
@zithery, it works for me with both your inputs. You don't have to escape `.` inside a character class.
Matthew Flaschen
Quite right. I'm being an idiot. What I was seeing was one of the false positives Jaroslav mentioned. Thanks for your help, guys. With my puny mind I'll take it forward from here.
zithery
A: 

Bear in mind that with Regex.Split(input, @"[^\d.]+"); you are going to get false positives for strings like non.sense (which yields a lone '.') or 50.6.8 (which is returned whole), so you will have to filter those results out.
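One possible way to do that filtering, sketched here under the assumption that a "valid" result is anything double.TryParse accepts as a plain decimal in the invariant culture. The class and method names are hypothetical, and this rule is only one reasonable choice; for instance, it would also accept a token such as "40." with a trailing period.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class SplitFilterDemo
{
    // Split on runs of non-digit, non-period characters, then keep only
    // the tokens that parse as a decimal number. This drops empty strings,
    // a lone "." and tokens with more than one period such as "50.6.8".
    public static string[] ExtractNumbers(string input)
    {
        return Regex.Split(input, @"[^\d.]+")
            .Where(token => double.TryParse(
                token,
                NumberStyles.AllowDecimalPoint,
                CultureInfo.InvariantCulture,
                out _))
            .ToArray();
    }

    public static void Main()
    {
        // Split alone would return: 10, ".", 40.5, 50.6.8, 50
        string input = "10 blah non.sense 40.5 50.6.8 50";

        // Prints: 10|40.5|50
        Console.WriteLine(string.Join("|", ExtractNumbers(input)));
    }
}
```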

You could also use the Matches method instead of Split.

MatchCollection mc = Regex.Matches(input, @"\d+(?:\.\d+)?");
string[] numbers = (from Match m in mc select m.Value).ToArray();
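For anyone who wants to try this directly, here is a self-contained sketch of the same Matches approach with the using directives it needs. The wrapper class and method names are hypothetical. It also shows what happens to the awkward inputs mentioned above: no lone '.' is returned, but 50.6.8 comes back as two separate numbers.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the names are illustrative only.
public static class MatchesDemo
{
    // Collect every run of digits, optionally followed by a period and more digits.
    public static string[] ExtractNumbers(string input)
    {
        return Regex.Matches(input, @"\d+(?:\.\d+)?")
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        // Prints: 10|20|30|40.5|50
        Console.WriteLine(string.Join("|", ExtractNumbers("10 blah 20 30 nonsense 40.5 50")));

        // Prints: 50.6|8
        // ("non.sense" contributes nothing; "50.6.8" is read as "50.6" followed by "8")
        Console.WriteLine(string.Join("|", ExtractNumbers("non.sense 50.6.8")));
    }
}
```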
Jaroslav Jandek
Just to be clear, I tried Regex.Split(input, @"[^\d.]+"); and 40.5 was still split into 3 cells! Correction, I only get the period in a cell.
zithery
Try the sample code I have posted. The Regex string should be @"\d+(?:\.\d+)?" - SO is parsing '\.' as '.'.
Jaroslav Jandek
@Jaroslav: Use backticks (`) in comments to enclose verbatim strings. Or escape the backslash by doubling, but the first way looks better.
Tim Pietzcker
@Tim Pietzcker: thank you. I have just realized that when you use the clickable code button, the code is OK. With <pre><code> it parses generics as HTML tags (filters them out) + other artifacts.
Jaroslav Jandek