I have the following string:
Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
I want to capture:
2121,323.222
2–2.4
0.5
That is, I want those three results from the string.
Can anyone help me with a regex for this?
Okay I didn't notice the C# tag until now. I will leave the answer but I know that's not what you expected, see if you can do something with it. Perhaps the title should have mentioned the programming language?
Sure:
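Fat mass loss was (.*) greater for GPLC \((.*)kg vs\. (.*)kg\)
Find your substrings in \1, \2 and \3. The kg after the range is kept outside the second group so that it captures only 2–2.4, and the dot in "vs." is escaped so it matches a literal period. For Emacs, swap the roles of plain and escaped parentheses: \( \) captures, while ( ) matches literally.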
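Since the question is tagged C#, here is a minimal sketch of how a template pattern like this could be applied there. The class and method names (TemplateCapture, Extract) are made up for illustration, and the pattern assumes the sentence always has exactly this wording, with kg kept outside the second group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemplateCapture
{
    // Literal sentence template; the three (.*) groups capture the variable parts.
    // Assumption: "kg" stays outside group 2 and the dot in "vs." is escaped.
    private const string Pattern =
        @"Fat mass loss was (.*) greater for GPLC \((.*)kg vs\. (.*)kg\)";

    // Returns the three captured substrings, or an empty array if the sentence does not fit.
    public static string[] Extract(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success) return new string[0];
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    public static void Main()
    {
        // \u2013 is the en dash used in the original string.
        string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";
        foreach (string part in Extract(input))
            Console.WriteLine(part);
    }
}
```

This only works when the surrounding words never change; any other sentence yields no match at all.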
How about something like this:
^.*((?:\d+,)*\d+(?:\.\d+)?).*(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*(\d+(?:\.\d+)).*$
A little more general, I think. I'm a little concerned about .* being greedy.
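The concern about greedy .* can be seen on a much smaller pattern. The sketch below is not the regex above; it uses a hypothetical helper (GreedyDemo.FirstNumber) that puts either a greedy or a lazy prefix in front of a single (\d+) group, to show how a greedy prefix leaves only the last digit for the group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Captures a digit run after either a greedy (^.*) or a lazy (^.*?) prefix.
    public static string FirstNumber(string input, bool lazy)
    {
        string prefix = lazy ? "^.*?" : "^.*";
        Match m = Regex.Match(input, prefix + @"(\d+)");
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        string input = "loss was 2121 greater";
        // Greedy: .* swallows everything it can and gives back only one digit.
        Console.WriteLine(FirstNumber(input, false)); // prints "1"
        // Lazy: .*? stops as soon as the group can match, so the whole number is captured.
        Console.WriteLine(FirstNumber(input, true));  // prints "2121"
    }
}
```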
Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
A generalized extractor:
/\D+?([\d\,\.\-]+)/g
Explanation:
/ # start pattern
\D+? # 1 or more non-digits, as few as possible (lazy)
( # capture group 1
[\d,.-]+ # character class, 1 or more of digits, comma, period, hyphen
) # end capture group 1
/g # trailing regex g modifier (make regex continue after last match)
Sorry, I don't know C# well enough for a full write-up, but the pattern should plug right in.
See http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx for some implementation examples.
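As a rough C# sketch of plugging this pattern in: .NET has no /g modifier, so Regex.Matches plays that role, and the number is read from group 1. The names GeneralExtractor and Extract are hypothetical. Traced against the sample string, the pattern appears to split the range at the en dash (which is not in the character class) and to pick up the lone period after "vs", so the results may need filtering.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GeneralExtractor
{
    // Same pattern as above, without the /.../g delimiters.
    private const string Pattern = @"\D+?([\d,.\-]+)";

    // Collects group 1 of every match; Regex.Matches stands in for the g modifier.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            results.Add(m.Groups[1].Value);
        return results;
    }

    public static void Main()
    {
        string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";
        // Prints five lines: 2121,323.222 / 2 / 2.4 / . / 0.5
        foreach (string s in Extract(input))
            Console.WriteLine(s);
    }
}
```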
It looks like you're trying to find all numbers in the string (possibly with commas inside the number), and all ranges of numbers such as "2-2.4". Here is a regex that should work:
\d+(?:[,.-]\d+)*
From C# 3, you can use it like this:
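using System;
using System.Text.RegularExpressions;
// Note: this sample uses an ASCII hyphen; the string in the question contains an en dash (U+2013).
var input = "Fat mass loss was 2121,323.222 greater for GPLC (2-2.4kg vs. 0.5kg)";
var pattern = @"\d+(?:[,.-]\d+)*";
var matches = Regex.Matches(input, pattern);
// Declare the loop variable as Match: on .NET Framework, MatchCollection enumerates as object.
foreach (Match match in matches)
    Console.WriteLine(match.Value);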
I came up with something like this atrocity:
-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?(?:[–-]-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?)?
In it, the sub-pattern -?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))? appears twice, with [–-] in the middle (note that – is a long hyphen, i.e. an en dash).
This should take care of dots and commas outside of numbers, e.g. hello,23,45.2-7world will capture 23,45.2-7.
I noticed that the hyphen in 2–2.4kg is not really a hyphen; it is Unicode U+2013, EN DASH.
So, here is another regex in C#:
@"[0-9]+([,.\u2013-][0-9]+)*"
Test:
MatchCollection matches = Regex.Matches("Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)", @"[0-9]+([,.\u2013-][0-9]+)*");
foreach (Match m in matches) {
    Console.WriteLine(m.Groups[0]);
}
Here are the results. My console cannot print the Unicode character U+2013, so it shows as "?", but it is matched properly.
2121,323.222
2?2.4
0.5
Hmm, this is a tricky question, especially because the input string contains the Unicode character – (EN DASH) instead of - (HYPHEN-MINUS). Therefore the correct regex to match the numbers in the original string would be:
\d+(?:[\u2013,.]\d+)*
A more generic approach would be:
\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*
which matches dash punctuation (Pd), connector punctuation (Pc) and other punctuation (Po) between digit runs. See the Unicode general categories for more information about those.
An implementation in C# would look like this:
string input = "Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)";
try {
    // Same pattern as the generic one above: digit runs joined by Pd, Pc or Po punctuation.
    Regex rx = new Regex(@"\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
    Match match = rx.Match(input);
    while (match.Success) {
        // matched text: match.Value
        // match start: match.Index
        // match length: match.Length
        match = match.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
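To see which category each separator falls into, a small check like the following can help. It is only a sketch: the class and method names (CategoryCheck, CategoryOf, Extract) are made up here, while char.GetUnicodeCategory is the standard .NET call. It also runs the generic pattern on the sample string.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CategoryCheck
{
    // Thin wrapper around the built-in lookup, for readability.
    public static UnicodeCategory CategoryOf(char c)
    {
        return char.GetUnicodeCategory(c);
    }

    // Applies the generic pattern: digit runs joined by a single Pd, Pc or Po character.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*"))
            results.Add(m.Value);
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(CategoryOf('-'));      // DashPunctuation (Pd)
        Console.WriteLine(CategoryOf('\u2013')); // DashPunctuation (Pd)
        Console.WriteLine(CategoryOf(','));      // OtherPunctuation (Po)
        Console.WriteLine(CategoryOf('.'));      // OtherPunctuation (Po)
        Console.WriteLine(CategoryOf('_'));      // ConnectorPunctuation (Pc)

        string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";
        foreach (string s in Extract(input))
            Console.WriteLine(s);
    }
}
```

One thing to keep in mind: Po is a broad category that also covers characters such as ':' and '/', so with this pattern something like 10:30 would come back as a single token.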
Thanks, everybody, for your co-operation. I tried a bit more after having a cup of tea and finally got the solution to my problem :)
The following regex gave me the desired result:
(([0-9]+)([–.,-]*))+
Thanks a lot to everyone who helped me solve my problem.
Regards, Muhammad Waqas
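For anyone reusing this pattern, here is a small C# sketch of how it behaves. The names SeparatorRunDemo and Extract are hypothetical. On the sample string it returns the three expected values, but because the separator class is allowed after the last digit run as well, punctuation that directly follows a number (for example a sentence-ending period) ends up inside the match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SeparatorRunDemo
{
    // The pattern from the post above; \u2013 is the en dash.
    private const string Pattern = @"(([0-9]+)([\u2013.,-]*))+";

    // Returns the full text of every match.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            results.Add(m.Value);
        return results;
    }

    public static void Main()
    {
        string sample = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";
        foreach (string s in Extract(sample))
            Console.WriteLine(s); // 2121,323.222 then 2–2.4 then 0.5

        // A number followed by sentence punctuation keeps the trailing period.
        foreach (string s in Extract("The loss was 0.5."))
            Console.WriteLine(s); // prints "0.5."
    }
}
```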
Let's try this one:
(?=\d)([0-9,.-]+)(?<=\d)
It captures all expressions containing only digits, commas, periods and hyphens, and that begin and end with a digit.
It works with a single-digit expression and does not include leading or trailing [.,-].
Hope this helps.
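A short C# sketch of the lookaround pattern above, with hypothetical names (LookaroundDemo, Extract). The lookahead forces the match to start on a digit, and the lookbehind makes the engine backtrack off any trailing separator. The character class here contains only the ASCII hyphen, so an en dash would have to be added to it (for example as \u2013) for ranges like the one in the question to stay in one piece.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Must start at a digit (lookahead) and end right after a digit (lookbehind).
    private const string Pattern = @"(?=\d)([0-9,.-]+)(?<=\d)";

    // Returns group 1 of every match.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            results.Add(m.Groups[1].Value);
        return results;
    }

    public static void Main()
    {
        // Leading comma and trailing period are left out of the match.
        foreach (string s in Extract("hello,23,45.2-7world."))
            Console.WriteLine(s); // prints "23,45.2-7"

        foreach (string s in Extract("costs 0.5."))
            Console.WriteLine(s); // prints "0.5"
    }
}
```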