ansaurus

Question

Regex that matches anything before a certain character?

Answer 1

A:

/(\d+)\.\d/g

This will match any number that has a decimal part following it (which I think is what you want), but will capture only the digits before the decimal point. \d matches only digits (same as [0-9]), so this is pretty simple.

Edit: If you want the three and the eight as well, you don't even need to check for the decimal.

Edit2: Sorry, fixed it so it will ignore all the decimal places.

/(\d+)(?:\.\d+)?/g
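As a rough C# sketch of how the capture group in that last pattern could be read back (the class and method names here are made up for illustration and are not from the original answer):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureGroupSketch
{
    // Hypothetical helper: returns the integer part of every number in the input.
    public static List<string> IntegerParts(string input)
    {
        var results = new List<string>();
        // Group 1 captures the digits before the decimal point. The optional
        // non-capturing group consumes the fraction, so the fractional digits
        // are not picked up as a separate match afterwards.
        foreach (Match m in Regex.Matches(input, @"(\d+)(?:\.\d+)?"))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Calling IntegerParts("81.8 percent in grades 3 to 8, compared to 88.9 percent") should give 81, 3, 8 and 88, since each fraction is swallowed by the optional group instead of being matched on its own.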

tj111 2009-06-03 16:44:03

Please see my edit - I need to get all numbers, but strip out the numbers after the decimal point (my actual data has crazy precision)

Jeff Meatball Yang 2009-06-03 16:46:00

If I use your second one, I get the 9 and the 1, which I don't want.

Jeff Meatball Yang 2009-06-03 16:47:37

Answer 2

+3 A:

/[^.](\d+)[^.]/

As stated below just use MatchObj.Groups(1) to get the digit.

2009-06-03 16:49:17

Won't that also grab the digits following the decimal point? Might want to put a [^.] at the front of that.

Michael Myers 2009-06-03 16:58:59

Answer 3

+1 A:

Try:

[0-9]+(?=[.])

It uses a lookahead to match only numbers followed by a decimal point.

C# Code:

Regex regex = new Regex("[0-9]+(?=[.])");
MatchCollection matches = regex.Matches(input);
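The quantifier matters with this lookahead: with * instead of +, the pattern can also succeed on zero digits directly in front of a period. A small sketch to check that, assuming a hypothetical helper name:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadSketch
{
    // Hypothetical helper: counts how many matches of the pattern are empty strings.
    public static int CountEmptyMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Count(m => m.Length == 0);
    }
}
```

For the input "81.8", CountEmptyMatches with "[0-9]*(?=[.])" should report at least one empty match (at the period itself), while "[0-9]+(?=[.])" should report none.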

Stephan 2009-06-03 16:50:52

You will get a blank entry at every period, because you match 0 or more digits instead of 1 or more.

Michael Myers 2009-06-03 17:01:40

Thanks, was in a rush earlier and wasn't really paying attention

Stephan 2009-06-03 18:04:52

Answer 4

+2 A:

If you don't want to deal with groups, you can use a lookahead like you say; this pattern finds the integer part of all decimal numbers in the string:

Regex integers = new Regex(@"\d+(?=\.\d)");
MatchCollection matches = integers.Matches(str);

matches will contain 81 and 88. If you'd like to match the integer part of ANY numbers (decimal or not), you can instead search for integers that don't start with a .:

Regex integers = new Regex(@"(?<!\.)\d+");

This time, matches would contain 81, 3, 8 and 88.
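One caveat worth checking if the data has more than one digit after the decimal point: the lookbehind only rejects a match that starts right after the period, so the regex engine can still start a match at the second fractional digit. The sketch below is an illustration, not part of the original answer; the helper name and the stricter pattern (?<![.\d])\d+ are assumptions added here.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindSketch
{
    // Hypothetical helper: returns every match of the pattern as a string.
    public static List<string> Find(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

With the input "81.8234 and 3", Find(input, @"(?<!\.)\d+") gives 81, 234 and 3, because 234 is preceded by the digit 8 and not by a period. Find(input, @"(?<![.\d])\d+") gives 81 and 3, since a match may no longer start right after a period or in the middle of a run of digits.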

ojrac 2009-06-03 16:54:57

In your first regex, you ought to put `\d+` before the final closing paren so that you don't get false positives at the ends of sentences.

Ben Blank 2009-06-03 17:18:15

Excellent point. I went with `\d` since I don't care how many there are. Thanks for the correction.

ojrac 2009-06-03 18:04:59

In your second code block, what kind of syntax is that? I don't know what ?<! means. Thanks.

Jeff Meatball Yang 2009-06-04 06:06:52

(?<!pattern) is a negative lookbehind -- so, it prevents any matches that follow the pattern `\.`

ojrac 2009-06-04 21:32:23

Link for more in-depth info: http://www.regular-expressions.info/lookaround.html#lookbehind

ojrac 2009-06-04 21:35:01

I was able to use these code snips as starting points - thanks. It turns out that a lookahead for OR'ed patterns is what I was looking for.

Jeff Meatball Yang 2009-06-05 06:18:04

Answer 5

A:

Try using /(\d+)((\.\d+)?)/

This basically means match a sequence of digits and an optional decimal point with another sequence of digits. Then, use MatchObj.Groups(1) for the first match value, ignoring the second one.

Yuval F 2009-06-03 16:55:35

Answer 6

+1 A:

[^.](\d+)

From your example, this will match " 81", " 3", " 8", " 88"

You'll get an extra character before you get your number, but you can just trim that out in your code.

jimyi 2009-06-03 16:58:49

Answer 7

A:

This is not in the language you asked about, but it may help you think about the problem.

$ echo "A total of 81.8 percent of New York City students in grades 3 to 8 are meeting or exceeding grade-level math standards, compared to 88.9 percent of students in the rest of the State." \
| fmt -w 1 | sed -n -e '/^[0-9]/p' | sed -e 's,[^0-9].*,,' | fmt -w 72
81 3 8 88

The first fmt command puts each word on its own line so that the following commands consider each word separately. The "sed -n" command outputs only those words which start with a digit. The second sed command removes the first non-digit character in the word, and everything after it. The second fmt command combines everything back into one line.

$ echo "This tests notation like 6.022e+23 and 10e100 and 1e+100." \
| fmt -w 1 | sed -n -e '/^[0-9]/p' | sed -e 's,[^0-9].*,,' | fmt -w 72
6 10 1
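The same word-by-word idea can be sketched in C# without a regex. This is only an approximation of the pipeline above, and the class and method names are invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSplitSketch
{
    // Hypothetical helper mirroring the shell pipeline: split into words,
    // keep the words that start with an ASCII digit, then cut each word
    // at its first non-digit character.
    public static List<string> LeadingIntegers(string text)
    {
        return text
            // A null separator array makes Split break on any whitespace.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => w[0] >= '0' && w[0] <= '9')
            .Select(w => new string(w.TakeWhile(c => c >= '0' && c <= '9').ToArray()))
            .ToList();
    }
}
```

For "This tests notation like 6.022e+23 and 10e100 and 1e+100." this should return 6, 10 and 1, matching the shell output.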

Jason Catena 2009-06-03 20:03:58

Answer 8

+2 A:

Complete C# solution:

/// <summary>
/// Use of named group 'roundedDigit' and word boundary '\b' for ease of
/// understanding
/// Adds the rounded percents to the roundedPercents list
/// Will work for any percent value
/// Will work for any number of percent values in the string
/// Will also give those numbers that are not in percentage (decimal) format
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetRoundedPercents(string digitSequence, out List<string> roundedPercents)
{
    roundedPercents = null;
    string pattern = @"(?<roundedDigit>\b\d{1,3})(\.\d{1,2}){0,1}\b";

    if (Regex.IsMatch(digitSequence, pattern))
    {
        roundedPercents = new List<string>();
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.ExplicitCapture);

        for (Match m = r.Match(digitSequence); m.Success; m = m.NextMatch())
            roundedPercents.Add(m.Groups["roundedDigit"].Value);

        return true;
    }
    else
        return false;
}

For your example, this returns 81, 3, 8 and 88.

Rashmi Pandit 2009-06-04 08:08:46
