What is the best way to use a JavaScript regular expression to get a number out of text? For example, I have "$4,320 text/followme" and I want to get 4320 out of it. However, I want to ignore any digits that come after the first occurrence of a letter, or of any non-letter character other than a comma ','.

So if I have "$4,320 t234ext/followme" it should still return 4320. The input will always have a $ sign at the beginning.

So the regular expression should return:

 $4,320 text/followme          returns  4320
 $4,320 t3444ext/followme      returns  4320
 $4,320 /followme              returns  4320
 $4320 text/followme           returns  4320
 $4320 t3444ext/followme       returns  4320
 $4320 /follow4me              returns  4320
+5  A: 
string.split(/ /)[0].replace(/[^\d]/g, '')
SilentGhost
+1 nice and short
jigfox
@SilentGhost - Can you give me a bit more info on the above regular expression? I cannot explain how easily I get confused by regular expressions...
zoom_pat277
I would recommend beginning with `(string || "0 0")`.
ChaosPandion
@zoom: it splits on a space, takes the first element of the resulting array and removes all non-digits from it. `string` is your subject string.
SilentGhost
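To make the explanation in the comment above concrete, here is a minimal sketch that breaks the one-liner into its two steps and runs it over the inputs from the question. The function name `firstNumber` is made up for illustration, and the sketch assumes, as the question's examples do, that a single space follows the amount.

```javascript
// Hypothetical helper: the one-liner from the answer, split into named steps.
function firstNumber(text) {
  // Step 1: split on a space and keep the first piece, e.g. "$4,320".
  const firstToken = text.split(/ /)[0];
  // Step 2: remove every character that is not a digit, e.g. "4320".
  return firstToken.replace(/[^\d]/g, "");
}

// The sample inputs from the question.
const samples = [
  "$4,320 text/followme",
  "$4,320 t3444ext/followme",
  "$4,320 /followme",
  "$4320 text/followme",
  "$4320 t3444ext/followme",
  "$4320 /follow4me",
];

for (const sample of samples) {
  console.log(sample, "->", firstNumber(sample));
}
```

Note that the result is a string of digits, not a number; wrap it in `parseInt(..., 10)` if a numeric value is needed.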
A: 
function parseNumber(input) {
    var r = "", i = 0, c = "", s = input + " ";
    if (s.charAt(0) === "$") {
        i++;
    } 
    while (i < s.length) {        
        c = s.charAt(i++);
        if (c < "0" || c > "9") {
            if (c === ",") {
                continue;
            }
            break;
        }
        r += c;
    }
    return r;
}
ChaosPandion
A: 

The simplest regular expression you're possibly looking for is \D (any character that's not a numeral). There are a few of these "negated" expressions: \d matches a numeral, \D matches non-numerals; \w matches "word" characters (alphanumeric plus the underscore), \W matches non-word characters; \s matches whitespace, \S matches non-whitespace characters.

So:

var str = '$4,320 text/followme';
var number = str.replace(/\D/g,'');

should yield '4320' inside of number. The 'g' is important. It says do a global search/replace for all instances of that regex. Without it, you'll just lose the dollar sign. :)
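As a quick illustration of why the 'g' flag matters, the sketch below runs the same replacement with and without it. The variable names are only for illustration.

```javascript
const input = "$4,320 text/followme";

// With the global flag, every non-digit character is removed.
const withGlobal = input.replace(/\D/g, "");

// Without it, only the first non-digit (the "$") is removed.
const withoutGlobal = input.replace(/\D/, "");

console.log(withGlobal);    // "4320"
console.log(withoutGlobal); // "4,320 text/followme"
```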

Note that if you've got negative numbers or decimals (which can have two non-numeric characters in their representation, '-' and '.'), your problem gets a little bit harder. You could do something like:

number = str.replace(/[^-.0-9]/g,'');

This will work as long as your numbers are well formed and nobody does anything crazy like '4-5.0-9aaaa4z.2'.

To be safe, you could run that last bit through parseInt or parseFloat:

number = parseFloat(str.replace(/[^-.0-9]/g,''));

UPDATE

I overlooked the requirement to avoid including subsequent numbers. If whitespace reliably delimits the end of the number you want, as it does in the examples, you could add a space or \s to the negated character class in that last example I gave, so it'd be something like this:

number = parseFloat(str.replace(/[^-.0-9\s]/g,''));

Because the whitespace survives the replacement, parseFloat stops reading when it reaches it, and the later digits are ignored.
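Here is a small sketch of the whitespace-preserving approach, including the case where it breaks down. The function name `leadingAmount` is hypothetical; the sketch assumes whitespace always follows the amount.

```javascript
// Hypothetical helper: keep digits, '-', '.' and whitespace, then let
// parseFloat stop at the first whitespace it meets.
function leadingAmount(text) {
  const kept = text.replace(/[^-.0-9\s]/g, ""); // e.g. "4320 234"
  return parseFloat(kept);
}

console.log(leadingAmount("$4,320 text/followme"));    // 4320
console.log(leadingAmount("$4,320 t234ext/followme")); // 4320

// Limitation: with no whitespace after the amount, the later digits are
// glued onto the number.
console.log(leadingAmount("$4,320t234ext"));           // 4320234
```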

UPDATE 2

After thinking about this for a bit, using parseFloat means that you don't have to strip out everything -- just all the non-numeric characters before the number you want, and commas. So we can break this into two simpler regexes (and probably faster, especially since one of them is non-global). And then parseFloat will discard trailing non-numeric input for you.

number = parseFloat(str.replace(/,/g,'').replace(/^[^-0-9]*/,''));
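The two-step version can be sketched as a small function to show what each replacement contributes. The name `parseAmount` is made up for illustration, and the sketch assumes the amount is the first numeric run in the string.

```javascript
// Hypothetical helper wrapping the two replacements and parseFloat.
function parseAmount(text) {
  // Step 1: drop all commas, so "4,320" becomes "4320".
  const noCommas = text.replace(/,/g, "");
  // Step 2: drop everything before the first digit or minus sign.
  const fromFirstDigit = noCommas.replace(/^[^-0-9]*/, "");
  // parseFloat reads the leading number and ignores whatever follows it.
  return parseFloat(fromFirstDigit);
}

console.log(parseAmount("$4,320 t3444ext/followme")); // 4320
console.log(parseAmount("$4320 /follow4me"));         // 4320
console.log(parseAmount("$4,320t234ext"));            // 4320
console.log(parseAmount("no digits here"));           // NaN
```

Unlike the whitespace-based variant, this does not rely on a space after the amount, because parseFloat stops at the first character that cannot be part of a number.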
Weston C
@Weston C - I will test this in a moment, but I think this would also give me digits that occur after the first letter... so $4,320 t778ext/folowme would give me 4320778?
zoom_pat277
That's true. If whitespace marks the end of the number, then you could change that last regular expression to [^-.0-9 ] or even [^-.0-9\s] if you want to be more thorough. Run that through parseFloat or parseInt and it'll stop at the first space and you won't get the subsequent numbers.
Weston C
A: 

Here's a slightly more complicated regex.

2nd line: It checks for the initial '$', and allows any combination of digits (0-9) and commas thereafter.

3rd line: Removes the leading $ and any commas in the numeric value.

JavaScript regular expressions support capture groups, so it should be possible to capture just the numeric value (with its commas) in the match, which would simplify the replace step to removing only the commas.

var str="$4,320 t3444ext/followme";
var regex = /^\$([0-9,])*/g;
var matchedNum = str.match(regex)[0].replace(/[\$,]/g, '');
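The capture-group idea mentioned in this answer could look roughly like the sketch below. The function name `extractDollarAmount` is hypothetical. It uses `RegExp.prototype.exec` without the 'g' flag, because `String.prototype.match` with a global regex returns only the full matches and not the captured groups.

```javascript
// Hypothetical helper: capture the digits and commas right after the "$".
function extractDollarAmount(text) {
  const match = /^\$([0-9,]+)/.exec(text);
  if (match === null) {
    return null; // input does not start with "$" followed by a digit or comma
  }
  // match[1] is the captured group, e.g. "4,320"; only commas are left to strip.
  return match[1].replace(/,/g, "");
}

console.log(extractDollarAmount("$4,320 t3444ext/followme")); // "4320"
console.log(extractDollarAmount("$4320 /follow4me"));         // "4320"
console.log(extractDollarAmount("4,320 no dollar sign"));     // null
```

Returning `null` for non-matching input is a design choice made for this sketch; it avoids the TypeError that indexing into a failed match would throw.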
Robert Hui