Hey there, I'm trying to perform a backwards regular expression search on a string to divide it into groups of 3 digits. As far as I can see from the AS3 documentation, searching backwards is not possible in the regex engine.

The point of this exercise is to insert triplet commas into a number like so:

10000000 => 10,000,000

I'm thinking of doing it like so:

string.replace(/(\d{3})/g, ",$1")

But this is not correct, because the search does not happen from the back and the $1 replacement will only work for the first match.
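
For what it's worth, here is a quick sketch in Java (chosen only because its regex syntax is close to AS3's; the class and method names are invented for illustration) of what the left-to-right version produces. In Java, at least, $1 is filled in for every match, so the visible problem is the direction in which the groups are counted:

```java
final class NaiveGrouping {
    // Mirrors string.replace(/(\d{3})/g, ",$1") from the question.
    static String naive(String digits) {
        return digits.replaceAll("(\\d{3})", ",$1");
    }

    public static void main(String[] args) {
        // Groups are taken from the left, so the leftover digits pile up on the right
        // and a comma lands in front of the first group.
        System.out.println(naive("10000000")); // prints ,100,00000
    }
}
```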

I'm getting the feeling I would be better off performing this task using a loop.

UPDATE:

Due to AS3 not supporting lookahead this is how I have solved it.

public static function formatNumber(number:Number):String
{
    var numString:String = number.toString()
    var result:String = ''

    // Peel groups of three digits off the end of the string
    while (numString.length > 3)
    {
        var chunk:String = numString.substr(-3)
        numString = numString.substr(0, numString.length - 3)
        result = ',' + chunk + result
    }

    // Whatever is left (one to three digits) becomes the leading group
    if (numString.length > 0)
    {
        result = numString + result
    }

    return result
}
+1  A: 

This really isn't the best use of RegEx... I'm not aware of a number formatting function, but this thread seems to provide a solution.

function commaCoder(yourNum:Number):String {
    // Convert once instead of on every loop iteration
    var digits:String = yourNum.toString();
    var numLength:int = digits.length;
    var numtoString:String = "";

    for (var i:int = 0; i < numLength; i++) {
        // Insert a comma whenever the number of remaining digits is a multiple of three
        if ((numLength - i) % 3 == 0 && i != 0) {
            numtoString += ",";
        }
        numtoString += digits.charAt(i);
    }
    return numtoString;
}

If you really are insistent on using RegEx, you could just reverse the string, apply the RegEx replace function, then reverse it back.
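
A rough sketch of that reverse / replace / reverse idea, written in Java rather than AS3 (the class and method names are hypothetical, and the input is assumed to be a plain string of digits). The `(?=\d)` lookahead is an addition here so that no comma is emitted once the digits run out:

```java
final class ReverseCommify {
    static String commify(String digits) {
        // Reverse so that grouping from the left is really grouping from the right.
        String reversed = new StringBuilder(digits).reverse().toString();
        // Put a comma after each block of three digits that is followed by another digit.
        String grouped = reversed.replaceAll("(\\d{3})(?=\\d)", "$1,");
        // Reverse again to restore the original digit order.
        return new StringBuilder(grouped).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(commify("10000000"));  // prints 10,000,000
        System.out.println(commify("100000000")); // prints 100,000,000
    }
}
```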

Noldorin
I've no special need for a RegEx solution, I was more wondering how it could be approached using regex. But it seems that it is not the sort of problem regex lends itself to, especially with the case of: 100000000 => ,100,000,000. I wouldn't know where to start with a regex that takes that into account
Brian Heylin
But this particular problem *can* be solved with a regex, and without reversing the string first. Niki and toolkit show how it's done.
Alan Moore
@Alan: Indeed it can be done... though please don't advocate it! Saying that, I think the OP understands that it's not a very appropriate use of RegEx.
Noldorin
But how is anyone supposed to learn regexes if not by practicing on small, self-contained problems like this one? It makes for a nice little exercise.
Alan Moore
I suppose it does, as long as one is cautious about their utility. No objection there, really. Nonetheless, there are plenty of very practical regexes which one could practice writing.
Noldorin
Why not use a regex? It's extremely easy to test, IMO it's easier to understand than pure Java code (because it's declarative instead of imperative), and the chance of nasty errors in case of malformed data is lower.
Niki
@Niki: It's unnecessarily obfuscated for a start? And also, I would bet on it being *hugely* slower.
Noldorin
It's true that a regex-based solution can never be quite as fast as a hand-coded solution can be (in Java, anyway), but that's no reason to reject regexes without even trying them. In this case, you would have to be processing millions of strings in a tight loop to even notice the difference.
Alan Moore
@Noldorin: 1. Why do you think it's _hugely_ slower? Regexes can be compiled, and I'm sure they avoid typical performance pitfalls like concatenating strings in a loop. 2. It says "look for a digit followed by any number of 3 digit blocks" - I'd say that's a lot clearer than equivalent Java/C# code
Niki
@Niki: Guess it's a matter of personal preference. I just see that the average coder has a tendency to use regexes for almost anything they possibly can, and I've become rather skeptical.
Noldorin
(contd.) At least for me, regexes in most cases take a lot longer to interpret and modify, and I'm fairly experienced with them! I was also aware of lookahead assertions, but thought I'd stick to the simple solution. And nonetheless, I would bet regexes are still slower, even when compiled.
Noldorin
+1  A: 

A sexeger is good for this. In brief, a sexeger is a reversed regex run against a reversed string that you reverse the output of. It is generally more efficient than the alternative. Here is some pseudocode for what you want to do:

string = reverse string
string = string.replace(/(\d{3})(?!$)/g, "$1,")
string = reverse string

Here is a Perl implementation:

#!/usr/bin/perl

use strict;
use warnings;

for my $n (1, 12, 123, 1234, 12345, 123456, 1234567) {
    my $s = reverse $n;
    $s =~ s/([0-9]{3})(?!$)/$1,/g;
    $s = reverse $s;
    print "$s\n";
}
Chas. Owens
Thanks Chas, just as a POI, how would I take this situation into account: 100000000 => ,100,000,000. Or is this even possible with regex?
Brian Heylin
Hmm, a zero-width negative look-behind just shifts the position of the comma, and trying to do a normal regex with a zero-width negative look-ahead only works for groups that are multiples of three.
Chas. Owens
I think toolkit has it with a zero-width positive look-ahead
Chas. Owens
As Brian pointed out, your technique puts a comma at the beginning of the string if the first group consists of three digits. I would add a positive lookahead for a digit to make sure I was still inside the number: /(\d{3})(?=\d)/g
Alan Moore
Thanks guys, so in general it seems that a regex solution is going down an overly complex road :D
Brian Heylin
+5  A: 

If your regex engine has positive lookaheads, you could do something like this:

string.replace(/(\d)(?=(\d\d\d)+$)/, "$1,")

Where the positive lookahead (?=...) means that the regex only matches when the lookahead expression ... matches.

(Note that lookaround-expressions are not always very efficient.)
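
To make the lookahead a bit more concrete, here is a small sketch in Java (the class and method names are made up for illustration, and the input is assumed to contain only digits). It lists the positions where the pattern matches, i.e. the digits that are followed by a whole number of three-digit blocks up to the end of the string, and then does the replacement:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadDemo {
    // A digit followed by one or more complete blocks of three digits, then end of input.
    private static final Pattern GROUP_START = Pattern.compile("(\\d)(?=(\\d\\d\\d)+$)");

    // Indices of the digits that will get a comma placed after them.
    static List<Integer> commaPositions(String digits) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = GROUP_START.matcher(digits);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    // The lookahead consumes nothing, so only the single digit in group 1 is replaced.
    static String commify(String digits) {
        return GROUP_START.matcher(digits).replaceAll("$1,");
    }

    public static void main(String[] args) {
        System.out.println(commaPositions("1234567")); // prints [0, 3]
        System.out.println(commify("1234567"));        // prints 1,234,567
    }
}
```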

Niki
Great minds think alike ... +1 :-)
toolkit
For ActionScript, you need to add the "g" / global flag: trace("1234567".replace(/(\d)(?=(\d\d\d)+$)/g, "$1,"));
mikechambers
+9  A: 

If your language supports positive lookahead assertions, then I think the following regex will work:

(\d)(?=(\d{3})+$)

Demonstrated in Java:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class CommifyTest {

    @Test
    public void testCommify() {
        String num0 = "1";
        String num1 = "123456";
        String num2 = "1234567";
        String num3 = "12345678";
        String num4 = "123456789";

        String regex = "(\\d)(?=(\\d{3})+$)";

        assertEquals("1", num0.replaceAll(regex, "$1,"));
        assertEquals("123,456", num1.replaceAll(regex, "$1,"));
        assertEquals("1,234,567", num2.replaceAll(regex, "$1,"));
        assertEquals("12,345,678", num3.replaceAll(regex, "$1,"));
        assertEquals("123,456,789", num4.replaceAll(regex, "$1,"));    
    }    
}

The following link suggests AS3 does?

toolkit
I prefer this, assuming that you can use lookbehinds: (?<=\d)(?=(\d{3})+$) That way, you can simply replace with "," instead of replacing with "\1,".
Bravery Onions
+1  A: 

This is how I would do it:

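public static function formatNumber(number:Number):String
{
    var firstPart:Number = Math.floor(number/1000)
    var lastPart:Number = number%1000
    if (firstPart > 0) {
        // Zero-pad the trailing group so that 1000000 gives "1,000,000" rather than "1,0,0"
        var lastString:String = lastPart.toString()
        if (lastPart < 100) lastString = '0' + lastString
        if (lastPart < 10) lastString = '0' + lastString
        return formatNumber(firstPart) + ',' + lastString
    } else {
        return lastPart.toString()
    }
}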

I think that it should even work for non-integers, e.g. 1234567.34, although floating-point error in the % operation may leave stray digits in the fractional part.
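
One thing to watch with the divide-by-1000 approach: every group after the first has to be zero-padded to three digits, otherwise 1000000 comes out as 1,0,0. A minimal sketch of the same recursion in Java, restricted to non-negative whole numbers (the class name is hypothetical, and the use of long is an assumption made to sidestep floating-point issues):

```java
import java.util.Locale;

final class ArithmeticCommify {
    // Assumes number >= 0; negative values are not handled in this sketch.
    static String formatNumber(long number) {
        long firstPart = number / 1000;
        long lastPart = number % 1000;
        if (firstPart > 0) {
            // %03d keeps the leading zeros of every group except the first one.
            return formatNumber(firstPart) + "," + String.format(Locale.ROOT, "%03d", lastPart);
        }
        return Long.toString(lastPart);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1000000L)); // prints 1,000,000
        System.out.println(formatNumber(1234567L)); // prints 1,234,567
    }
}
```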

Svante