Given the following text

/feeds/tag/remote-desktop                               1320  17007    22449240
/feeds/tag/terminal-server                              1328  15805    20989040
/foo/23211/test                                         1490  11341    16898090

Let's say we want to convert those numbers to their comma-fied forms, like so

/feeds/tag/remote-desktop                             1,320  17,007  22,449,240
/feeds/tag/terminal-server                            1,328  15,805  20,989,040
/foo/23211/test                                       1,490  11,341  16,898,090

(don't worry about fixing the fixed-width ASCII spacing, that's a problem for another day)

This is the best regex I could come up with; it's based on this JavaScript regex solution from Regex Ninja Steven Levithan:

return Regex.Replace(s, @"\b(?<!\/)\d{4,}\b(?<!\/)", 
    delegate(Match match) {
        string output = "";
        string m = match.Value;
        int len = match.Length;
        for (int i = len - 1; i >= 0 ; i--)
        {                        
            output = m[i] + output;
            if ((len - i) % 3 == 0) output = "," + output;
        }
        if (output.StartsWith(","))
            output = output.Substring(1, output.Length-1);
        return output;
    });

In a related question, there is a very clever number comma insertion regex proposed:

text = Regex.Replace(text, @"(?<=\d)(?=(\d{3})+$)", ",")

However this requires an end anchor $ which, as you can see, I don't have in the above text -- the numbers are "floating" in the rest of the text.

I suspect there is a cleaner way to do this than my solution? After writing this, I just realized I could combine them, and put one Regex inside the other, like so:

return Regex.Replace(s, @"\b(?<!\/)\d{4,}\b(?<!\/)", 
    delegate(Match match) {
        return Regex.Replace(match.Value, @"(?<=\d)(?=(\d{3})+$)", ",");
    });
+3  A: 

Why not break them up into separate, trimmed fields, then treat each numeric field using the "clever" insertion Regex? This may actually help you with the fixed width formatting as well since you can specify a width using string.Format when you reconstruct the line.

Actually, it may even be easier if you split them to convert to numbers and just use the format specifier to add the commas.
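A minimal sketch of that split-and-format idea, assuming each line has exactly one path followed by three whole numbers. The `FieldFormatter` class, the `FormatLine` name, and the column widths (50, 8, 8, 12) are made up for illustration, and en-US is pinned so the separators are always commas:

```csharp
using System;
using System.Globalization;

static class FieldFormatter
{
    // Hypothetical helper: splits one line into a path plus three numeric
    // fields, then rebuilds it with fixed column widths and grouped digits.
    public static string FormatLine(string line)
    {
        // A null separator array splits on any whitespace.
        string[] fields = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        CultureInfo ci = CultureInfo.GetCultureInfo("en-US");

        // {0,-50} left-aligns the path; {n,w:N0} right-aligns a number
        // in w characters with thousands separators and no decimals.
        return string.Format(ci, "{0,-50}{1,8:N0}{2,8:N0}{3,12:N0}",
            fields[0],
            long.Parse(fields[1], ci),
            long.Parse(fields[2], ci),
            long.Parse(fields[3], ci));
    }
}
```

Calling `FieldFormatter.FormatLine` on each input line takes care of both the commas and the fixed-width alignment, since the path is never treated as a number.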

tvanfosson
+14  A: 

Why can't you parse them into long and then use formatted ToString?

CultureInfo ci = new CultureInfo("en-US");
long number = 1234;
Console.WriteLine(number.ToString("N0", ci));
maciejkow
+5  A: 

Why not (inside your delegate):

CultureInfo ci = new CultureInfo("en-US");
string output = int.Parse(match.Value).ToString("N0",ci);

Translation:

  1. Convert to int (or long if need be)
  2. Use the .NET "N0" numeric format string to insert the commas
Keltex
You should always pass a CultureInfo to string formatting functions; otherwise the result varies with the user's default locale. On my machine, for example, your code would output "1 234" for the number 1234. If following the user's locale is the behavior you want, state it explicitly by passing CultureInfo.CurrentCulture.
maciejkow
@maciejkow I incorporated your suggestion.
Keltex
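To make the locale point concrete, here is a small sketch that formats the same value under a named culture. The `CultureDemo` class and `Format` method are illustrative names, and it assumes the runtime has culture data available (that is, it is not running in globalization-invariant mode):

```csharp
using System;
using System.Globalization;

static class CultureDemo
{
    // Formats a value with the "N0" specifier under an explicitly chosen
    // culture, so the result does not depend on the machine's settings.
    public static string Format(long value, string cultureName)
    {
        CultureInfo ci = CultureInfo.GetCultureInfo(cultureName);
        return value.ToString("N0", ci);
    }
}
```

With this, `CultureDemo.Format(1234, "en-US")` should give "1,234" while `CultureDemo.Format(1234, "de-DE")` should give "1.234", which is why the culture is worth spelling out.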
+2  A: 

In his book "Mastering Regular Expressions", Jeffrey E.F. Friedl uses this 'classic' commafication problem to explain the lookaround concept. On page 65 he gives the following Perl snippet, which might be helpful to you:

$string =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
thijs
Whoops, I think I read the problem summary too fast to see that the same regex was already mentioned. Anyway, good to know this problem is discussed in MRE as well.
thijs
+6  A: 

I agree in principle with those suggesting that you use the built-in .NET formatting facilities if possible.

However, if your numbers can be arbitrarily large, something like this should work:

int len = match.Length;
int numCommas = (len-1) / 3;
StringBuilder sb = new StringBuilder(match.Value, len + numCommas);
// Insert from right to left so earlier insert positions stay valid.
for (int i = 1; i <= numCommas; i++) {
    sb.Insert(len - i * 3, ',');
}
return sb.ToString();

Also, if you insist on using Regex.Replace for whatever reason, you can tweak the Regex that you listed in the question to avoid the end anchor problem. For instance, I think that

Regex.Replace(text, @"(?<=\d)(?=(\d{3})+(\s|$))", ",")

would work in your example since the numbers you want to "comma-fy" are all followed by spaces or the end of the line.
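As a quick check of that pattern, here is a sketch wrapping it in a helper; the `FloatingCommas` class and `Commafy` method are names invented for this example. It relies on the assumption stated above: every number to be changed is followed by whitespace or the end of the input.

```csharp
using System.Text.RegularExpressions;

static class FloatingCommas
{
    // Inserts a comma at every position that is preceded by a digit and
    // followed by a multiple of three digits, then whitespace or end of input.
    public static string Commafy(string text)
    {
        return Regex.Replace(text, @"(?<=\d)(?=(\d{3})+(\s|$))", ",");
    }
}
```

Digits inside a path such as /foo/23211/test are left alone because they are followed by a slash rather than whitespace. In multi-line input the newline after the last column counts as whitespace, so RegexOptions.Multiline is not needed.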

kvb
+1  A: 

Why so complicated?

var text =
@"/feeds/tag/remote-desktop             1320  17007    22449240
/feeds/tag/terminal-server            1328  15805    20989040
/foo/23211/test                       1490  11341    16898090";

var regex = new Regex(@"(?<=\s)\d+");

// Regex.Replace rewrites each match in place; calling text.Replace in a loop
// would also change any other occurrence of the same digits in the text.
text = regex.Replace(text, match => long.Parse(match.Value).ToString("n0"));

Console.WriteLine(text);

Which, under a culture with en-US-style grouping, produces:

/feeds/tag/remote-desktop             1,320  17,007    22,449,240
/feeds/tag/terminal-server            1,328  15,805    20,989,040
/foo/23211/test                       1,490  11,341    16,898,090

This has the advantage of using culturally-sensitive formatting for those cultures that use underscores every four characters, rather than commas every three. ;)

If you're worried that long might not be big enough (!) then maybe .NET 4's System.Numerics.BigInteger should do the job.
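For what it's worth, a sketch of the BigInteger variant might look like the following. The `BigCommas` class and `Commafy` method are illustrative names, and unlike the code above it pins en-US rather than using the current culture, purely so the result is predictable:

```csharp
using System.Globalization;
using System.Numerics;
using System.Text.RegularExpressions;

static class BigCommas
{
    // Same idea as above, but parses each match as a BigInteger so the
    // digit count is not limited by the range of long.
    public static string Commafy(string text)
    {
        CultureInfo ci = CultureInfo.GetCultureInfo("en-US");
        return Regex.Replace(text, @"(?<=\s)\d+",
            m => BigInteger.Parse(m.Value, ci).ToString("N0", ci));
    }
}
```

The lookbehind `(?<=\s)` still keeps digits inside the path untouched, because those are preceded by a slash rather than whitespace.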

Damian Powell