I would like to add a certain number of leading zeroes (say up to 3) to all numbers of a string. For example:

Input: /2009/5/song 01 of 12

Output: /2009/0005/song 0001 of 0012

What's the best way to do this with regular expressions?

Edit:

I picked the first correct answer. However, all answers are worth giving a read.

+2  A: 

A sample:

>>> re.sub(r"(?<!\d)0*(\d{1,3})(?!\d)", r"000\1", "/2009/5/song 01 of 3")
'/2009/0005/song 0001 of 0003'

Note:

  • only works for numbers 1 - 9 for now
  • not well tested yet

Edit:

I can't think of a single regex that does this without callbacks for now (there might be a way to do it).

Here are two regexes that handle it:

>>> x = "1/2009/5/song 01 of 3 10 100 010 120 1200 abcd"
>>>
>>> x = re.sub(r"(?<!\d)0*(\d{1,3})(?!\d)", r"000\1", x)
#'0001/2009/0005/song 0001 of 0003 00010 000100 00010 000120 1200 abcd'
>>>
>>> re.sub(r"0+(\d{4})(?!\d)", r"\1", x) #strip extra leading zeroes
'0001/2009/0005/song 0001 of 0003 0010 0100 0010 0120 1200 abcd'
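
For reference, the two passes above can be wrapped in one function. This is only a sketch of the same idea, not a tested general solution; the name pad_two_pass is made up for illustration, and the two patterns are the ones from the session above, written as raw strings.

```python
import re

def pad_two_pass(text):
    """Pad 1-3 digit numbers to four digits using two plain substitutions.

    pad_two_pass is a hypothetical name; both patterns come from the answer.
    """
    # Pass 1: drop existing leading zeros from short numbers and prefix "000".
    padded = re.sub(r"(?<!\d)0*(\d{1,3})(?!\d)", r"000\1", text)
    # Pass 2: cut each zero-led run back to its last four digits.
    return re.sub(r"0+(\d{4})(?!\d)", r"\1", padded)

print(pad_two_pass("1/2009/5/song 01 of 3 10 100 010 120 1200 abcd"))
# 0001/2009/0005/song 0001 of 0003 0010 0100 0010 0120 1200 abcd
```

One thing to watch: the second pattern has no lookbehind, so a longer number such as 100005 is shortened to 10005. Prefixing that pattern with (?<!\d) appears to avoid this, since the padded numbers always start right after a non-digit.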
S.Mark
Not working for 011, for example.
kovpas
@kovpas, yeah, still thinking about it. That's a bit difficult without using callbacks, because the number of zeros to add varies.
S.Mark
Thanks for the swift reply, S.Mark. Changed the example to challenge your solution.
hgpc
I added 2 regexes, but I will try to get it down to a single regex; there should be a way to do it.
S.Mark
Looks like my regex skills weren't enough for that. Couldn't figure it out :(
S.Mark
+3  A: 

Use something that supports a callback so you can process the match:

>>> import re
>>> r=re.compile(r'(?:^|(?<=[^0-9]))([0-9]{1,3})(?=$|[^0-9])')
>>> # no count argument needed: sub() replaces every match by default
>>> r.sub(lambda x: '%04d' % (int(x.group(1)),), 'dfbg345gf345')
'dfbg0345gf0345'
>>> r.sub(lambda x: '%04d' % (int(x.group(1)),), '1x11x111x')
'0001x0011x0111x'
>>> r.sub(lambda x: '%04d' % (int(x.group(1)),), 'x1x11x111x')
'x0001x0011x0111x'
Ignacio Vazquez-Abrams
Thanks Ignacio. Not possible without using callbacks?
hgpc
It may be possible with a sufficiently-complex regular expression or with multiple regular expressions, but unless function calls are ludicrously expensive this method will be faster.
Ignacio Vazquez-Abrams
+1  A: 

Another approach:

>>> x
'/2009/5/song 01 of 12'
>>> ''.join([i.zfill(4) if i.isdigit() else i for i in re.split(r"(?<!\d)(\d+)(?!\d)", x)])
'/2009/0005/song 0001 of 0012'
>>>

or

>>> x
'/2009/5/song 01 of 12'
>>> r=re.split(r"(?<!\d)(\d+)(?!\d)", x)
>>> # r[-1] restores the trailing text that zip() would otherwise drop
>>> ''.join(a+b.zfill(4) for a,b in zip(r[::2],r[1::2])) + r[-1]
'/2009/0005/song 0001 of 0012'
S.Mark
+6  A: 

In Perl:

s/([0-9]+)/sprintf('%04d',$1)/ge;
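
For readers without Perl, the same idea can be sketched in Python, where the replacement argument of re.sub may be a function. This is an illustrative equivalent rather than part of the answer; the name pad_all_numbers and the width parameter are assumptions made for the example.

```python
import re

def pad_all_numbers(text, width=4):
    """Zero-pad every run of ASCII digits to at least `width` characters.

    pad_all_numbers and width are hypothetical names for this sketch.
    """
    # int() drops any existing leading zeros, zfill() adds them back up to
    # width, which is roughly what sprintf('%04d', $1) does in the Perl line.
    return re.sub(r"[0-9]+", lambda m: str(int(m.group())).zfill(width), text)

print(pad_all_numbers("/2009/5/song 01 of 12"))
# /2009/0005/song 0001 of 0012
```

Like the Perl one-liner, this touches every number, so values that already have four or more digits pass through unchanged apart from losing surplus leading zeros.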

Benjamin Franz
+1 That is a gorgeously simple regex and does fit the bill.
drewk
Makes me want to learn Perl after comparing it to my Java implementation.
hgpc
A: 

If your regular expression implementation does not support look-behind and/or look-ahead assertions, you can also use this regular expression:

(^|\D)(\d{1,3})(\D|$)

And replace the match with $1 + padLeft($2, 4, "0") + $3 where $1 is the match of the first group and padLeft(str, length, padding) is a function that prefixes str with padding until the length length is reached.
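
A rough Python sketch of that replacement, assuming the digits are captured as the second of three groups. The names pad_left and pad_numbers are made up here; pad_left simply mirrors the padLeft(str, length, padding) helper described above.

```python
import re

# Three groups: left boundary, the 1-3 digit number, right boundary.
PATTERN = re.compile(r"(^|\D)(\d{1,3})(\D|$)")

def pad_left(s, length, padding):
    """Prefix s with padding until it is at least `length` characters long."""
    while len(s) < length:
        s = padding + s
    return s

def pad_numbers(text):
    """Rebuild each match as $1 + padLeft($2, 4, "0") + $3."""
    return PATTERN.sub(
        lambda m: m.group(1) + pad_left(m.group(2), 4, "0") + m.group(3),
        text,
    )

print(pad_numbers("/2009/5/song 01 of 12"))
# /2009/0005/song 0001 of 0012
```

Because the pattern consumes the boundary characters instead of merely looking at them, two short numbers separated by a single character cannot both match in one pass: in this sketch "1/2" comes out as "0001/2".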

Gumbo
A: 

Here is a Perl solution without callbacks or recursion. It does rely on Perl's e switch, which evaluates the replacement as code instead of treating it as a plain string, but the approach is easily adapted to other languages that lack that construct.

#!/usr/bin/perl

while (<DATA>) {
   chomp;
   print "string:\t\t\t$_\n";
# uncomment if you care about 0000000 case:
#   s/(^|[^\d])0+([\d])/\1\2/g;
#   print "now no leading zeros:\t$_\n";    
   s/(^|[^\d]{1,3})([\d]{1,3})($|[^\d]{1,3})/sprintf "%s%04i%s",$1,$i=$2,$3/ge;
   print "up to 3 leading zeros:\t$_\n";
}
print "\n";

__DATA__
/2009/5/song 01 of 12
/2010/10/song 50 of 99
/99/0/song 1 of 1000
1
01
001
0001
/001/
"02"
0000000000

Output:

string:                /2009/5/song 01 of 12
up to 3 leading zeros:  /2009/0005/song 0001 of 0012
string:                /2010/10/song 50 of 99
up to 3 leading zeros:  /2010/0010/song 0050 of 0099
string:                /99/0/song 1 of 1000
up to 3 leading zeros:  /0099/0/song 0001 of 1000
string:                1
up to 3 leading zeros:  0001
string:                01
up to 3 leading zeros:  0001
string:                001
up to 3 leading zeros:  0001
string:                0001
up to 3 leading zeros:  0001
string:                /001/
up to 3 leading zeros:  /0001/
string:                "02"
up to 3 leading zeros:  "0002"
string:                0000000000
up to 3 leading zeros:  0000000000
drewk
+1  A: 

<warning> This assumes academic interest, of course you should use callbacks to do it clearly and correctly </warning>

I'm able to abuse regular expressions to add 2 leading zeros (.NET flavor):

s = Regex.Replace(s, @".(?=\b\d\b)|(?=\b\d{1,2}\b)", "$&0");

This doesn't work if there's a number at the beginning of the string. It works by matching either the character before a single-digit number or the zero-width position before a one- or two-digit number, and appending a 0 to each match.
I had no luck expanding it to 3 leading zeros, and certainly not more.

Kobi
A: 

Using C#:

string result = Regex.Replace(input, @"\d+", me =>
{
    return int.Parse(me.Value).ToString("0000");
});
Alex