views:

282

answers:

6

Using Python I need to insert a newline character into a string every 64 characters. In Perl it's easy:

s/(.{64})/$1\n/

How could this be done using regular expressions in Python? Is there a more pythonic way to do it?

+5  A: 

Same as in Perl, but with a backslash instead of the dollar for accessing groups:

s = "0123456789"*100 # test string
import re
print re.sub("(.{64})", "\\1\n", s, re.DOTALL)

re.DOTALL is the equivalent to Perl's s/ option.

AndiDog
+7  A: 

without regexp:

def insert_newlines(string, every=64):
    lines = []
    for i in xrange(0, len(string), every):
        lines.append(string[i:i+every])
    return '\n'.join(lines)

shorter but less readable (imo):

def insert_newlines(string, every=64):
    return '\n'.join(string[i:i+every] for i in xrange(0, len(string), every))
gurney alex
This solution is an order of magnitude faster. `timeit.timeit` says 100000 executions of the regexp solution take 27.8 seconds, while 100000 executions of your shorter solution take 4.45 seconds).
badp
I'd also say that the shorter version is the "most pythonic" one: a one-liner, quite elegant, using generators and the `join` method.
Philipp
+4  A: 

I'd go with:

import textwrap
s = "0123456789"*100
print '\n'.join(textwrap.wrap(s, 64))
J.F. Sebastian
`textwrap` is space aware, so this won't handle `"12345 "*100` correctly.
badp
A: 

taking @J.F. Sebastian's solution one step further, and this is nearly criminal :-)

import textwrap
s = "0123456789"*100
print textwrap.fill(s, 64)

look ma... no regexes! because as you know... http://regex.info/blog/2006-09-15/247

thanks for introducing us to textwrap module... although it's been in Python since 2.3, i've never been aware of it until now (yes, i'll admit that publically)!!

wescpy
`textwrap` is space aware, so this won't handle `"12345 "*100` correctly.
badp
transform (safely) and back? thx tho!
wescpy
+1  A: 

Tiny, not nice:

"".join(s[i:i+64] + "\n" for i in xrange(0,len(s),64))
Michael
+1  A: 

I suggest the following method:

"\n".join(re.findall("(?s).{,64}", s))[:-1]

This is, more-or-less, the non-RE method taking advantage of the RE engine for the loop.

On a very slow computer I have as a home server, this gives:

$ python -m timeit -s 's="0123456789"*100; import re' '"\n".join(re.findall("(?s).{,64}", s))[:-1]'
10000 loops, best of 3: 130 usec per loop

AndiDog's method:

$ python -m timeit -s "s='0123456789'*100; import re" 're.sub("(?s)(.{64})", r"\1\n", s)'
1000 loops, best of 3: 800 usec per loop

gurney alex's 2nd/Michael's method:

$ python -m timeit -s "s='0123456789'*100" '"\n".join(s[i:i+64] for i in xrange(0, len(s), 64))'
10000 loops, best of 3: 148 usec per loop

I don't consider the textwrap method to be correct for the specification of the question, so I won't time it.

EDIT

Changed answer because it was incorrect (shame on me!)

EDIT 2

Just for the fun of it, the RE-free method using itertools. It rates third in speed, and it's not Pythonic (too lispy):

"\n".join(
   it.imap(
     s.__getitem__,
     it.imap(
       slice,
       xrange(0, len(s), 64),
       xrange(64, len(s)+1, 64)
     )
   )
 )

$ python -m timeit -s 's="0123456789"*100; import itertools as it' '"\n".join(it.imap(s.__getitem__, it.imap(slice, xrange(0, len(s), 64), xrange(64, len(s)+1, 64))))'
10000 loops, best of 3: 182 usec per loop
ΤΖΩΤΖΙΟΥ
This doesn't seem to work correctly. If you put more than 64 characters in it outputs a newline after every character rather than after every 64 characters.
bignum
Spot on, biffabacon. Fixed it.
ΤΖΩΤΖΙΟΥ
Your answer is very helpful TZΩΤΖΙΟΥ. Many thanks.
bignum