First I checked % against backquoting (repr). % is faster. Then I checked % with a tuple against 'string'.format(). An initial bug made me think format was faster, but no: % is faster.
So, you are already doing your massive pile of float-to-string conversions the fastest way you can do it in Python.
The demo code below is ugly demo code. Please don't lecture me on xrange versus range or other pedantry. KThxBye.
My ad-hoc and highly unscientific testing indicates that a % (1.234,) operation on Python 2.5 on Linux is faster than the same operation on Python 2.6 on Linux, for the test code below, with the proviso that the attempt to use 'string'.format() won't work on Python versions before 2.6. And so on.
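If you want to sanity-check the % versus str.format comparison on a modern (Python 3) interpreter, here is a compact sketch of the same measurement; the exact numbers, and possibly even the winner, may differ from the Python 2.5/2.6 results discussed here:

```python
import timeit

# one representative row of six floats
row = (1.234, 5.678, 9.012, 3.456, 7.890, 2.345)

# time the old-style % formatting with a tuple
t_percent = timeit.timeit(lambda: '%g %g %g %g %g %g\n' % row, number=100000)

# time the equivalent str.format call
t_format = timeit.timeit(
    lambda: '{0} {1} {2} {3} {4} {5}\n'.format(*row), number=100000)

print('percent: %.3fs  format: %.3fs' % (t_percent, t_format))
```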
# this code should never be used in production.
# should work on linux and windows now.
import random
import timeit
import tempfile

amap = []  # list of lists
tmap = []  # list of tuples

def r():
    return random.random() * 500

for i in xrange(10000):
    amap.append([r(), r(), r(), r(), r(), r()])
for i in xrange(10000):
    tmap.append((r(), r(), r(), r(), r(), r()))

def testme_percent():
    log_file = tempfile.TemporaryFile()
    try:
        for qmap in amap:
            s = '%g %g %g %g %g %g \n' % (qmap[0], qmap[1], qmap[2], qmap[3], qmap[4], qmap[5])
            log_file.write(s)
    finally:
        log_file.close()

def testme_tuple_percent():
    log_file = tempfile.TemporaryFile()
    try:
        for qtup in tmap:
            s = '%g %g %g %g %g %g \n' % qtup
            log_file.write(s)
    finally:
        log_file.close()

def testme_backquotes_rule_yeah_baby():
    log_file = tempfile.TemporaryFile()
    try:
        for qmap in amap:
            s = `qmap` + '\n'
            log_file.write(s)
    finally:
        log_file.close()

def testme_the_new_way_to_format():
    log_file = tempfile.TemporaryFile()
    try:
        for qmap in amap:
            s = '{0} {1} {2} {3} {4} {5} \n'.format(qmap[0], qmap[1], qmap[2], qmap[3], qmap[4], qmap[5])
            log_file.write(s)
    finally:
        log_file.close()

# python 2.5 helper: builds a timeit.Timer that imports the named
# function from __main__ and calls it
default_number = 50

def _xtimeit(stmt="pass", timer=timeit.default_timer,
             number=default_number):
    """quick and dirty"""
    if stmt != "pass":
        stmtcall = stmt + "()"
        ssetup = "from __main__ import " + stmt
    else:
        stmtcall = stmt
        ssetup = "pass"
    t = timeit.Timer(stmtcall, setup=ssetup)
    try:
        return t.timeit(number)
    except:
        t.print_exc()

print "now timing variations on a theme"
n0 = _xtimeit("pass", number=50)
print "pass =", n0
n1 = _xtimeit("testme_percent", number=50)
print "old style % formatting =", n1
n2 = _xtimeit("testme_tuple_percent", number=50)
print "old style % formatting with tuples =", n2
n3 = _xtimeit("testme_backquotes_rule_yeah_baby", number=50)
print "backquotes =", n3
n4 = _xtimeit("testme_the_new_way_to_format", number=50)
print "new str.format conversion =", n4
print "done"
I think you could optimize your code by building your TUPLES of floats somewhere else: wherever you built that map in the first place, build a list of tuples instead, and then apply fmt_str % tup like this:
for tup in mytups:
    log_file.write(fmt_str % tup)
I was able to shave the 8.7 seconds down to 8.5 seconds by dropping the making-a-tuple part out of the for loop. Which ain't much. The big boy there is floating point formatting, which I believe is always going to be expensive.
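For completeness, here is a minimal, self-contained sketch of that precompute-the-tuples idea, written for Python 3; io.StringIO stands in for the real log file, and the names mytups and fmt_str are just illustrative:

```python
import io
import random

# Build the list of tuples once, up front, outside the hot loop.
mytups = [tuple(random.random() * 500 for _ in range(6)) for _ in range(1000)]
fmt_str = '%g %g %g %g %g %g\n'

log_file = io.StringIO()  # stand-in for a real file handle
for tup in mytups:
    # The tuple already matches the format string, so there is
    # no per-row tuple packing inside the loop.
    log_file.write(fmt_str % tup)

print(len(log_file.getvalue().splitlines()))  # one line per tuple
```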
Alternative:
Have you considered NOT writing such huge logs as text, and instead saving them with the fastest "persistence" method available, plus a short utility to dump them to text when needed? People who work with very large numeric data sets in NumPy do not seem to store them with a line-by-line text dump. See:
http://thsant.blogspot.com/2007/11/saving-numpy-arrays-which-is-fastest.html
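As a stdlib-only sketch of that idea (the linked post benchmarks NumPy-specific options like numpy.save; here I'm assuming plain tuples of floats and using pickle, which is one binary persistence option, not necessarily the fastest):

```python
import os
import pickle
import random
import tempfile

rows = [tuple(random.random() * 500 for _ in range(6)) for _ in range(1000)]

tmpdir = tempfile.mkdtemp()
pkl_path = os.path.join(tmpdir, 'rows.pkl')
txt_path = os.path.join(tmpdir, 'rows.txt')

# Save the raw floats in binary form: no float-to-string conversion here.
with open(pkl_path, 'wb') as f:
    pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)

# Separate utility: convert to text only when a human needs to read it.
def dump_to_text(src, dst):
    with open(src, 'rb') as f:
        data = pickle.load(f)
    with open(dst, 'w') as f:
        for tup in data:
            f.write('%g %g %g %g %g %g\n' % tup)

dump_to_text(pkl_path, txt_path)
```

The expensive float formatting then happens only on demand, instead of on every logging call.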