In Python, the where and when of using string concatenation versus string substitution elude me. As the string concatenation has seen large boosts in performance, is this (becoming more) a stylistic decision rather than a practical one?

For a concrete example, how should one handle construction of flexible URIs:

DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'

def so_question_uri_sub(q_num):
    return "%s%s/%d" % (DOMAIN, QUESTIONS, q_num)

def so_question_uri_cat(q_num):
    return DOMAIN + QUESTIONS + '/' + str(q_num)

Edit: There have also been suggestions about joining a list of strings and about using named substitution. These are variants on the central theme: which way is the Right Way to do it, and when? Thanks for the responses!

A: 

I use substitution wherever I can. I only use concatenation if I'm building a string up in, say, a for-loop.

Draemon
"building a string in a for-loop" – often this is a case where you can use ''.join and a generator expression.
John Fouhy
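
The join suggestion in the comment above can be sketched roughly as follows. The function names and the list-of-pairs input are made up for illustration, and no URL-encoding is attempted:

```python
def query_with_loop(params):
    # Builds the string piece by piece with += inside a for-loop.
    query = ''
    for key, value in params:
        if query:
            query += '&'
        query += '%s=%s' % (key, value)
    return query


def query_with_join(params):
    # Same result: a generator expression feeds every piece to a single join call.
    return '&'.join('%s=%s' % (key, value) for key, value in params)


print(query_with_join([('sort', 'votes'), ('page', 2)]))  # sort=votes&page=2
```

Both versions return the same string; the join form simply leaves the separator handling to join.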
+14  A: 

Concatenation is (significantly) faster on my machine. But stylistically, I'm willing to pay the price of substitution if performance is not critical. Well, and if I need formatting, there's no need to even ask the question... there's no option but to use interpolation/templating.

>>> import timeit
>>> def so_q_sub(n):
...  return "%s%s/%d" % (DOMAIN, QUESTIONS, n)
...
>>> so_q_sub(1000)
'http://stackoverflow.com/questions/1000'
>>> def so_q_cat(n):
...  return DOMAIN + QUESTIONS + '/' + str(n)
...
>>> so_q_cat(1000)
'http://stackoverflow.com/questions/1000'
>>> t1 = timeit.Timer('so_q_sub(1000)','from __main__ import so_q_sub')
>>> t2 = timeit.Timer('so_q_cat(1000)','from __main__ import so_q_cat')
>>> t1.timeit(number=10000000)
12.166618871951641
>>> t2.timeit(number=10000000)
5.7813972166853773
>>> t1.timeit(number=1)
1.103492206766532e-05
>>> t2.timeit(number=1)
8.5206360154188587e-06

>>> def so_q_tmp(n):
...  return "{d}{q}/{n}".format(d=DOMAIN,q=QUESTIONS,n=n)
...
>>> so_q_tmp(1000)
'http://stackoverflow.com/questions/1000'
>>> t3= timeit.Timer('so_q_tmp(1000)','from __main__ import so_q_tmp')
>>> t3.timeit(number=10000000)
14.564135316080637

>>> def so_q_join(n):
...  return ''.join([DOMAIN,QUESTIONS,'/',str(n)])
...
>>> so_q_join(1000)
'http://stackoverflow.com/questions/1000'
>>> t4= timeit.Timer('so_q_join(1000)','from __main__ import so_q_join')
>>> t4.timeit(number=10000000)
9.4431309007150048
Vinko Vrsalovic
did you make tests with real large strings (like 100000 chars)?
drnk
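
One way to try the large-string case raised in the comment above is a small script like the one below. It is only a sketch: the three 100000-character strings and the function names are invented for the test, and the timings it prints will differ from machine to machine, so no numbers are shown here.

```python
import timeit

# Hypothetical test data: three 100000-character strings.
A = 'a' * 100000
B = 'b' * 100000
C = 'c' * 100000


def big_sub():
    return "%s%s%s" % (A, B, C)


def big_cat():
    return A + B + C


def big_join():
    return ''.join([A, B, C])


def time_big(number=1000):
    # Maps each function name to the seconds taken for `number` calls.
    return dict(
        (func.__name__, timeit.Timer(func).timeit(number=number))
        for func in (big_sub, big_cat, big_join)
    )


if __name__ == '__main__':
    for name, seconds in sorted(time_big().items()):
        print('%-8s %.4f' % (name, seconds))
```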
+1  A: 

What you want to concatenate/interpolate and how you want to format the result should drive your decision.

  • String interpolation allows you to easily add formatting. In fact, your string interpolation version doesn't do the same thing as your concatenation version; it actually adds an extra forward slash before the q_num parameter. To do the same thing, you would have to write return DOMAIN + QUESTIONS + "/" + str(q_num) in that example.

  • Interpolation makes it easier to format numerics; "%d of %d (%2.2f%%)" % (current, total, 100.0 * current / total) would be much less readable in concatenation form.

  • Concatenation is useful when you don't have a fixed number of items to string-ize.
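
To make the point about numeric formatting concrete, here is a small sketch. The function names are invented for illustration, and the percentage is assumed to mean 100.0 * current / total:

```python
def progress_interp(current, total):
    # Interpolation keeps the layout and the numeric formatting in one place.
    return "%d of %d (%.2f%%)" % (current, total, 100.0 * current / total)


def progress_concat(current, total):
    # The concatenation form needs explicit conversions, and it still falls
    # back on % to get exactly two decimal places.
    percent = "%.2f" % (100.0 * current / total)
    return str(current) + " of " + str(total) + " (" + percent + "%)"


print(progress_interp(3, 8))  # 3 of 8 (37.50%)
```

Both functions return the same string, but the layout of the result is much easier to see in the first one.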

Also note that Python 2.6 introduces a new form of string interpolation, the str.format method, sometimes called string templating:

def so_question_uri_template(q_num):
    return "{domain}{questions}/{num}".format(domain=DOMAIN,
                                              questions=QUESTIONS,
                                              num=q_num)

String templating is slated to eventually replace %-interpolation, but that won't happen for quite a while, I think.

Tim Lesher
Well, it'll happen whenever you decide to move to python 3.0. Also, see Peter's comment for the fact that you can do named substitutions with the % operator anyway.
John Fouhy
"Concatenation is useful when you don't have a fixed number of items to string-ize." -- You mean a list/array? In that case, couldn't you just join() them?
strager
"Couldn't you just join() them?" -- Yes (assuming you want uniform separators between items). List and generator comprehensions work great with string.join.
Tim Lesher
"Well, it'll happen whenever you decide to move to python 3.0" -- No, py3k still supports the % operator. The next possible deprecation point is 3.1, so it still has some life in it.
Tim Lesher
+6  A: 

"As the string concatenation has seen large boosts in performance..."

If performance matters, this is good to know.

However, performance problems I've seen have never come down to string operations. I've generally gotten in trouble with I/O, sorting and O(n^2) operations being the bottlenecks.

Until string operations are the performance limiters, I'll stick with things that are obvious. Mostly, that's substitution when it's one line or less, concatenation when it makes sense, and a template tool (like Mako) when it's large.

S.Lott
+3  A: 

Don't forget about named substitution:

def so_question_uri_namedsub(q_num):
    return "%(domain)s%(questions)s/%(q_num)d" % {
        'domain': DOMAIN, 'questions': QUESTIONS, 'q_num': q_num}
too much php
+1  A: 

Remember, stylistic decisions are practical decisions, if you ever plan on maintaining or debugging your code :-) There's a famous quote from Knuth (possibly quoting Hoare?): "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

As long as you're careful not to (say) turn an O(n) task into an O(n^2) task, I would go with whichever you find easiest to understand.

John Fouhy
+5  A: 

Be wary of concatenating strings in a loop! The cost of string concatenation is proportional to the length of the result. Looping leads you straight to the land of N-squared. Some languages will optimize concatenation to the most recently allocated string, but it's risky to count on the compiler to optimize your quadratic algorithm down to linear. Best to use the primitive (join?) that takes an entire list of strings, does a single allocation, and concatenates them all in one go.
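
As a rough illustration (the function names are made up, and whether the loop version is actually quadratic depends on the interpreter):

```python
def concat_in_loop(pieces):
    # Each += may copy everything built so far, so the total work can grow
    # quadratically with the length of the result.
    result = ''
    for piece in pieces:
        result += piece
    return result


def concat_with_join(pieces):
    # join receives all the pieces up front and builds the result in one call.
    return ''.join(pieces)


lines = ['line %d\n' % i for i in range(5)]
print(concat_in_loop(lines) == concat_with_join(lines))  # True
```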

Norman Ramsey
That's not current. In recent versions of CPython, concatenating strings in a loop is optimized: the interpreter can often extend the existing string in place rather than copying it each time.
Seun Osewa
@Seun: Yes, as I said, some languages will optimize, but it's a risky practice.
Norman Ramsey