views:

118

answers:

5

I have a string that looks like '%s in %s' and I want to know how to seperate the arguments so that they are two different %s. My mind coming from Java came up with this:

'%s in %s' % unicode(self.author),  unicode(self.publication)

But this doesn't work so how does it look in Python?

+9  A: 

If you're using more than one argument it has to be in a tuple (note the extra parentheses):

'%s in %s' % (unicode(self.author),  unicode(self.publication))

As EOL points out, the unicode() function usually assumes ascii encoding as a default, so if you have non-ASCII characters, it's safer to explicitly pass the encoding:

'%s in %s' % (unicode(self.author,'utf-8'),  unicode(self.publication('utf-8')))

And as of Python 3.0, it's preferred to use the str.format() syntax instead:

'{0} in {1}'.format(unicode(self.author,'utf-8'),unicode(self.publication,'utf-8'))
Mark Cidade
This fails when self.author is not encoded in the default encoding, which can be ASCII (see my answer for an example and a solution), for instance when the author name contains accented character (which is the point of using unicode in the first place).
EOL
I incorporated your suggestions into my answer—thanks!
Mark Cidade
+3  A: 

On a tuple/mapping object for multiple argument format

The following is excerpt from the documentation:

Given format % values, % conversion specifications in format are replaced with zero or more elements of values. The effect is similar to the using sprintf() in the C language.

If format requires a single argument, values may be a single non-tuple object. Otherwise, values must be a tuple with exactly the number of items specified by the format string, or a single mapping object (for example, a dictionary).

References


On str.format instead of %

A newer alternative to % operator is to use str.format. Here's an excerpt from the documentation:

str.format(*args, **kwargs)

Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.

This method is the new standard in Python 3.0, and should be preferred to % formatting.

References


Examples

Here are some usage examples:

>>> '%s for %s' % ("tit", "tat")
tit for tat

>>> '{} and {}'.format("chicken", "waffles")
chicken and waffles

>>> '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
Bond, James Bond

>>> '{last}, {first} {last}'.format(first="James", last="Bond")
Bond, James Bond

See also

polygenelubricants
I have no way to test this (I don't know Python that much), but the examples seems to suggest that something like `'{self.author} in {self.publication}'.format(self=self)` should "work". I'm just not sure about the whole `unicode` thing.
polygenelubricants
Yes, you can indeed access attributes (and also indices). See http://docs.python.org/library/string.html#formatstringsSo in your example you could have used `{first[0]}` to get the initial `J`.
Duncan
+8  A: 

Mark Cidade's answer is right - you need to supply a tuple.

However from Python 2.6 onwards you can use format instead of %:

'{0} in {1}'.format(unicode(self.author),  unicode(self.publication))

Usage of % for formatting strings is no longer encouraged.

This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.

Mark Byers
This fails when the default encoding is ASCII and the author name contains accented characters! My answer contains an example and a solution.
EOL
+1  A: 

For python2 you can also do this

'%(author)s in %(publication)s'%{'author':unicode(self.author),
                                  'publication':unicode(self.publication)}

which is handy if you have a lot of arguments to substitute (particularly if you are doing internationalisation)

Python2.6 onwards supports .format()

'{author} in {publication}'.format(author=self.author,
                                   publication=self.publication)
gnibbler
+1  A: 

As far as I can tell, there is a significant problem with some of the answers posted so far: unicode() decodes from the default encoding, which is often ASCII. Thus, the following code, which is essentially what is recommended by previous answers, fails on my machine:

# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))

gives:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The failure comes from the fact that author does not contain only ASCII bytes (i.e. with values in [0; 127]), and unicode() decodes from ASCII by default (on many machines).

A robust solution is to explicitly give the encoding used in your fields; taking UTF-8 as an example:

u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))

(or without the initial u, depending on whether you want a Unicode result or a byte string).

At this point, one might want to consider having the author and publication fields be Unicode strings, instead of decoding them during formatting.

EOL