The title really says it all.

I read somewhere (an SO post, I think, and probably elsewhere too) that Python automatically interns single-character strings, so not only is 'a' == 'a' true, but 'a' is 'a' is true as well.

However, I can't remember reading whether this is guaranteed behavior in Python. Or is it just implementation-specific?

Bonus points for official sources ;)

+11  A: 

It's implementation-specific. It's difficult to tell because, as the language reference says:

... for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.

The interpreter is pretty good at making equal single-character strings the same object, but it doesn't always happen:

x = u'a'
y = u'abc'[:1]
print x == y, x is y

Run on CPython 2.6, this gives True False.
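
For anyone on Python 3, here is a minimal sketch of the same check. The variable names values_equal and same_object are just illustrative. Only the == result is something the language promises; the is result depends on the interpreter and version, so no expected value is given for it.

```python
# Python 3 sketch of the comparison above.
x = "a"
y = "abc"[:1]  # a one-character string produced by slicing

values_equal = (x == y)  # value equality: guaranteed by the language
same_object = (x is y)   # object identity: an implementation detail

# The first value printed is always True; the second may be True or False.
print(values_equal, same_object)
```

The practical upshot is to compare strings with == and keep is for singletons such as None.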

Chris B.
I don't know for sure, but I would never rely on it.
delnan
That's why I'm looking for an official (or at least very reputable) source :P
Wayne Werner
The example given in this answer proves that you *can't* rely on it. +1
Mark Ransom
Interesting - it gives False for unicode, but not for regular strings (at least in my interactive interpreter). But I find this explanation satisfactory.
Wayne Werner
The CPython implementation has been special-casing single-character strings for a long time, since it's relatively cheap and easy to do (as there are only 256 of them). For unicode strings, there are up to 1,114,112 possible code points, so it doesn't do that.
Thomas Wouters
+5  A: 

It is all implementation-defined.

The documentation for intern says: "Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys."

That means any string that could be a name and is known at compile time is likely (but not guaranteed) to be the same object as any other occurrence of the same name.

Other strings aren't stated to be interned. Constant strings appearing in the same compilation unit are folded together (but that is also just an implementation detail) so you get:

>>> a = '!'
>>> a is '!'
False
>>> a = 'a'
>>> a is 'a'
True
>>>

A string that is a valid identifier is interned, so you get the same object even across different compilations. A string that is not an identifier is only shared within the same compilation unit:

>>> '!' is '!'
True
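
If you really do need identity rather than equality, you can intern explicitly instead of relying on what the compiler happens to do. A minimal sketch, assuming Python 3, where the function lives at sys.intern (in Python 2 it is the builtin intern). The helper build is hypothetical and only there to create the strings at run time, so they are not compile-time constants:

```python
import sys

def build(parts):
    """Join parts at run time so the result is not a compile-time constant."""
    return "".join(parts)

a = build(["not an ", "identifier!"])
b = build(["not an ", "identifier!"])

equal_values = (a == b)  # equal by value

# Interning two equal strings returns one shared object from the intern table.
interned_same = (sys.intern(a) is sys.intern(b))

print(equal_values, interned_same)
```

Whether a is b holds before interning is still up to the implementation, so the sketch doesn't check it.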
Duncan
That is mind blowing!
theycallmemorty