>>> x = "google"
>>> x is "google"
True
>>> x = "google.com"
>>> x is "google.com"
False
>>>

Can someone give me some hints as to why it behaves like that?

Edit: to make sure of the above, I have just tested on Python 2.5.4, 2.6.5, 2.7b2 and Python 3.1 on Windows, and Python 2.7b1 on Linux.

It looks consistent across all of them, so is it by design? Am I missing something?

I only found this out because one of my personal domain-filtering scripts was failing because of it.

+52  A: 

`is` verifies object identity, and any implementation of Python, when it meets a literal of an immutable type, is perfectly free to either make a new object of that immutable type, or seek through existing objects of that type to see if one of them could be reused (by adding a new reference to the same underlying object). This is a pragmatic choice of optimization and not subject to semantic constraints, so your code should never rely on which path a given implementation may take (or it could break with a bugfix/optimization release of Python!).

Consider for example:

>>> import dis
>>> def f():
...   x = 'google.com'
...   return x is 'google.com'
... 
>>> dis.dis(f)
  2           0 LOAD_CONST               1 ('google.com')
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               1 ('google.com')
             12 COMPARE_OP               8 (is)
             15 RETURN_VALUE    

so in this particular implementation, within a function, your observation does not apply and only one object is made for the literal (any literal), and, indeed:

>>> f()
True

Pragmatically that's because within a function making a pass through the local table of constants (to save some memory by not making multiple constant immutable objects where one suffices) is pretty cheap and fast, and may offer good performance returns since the function may be called repeatedly afterwards.
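One way to see that shared constant directly is to inspect the function's code object. This is only a sketch, assuming CPython and Python 3; the names `same_literal_twice` and `occurrences` are made up for illustration, and other implementations are free to behave differently.

```python
def same_literal_twice():
    # Two occurrences of the same literal inside one function body.
    x = 'google.com'
    y = 'google.com'
    return x is y

# co_consts is the per-code-object table of constants; CPython stores
# equal literals once, so the string shows up a single time in it.
occurrences = same_literal_twice.__code__.co_consts.count('google.com')
print(occurrences)
print(same_literal_twice())
```

Both names are loaded from the same slot of the constants table, which is why the identity test succeeds inside the function on this implementation.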

But, the very same implementation, at the interactive prompt (Edit: I originally thought this would also happen at a module's top level, but a comment by @Thomas set me right, see later):

>>> x = 'google.com'
>>> y = 'google.com'
>>> id(x), id(y)
(4213000, 4290864)

does NOT bother trying to save memory that way -- the ids are different, i.e., distinct objects. There are potentially higher costs and lower returns and so the heuristics of this implementation's optimizer tell it to not bother searching and just go ahead.

Edit: at module top level, per @Thomas' observation, given e.g.:

$ cat aaa.py
x = 'google.com'
y = 'google.com'
print id(x), id(y)

again we see the table-of-constants-based memory-optimization in this implementation:

>>> import aaa
4291104 4291104

(end of Edit per @Thomas' observation).

Lastly, again on the same implementation:

>>> x = 'google'
>>> y = 'google'
>>> id(x), id(y)
(2484672, 2484672)

the heuristics are different here because the literal string "looks like it might be an identifier" -- so it might be used in an operation requiring interning... so the optimizer interns it anyway (and once interned, looking for it becomes very fast of course). And indeed, surprise surprise...:

>>> z = intern(x)
>>> id(z)
2484672

...x has been interned the very first time (as you see, the return value of intern is the same object as x and y, as it has the same id()). Of course, you shouldn't rely on this either -- the optimizer doesn't have to intern anything automatically, it's just an optimization heuristic; if you need interned strings, intern them explicitly, just to be safe. When you do intern strings explicitly...:

>>> x = intern('google.com')
>>> y = intern('google.com')
>>> id(x), id(y)
(4213000, 4213000)

...then you do ensure exactly the same object (i.e., same id()) results each and every time -- so you can apply micro-optimizations such as checking with is rather than == (I've hardly ever found the minuscule performance gain to be worth the bother;-).
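For readers on Python 3, where the built-in `intern` used above has moved to `sys.intern`, explicit interning might look roughly like the sketch below. The helper `build` is hypothetical; it assembles the string at run time so that no compile-time sharing of literals is involved.

```python
import sys

def build(parts):
    # Join at run time so each call produces its own string object.
    return ''.join(parts)

x = sys.intern(build(['google', '.com']))
y = sys.intern(build(['google', '.com']))

# sys.intern returns the one interned copy for equal strings,
# so both names refer to the same object.
print(x is y)
print(x == y)
```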

Edit: just to clarify, here is the kind of performance difference I'm talking about, on a slow Macbook Air...:

$ python -mtimeit -s"a='google';b='google'" 'a==b'
10000000 loops, best of 3: 0.132 usec per loop
$ python -mtimeit -s"a='google';b='google'" 'a is b'
10000000 loops, best of 3: 0.107 usec per loop
$ python -mtimeit -s"a='goo.gle';b='goo.gle'" 'a==b'
10000000 loops, best of 3: 0.132 usec per loop
$ python -mtimeit -s"a='google';b='google'" 'a is b'
10000000 loops, best of 3: 0.106 usec per loop
$ python -mtimeit -s"a=intern('goo.gle');b=intern('goo.gle')" 'a is b'
10000000 loops, best of 3: 0.0966 usec per loop
$ python -mtimeit -s"a=intern('goo.gle');b=intern('goo.gle')" 'a == b'
10000000 loops, best of 3: 0.126 usec per loop

...a few tens of nanoseconds either way, at most. So, worth even thinking about only in the most extreme "optimize the [expletive deleted] out of this [expletive deleted] performance bottleneck" situations!-)
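The same comparison can be scripted with the standard `timeit` module instead of the command line. This is a rough sketch for Python 3 (hence `sys.intern`); the helper `best_usec` and the loop count are arbitrary choices for illustration, and the numbers will vary by machine and Python version.

```python
import timeit

SETUP_PLAIN = "a = 'google'; b = 'google'"
SETUP_INTERNED = ("import sys; "
                  "a = sys.intern('goo.gle'); b = sys.intern('goo.gle')")

def best_usec(stmt, setup, number=200000, repeat=3):
    # Take the best of `repeat` runs and convert to microseconds per loop.
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e6

results = {
    'plain ==': best_usec('a == b', SETUP_PLAIN),
    'plain is': best_usec('a is b', SETUP_PLAIN),
    'interned ==': best_usec('a == b', SETUP_INTERNED),
    'interned is': best_usec('a is b', SETUP_INTERNED),
}

for label, usec in results.items():
    print('%-12s %.4f usec per loop' % (label, usec))
```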

Alex Martelli
Looks like I am misusing the **is** operator
S.Mark
@S.Mark, possibly, but not necessarily -- see my edit about interning. You should normally use `is` only on mutables like lists, and singletons like `None`, but if you've ensured interning then (as a truly microscopic optimization) you could also use it there (interning also makes `==` checks a wee bit faster though, so you may not need to insert `is` even if you **do** religiously intern all the relevant strings!-).
Alex Martelli
Alex, you say "at the module's top level (or the interactive prompt)", but I believe what you describe (and what the OP saw) *only* happens at the interactive prompt -- the module's top level is still compiled into a single code object, and all references to the same constant throughout that code object use the same reference.
Thomas Wouters
@Thomas, you're right -- I had not looked at those specific internals in too long. Thanks! Let me edit to fix...
Alex Martelli
Thanks Alex, and Thomas
S.Mark
+5  A: 

"is" is an identity test. Python has some caching behavior for small integers and (apparently) strings. "is" is best used for singleton testing (e.g. None).

>>> x = "google"
>>> x is "google"
True
>>> id(x)
32553984L
>>> id("google")
32553984L
>>> x = "google.com"
>>> x is "google.com"
False
>>> id(x)
32649320L
>>> id("google.com")
37787888L
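For the kind of domain filtering mentioned in the question, the usual pattern is to compare string values with == (or set membership) and keep "is" for None. A minimal sketch, in which `BLOCKED` and `is_blocked` are hypothetical names:

```python
BLOCKED = {'google.com', 'example.org'}  # hypothetical block list

def is_blocked(domain):
    # Identity is the right test for the None singleton...
    if domain is None:
        return False
    # ...while string values are matched by equality; set membership
    # relies on hash and ==, never on object identity.
    return domain.lower() in BLOCKED

host = '.'.join(['google', 'com'])  # built at run time, not a literal
print(is_blocked(host))
print(is_blocked(None))
```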
Jeremy Brown