I came across this question about memory management of dictionaries, which mentions the intern function. What exactly does it do, and when would it be used?

To give an example:

If I have a set called seen, that contains tuples in the form (string1,string2), which I use to check for duplicates, would storing (intern(string1),intern(string2)) improve performance w.r.t. memory or speed?

+1  A: 

It returns a canonical instance of the string.

Therefore, if you have many string instances that are equal, you save memory. In addition, you can compare canonicalized strings by identity instead of equality, which is faster.
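
As a rough sketch of what "canonical instance" means (written for Python 3, where the function lives at sys.intern; in Python 2 the built-in intern would be used instead). The two strings are built at runtime so that they start out as separate objects, and the variable names are only for illustration.

```python
import sys

# Build two equal strings at runtime so they start out as separate objects.
words = ["canonical", "instance"]
s1 = " ".join(words)
s2 = " ".join(words)

# Interning maps both onto one shared object from the intern table.
c1 = sys.intern(s1)
c2 = sys.intern(s2)

print(s1 == s2)  # True: equal by value
print(c1 is c2)  # True: both names refer to the same interned object
```

Note that you have to keep and use the return value of sys.intern; the original s1 and s2 objects are not changed by the call.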

flybywire
+6  A: 
intern(string)

Looks for string in a table of "interned" strings. If the string is already present in this table, the already interned copy of the string is returned. Otherwise, the new string is added to the table, and then returned.

Interning strings is useful to gain a little performance on dictionary lookup -- if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

Source: http://pyref.infogami.com/intern
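
To illustrate the last sentence of the quote, here is a small sketch, assuming CPython 3 (where the function is sys.intern). Point and x_coord are made-up names, and the identity check relies on CPython interning names that appear in source code, which other implementations need not do.

```python
import sys

class Point:
    def __init__(self):
        self.x_coord = 1  # the attribute name comes from source code

p = Point()
attr_key = next(iter(vars(p)))  # the key object stored in the instance dict

# An equal string built at runtime, then interned explicitly.
runtime_name = "_".join(["x", "coord"])
same_object = attr_key is sys.intern(runtime_name)
print(same_object)
```

If the attribute name was interned automatically, sys.intern hands back that very object, so same_object should come out True on CPython.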

Robert Smith
+4  A: 

They weren't talking about a keyword intern, because there is no such thing in Python. They were talking about the non-essential built-in function intern, which in py3k has been moved to sys.intern. The docs have an exhaustive description.

SilentGhost
Thanks for pointing that out, fixed.
pufferfish
+2  A: 

Essentially intern looks up (or stores if not present) the string in a collection of interned strings, so all interned instances will share the same identity. You trade the one-time cost of looking up this string for faster comparisons (the compare can return True after just checking for identity, rather than having to compare each character), and reduced memory usage.
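
Applied to the seen set from the question, a minimal sketch could look like this (Python 3 spelling, sys.intern; dedupe_pairs and the sample data are hypothetical, not from the question). Whether it pays off depends on how often the same strings recur across tuples.

```python
import sys

def dedupe_pairs(pairs):
    """Return the unique (string1, string2) pairs, with both strings interned."""
    seen = set()
    unique = []
    for string1, string2 in pairs:
        # Intern each half so equal strings across tuples share one object.
        key = (sys.intern(string1), sys.intern(string2))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

# Strings produced at runtime (as if read from a file), so equal values
# start out as separate objects.
raw = "alpha,beta;alpha,gamma;alpha,beta"
pairs = [tuple(record.split(",")) for record in raw.split(";")]

unique = dedupe_pairs(pairs)
print(unique)                        # [('alpha', 'beta'), ('alpha', 'gamma')]
print(unique[0][0] is unique[1][0])  # True: one shared 'alpha' object
```

The duplicate check itself works the same without interning; what changes is that repeated strings are stored once instead of once per tuple.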

However, Python will automatically intern strings that are small, or look like identifiers, so you may find you get no improvement because your strings are already being interned behind the scenes. For example:

>>> a = 'abc'; b = 'abc'
>>> a is b
True

In the past, one disadvantage was that interned strings were permanent: once interned, the string's memory was never freed, even after all references were dropped. This is no longer the case in more recent versions of Python (since 2.3, interned strings are not immortal).

Brian