In the spirit of the existing "what's your most useful C/C++ snippet" thread:

Do you guys have short, monofunctional Python snippets that you use (often) and would like to share with the Stack Overflow community? Please keep the entries small (under 25 lines maybe?) and give only one example per post.

I'll start off with a short snippet I use from time to time to count SLOC (source lines of code) in Python projects:

# prints recursive count of lines of python source code from current directory
# includes an ignore_list. also prints total sloc

import os
cur_path = os.getcwd()
ignore_set = set(["__init__.py", "count_sourcelines.py"])

loclist = []

for pydir, _, pyfiles in os.walk(cur_path):
    for pyfile in pyfiles:
        if pyfile.endswith(".py") and pyfile not in ignore_set:
            totalpath = os.path.join(pydir, pyfile)
            with open(totalpath, "r") as f:   # close the file promptly
                linecount = len(f.read().splitlines())
            loclist.append((linecount, totalpath.split(cur_path)[1]))

for linenumbercount, filename in loclist: 
    print "%05d lines in %s" % (linenumbercount, filename)

print "\nTotal: %s lines (%s)" %(sum([x[0] for x in loclist]), cur_path)
+2  A: 

Hardlink identical files under the current directory (on Unix, this means they share physical storage, which can save a lot of space):

import os
import hashlib

dupes = {}

for path, dirs, files in os.walk(os.getcwd()):
    for file in files:
        filename = os.path.join(path, file)
        hash = hashlib.sha1(open(filename, 'rb').read()).hexdigest()  # hash the raw bytes
        if hash in dupes:
            print 'linking "%s" -> "%s"' % (dupes[hash], filename)
            os.rename(filename, filename + '.bak')
            try:
                os.link(dupes[hash], filename)
                os.unlink(filename + '.bak')
            except OSError:
                # linking failed, so restore the original file
                os.rename(filename + '.bak', filename)
        else:
            dupes[hash] = filename
sysrqb
+1, though this code can be improved by using a unique temporary filename instead of blindly assuming `filename.bak` doesn't exist.
Stephan202
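
A minimal sketch of the unique-temporary-name idea from the comment above. This is not the original poster's code: `replace_with_hardlink` is a hypothetical helper, it is written in Python 3 syntax, and it assumes a filesystem that supports hard links. The link is created under a name reserved by `tempfile.mkstemp` and then renamed over the duplicate, so no `.bak` name is ever assumed (there is still a tiny window between releasing the reserved name and creating the link).

```python
import os
import tempfile


def replace_with_hardlink(source, target):
    """Replace `target` with a hard link to `source` (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(target))
    # Reserve a unique name in the same directory, then free it for os.link.
    fd, tmp_name = tempfile.mkstemp(dir=directory)
    os.close(fd)
    os.unlink(tmp_name)
    try:
        os.link(source, tmp_name)      # new directory entry for source's data
        os.replace(tmp_name, target)   # rename the link over the duplicate
    finally:
        # Clean up if linking failed or the rename turned out to be a no-op.
        if os.path.lexists(tmp_name):
            os.unlink(tmp_name)
```

If `os.link` fails, the original `target` is left untouched, because it is only replaced by the final rename.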
+3  A: 

I like this one to zip everything up in a directory. Hotkey it for instabackups!

import os
import zipfile

z = zipfile.ZipFile('my-archive.zip', 'w', zipfile.ZIP_DEFLATED)
startdir = "/home/johnf"
for dirpath, dirnames, filenames in os.walk(startdir):
  for filename in filenames:
    z.write(os.path.join(dirpath, filename))
z.close()
John Feminella
What's wrong with zip -r my-archive.zip directory/ ?
sysrqb
That's not a Python snippet. :) (Also, you can include special logic in the snippet that might be complicated to do with shell commands.)
John Feminella
+1 (although I prefer `tarfile`)
David X
+14  A: 

The only 'trick' I know that really wowed me when I learned it is enumerate. It allows you to have access to the indexes of the elements within a for loop.

>>> l = ['a','b','c','d','e','f']
>>> for (index,value) in enumerate(l):
...     print index, value
... 
0 a
1 b
2 c
3 d
4 e
5 f
theycallmemorty
It's quite amusing when Java/C programmers always track the indexes by hand with for(n=0;n<l.length;n++)
Unknown
No need to put index, value in parentheses. Also, the above comment is naive/ignorant.
FogleBird
+10  A: 

Initializing a 2D list

While this can be done safely to initialize a list:

lst = [0] * 3

The same trick won’t work for a 2D list (list of lists):

>>> lst_2d = [[0] * 3] * 3
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [5, 0, 0], [5, 0, 0]]

The * operator repeats references to its operand's elements rather than copying them, so the outer list ends up holding three references to the same inner list. The correct way to do this is:

>>> lst_2d = [[0] * 3 for i in xrange(3)]
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [0, 0, 0], [0, 0, 0]]
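
One way to see the aliasing directly is to compare row identities. This is a small sketch in Python 3 syntax (`range` instead of `xrange`); the names `shared` and `separate` are made up for the illustration.

```python
rows = 3
shared = [[0] * 3] * rows                    # three references to one inner list
separate = [[0] * 3 for _ in range(rows)]    # three distinct inner lists

print(shared[0] is shared[1])        # True
print(separate[0] is separate[1])    # False
```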
Eli Bendersky
Faster way: http://stackoverflow.com/questions/2332919
gnibbler
+2  A: 

To find out if a line is empty (i.e. either has length 0 or contains only whitespace), use the string method strip in a condition, as follows:

if not line.strip():    # if line is empty
    continue            # skip it
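
The same test works as a filter in a list comprehension. A small sketch in Python 3 syntax, with a made-up `text` value:

```python
text = "first\n\n   \t\nsecond\n"

# Keep only the lines that still contain something after stripping whitespace.
non_blank = [line for line in text.splitlines() if line.strip()]
print(non_blank)  # ['first', 'second']
```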
Eli Bendersky
+2  A: 

Suppose you have a list of items, and you want a dictionary with these items as the keys. Use fromkeys:

>>> items = ['a', 'b', 'c', 'd']
>>> idict = dict().fromkeys(items, 0)
>>> idict
{'a': 0, 'c': 0, 'b': 0, 'd': 0}
>>>

The second argument of fromkeys is the value assigned to all the newly created keys.

Eli Bendersky
fromkeys is a static method. You should do "dict.fromkeys(items, 0)". Your code creates and throws away an empty dictionary.
Andrew Dalke
@Andrew Dalke , I believe `dict.fromkeys` is a class-method. reason: `dict.fromkeys` returns a dictionary back, hence it _should_ get `class` as its first argument. Think about when you've subclassed `dict` -- `MyDict.fromkeys` should give an instance of `MyDict`
jeffjose
@jeffjose: You are correct. I did the test you suggested and looked at the code since I was curious how that was done.
Andrew Dalke
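
The subclass test described in the comments can be sketched like this (Python 3 syntax; `MyDict` is a hypothetical subclass used only for the demonstration):

```python
class MyDict(dict):
    """Hypothetical dict subclass, used only for this demonstration."""


counts = MyDict.fromkeys(['a', 'b', 'c'], 0)

# Because fromkeys is a classmethod, it builds an instance of the class
# it was called on, not a plain dict.
print(type(counts).__name__)                  # MyDict
print(counts == {'a': 0, 'b': 0, 'c': 0})     # True
```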
+8  A: 

zip(*iterable) transposes an iterable.

>>> a=[[1,2,3],[4,5,6]]
>>> zip(*a)
[(1, 4), (2, 5), (3, 6)]

It's also useful with dicts.

>>> d={"a":1,"b":2,"c":3}
>>> zip(*d.iteritems())
[('a', 'c', 'b'), (1, 3, 2)]
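
Since transposing twice gets you back to the original rows, zip(*...) also works as an "unzip". A small sketch in Python 3 syntax, where zip returns an iterator and therefore has to be wrapped in list() to see the result:

```python
a = [[1, 2, 3], [4, 5, 6]]

transposed = list(zip(*a))
print(transposed)      # [(1, 4), (2, 5), (3, 6)]

# Transposing again restores the original rows (as tuples, so convert back).
restored = [list(row) for row in zip(*transposed)]
print(restored == a)   # True
```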
AKX
+1  A: 

For list comprehensions that need current, next:

[fun(curr, nxt)
 for curr, nxt
 in zip(lst, lst[1:] + [None])
 if condition(curr, nxt)]

For a circular list: zip(lst, lst[1:] + lst[:1]).

For previous, current: zip([None] + lst[:-1], lst); circular: zip(lst[-1:] + lst[:-1], lst).

Note that list.append and list.extend modify the list in place and return None, so the shifted lists have to be built with +.

vartec
A slight adjustment of the `pairwise` recipe does the same and works for all iterables: http://docs.python.org/3.0/library/itertools.html#recipes
Stephan202
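
A sketch of what such an adjustment might look like, in Python 3 syntax. The name `pairwise_padded` and its `fill` parameter are made up here; the idea is the itertools `pairwise` recipe with the last item paired with a fill value instead of being dropped.

```python
from itertools import tee, zip_longest


def pairwise_padded(iterable, fill=None):
    """Yield (current, next) pairs; the last item is paired with `fill`."""
    current, following = tee(iterable)
    next(following, None)   # advance the second copy by one item
    return zip_longest(current, following, fillvalue=fill)


print(list(pairwise_padded("abc")))
# [('a', 'b'), ('b', 'c'), ('c', None)]
```

Because it only relies on tee, this also works for generators and other iterables that cannot be sliced.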
+4  A: 

Huge speedup when deep-copying nested lists and dictionaries:

import cPickle

deepcopy = lambda x: cPickle.loads(cPickle.dumps(x))
vartec
I've always been leery of this technique, although it seems like it should work about as fast as anything else I could think of. Do Pythonistas consider this a good way to get deep copies? (FWIW, I use this technique anyway.)
TokenMacGuy
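
For what it's worth, here is a rough Python 3 sketch of the same trick, where cPickle has been folded into the pickle module. `pickle_deepcopy` is a made-up name, and no timing claim is made here. The main caveat is that it only works for objects that can be pickled, so things like lambdas or open files will fail where copy.deepcopy might not.

```python
import copy
import pickle


def pickle_deepcopy(obj):
    """Deep-copy a picklable object by serializing and deserializing it."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


nested = {'a': [1, 2, {'b': (3, 4)}]}
clone = pickle_deepcopy(nested)

print(clone == copy.deepcopy(nested))   # True
print(clone['a'] is nested['a'])        # False, the inner list was copied too
```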
+5  A: 

To flatten a list of lists, such as

[['a', 'b'], ['c'], ['d', 'e', 'f']]

into

['a', 'b', 'c', 'd', 'e', 'f']

use

[inner
    for outer in the_list
        for inner in outer]
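
An alternative (not part of the answer above) is itertools.chain.from_iterable, which gives the same result for one level of nesting. A short sketch in Python 3 syntax comparing the two:

```python
from itertools import chain

the_list = [['a', 'b'], ['c'], ['d', 'e', 'f']]

flat_comprehension = [inner for outer in the_list for inner in outer]
flat_chain = list(chain.from_iterable(the_list))

print(flat_chain)                         # ['a', 'b', 'c', 'd', 'e', 'f']
print(flat_chain == flat_comprehension)   # True
```

Both flatten exactly one level; deeper nesting is left as it is.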
George V. Reilly
+13  A: 

I like using any and a generator:

if any(pred(x.item) for x in sequence):
    ...

instead of code written like this:

found = False
for x in sequence:
    if pred(x.item):
        found = True
if found:
    ...

I first learned of this technique from a Peter Norvig article.

Jacob Gabrielson
+1 For the reference to Norvig's sudoku article. It's very nice.
Stephan202
There's also all() to check that all items are True.
FogleBird
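
A side benefit worth showing: any() stops at the first match, whereas the flag loop above keeps going. A small sketch in Python 3 syntax; `pred`, `calls` and `numbers` are made up for the illustration.

```python
calls = []


def pred(n):
    calls.append(n)   # record every value that actually gets tested
    return n > 2


numbers = [1, 2, 3, 4, 5]

print(any(pred(n) for n in numbers))   # True
print(calls)                           # [1, 2, 3], since 4 and 5 were never tested
print(all(n > 0 for n in numbers))     # True
```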
+1  A: 

For Python 2.4 or earlier:

for x,y in someIterator:
  listDict.setdefault(x,[]).append(y)

In Python 2.5+ there is an alternative using defaultdict.
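
A short sketch of the two approaches side by side (Python 3 syntax; the `pairs` data is made up):

```python
from collections import defaultdict

pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

# setdefault: insert an empty list on first sight of a key, then append.
with_setdefault = {}
for key, value in pairs:
    with_setdefault.setdefault(key, []).append(value)

# defaultdict: the missing-key list is created automatically.
with_defaultdict = defaultdict(list)
for key, value in pairs:
    with_defaultdict[key].append(value)

print(with_setdefault)   # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
print(dict(with_defaultdict) == with_setdefault)   # True
```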

vartec
+1  A: 

A custom list that, when multiplied by another list, returns a Cartesian product. The nice thing is that the product is indexable, unlike the result of itertools.product (but the multiplicands must be sequences, not iterators).

import operator

class mylist(list):
    def __getitem__(self, args):
     if type(args) is tuple:
      return [list.__getitem__(self, i) for i in args]
     else:
      return list.__getitem__(self, args)
    def __mul__(self, args):
     seqattrs = ("__getitem__", "__iter__", "__len__")
     if all(hasattr(args, i) for i in seqattrs):
      return cartesian_product(self, args)
     else:
      return list.__mul__(self, args)
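    def __imul__(self, args):
     return self.__mul__(args)
    def __rmul__(self, args):
     if isinstance(args, (int, long)):
      return list.__rmul__(self, args)   # plain repetition, e.g. 3 * mylist
     return cartesian_product(args, self)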
    def __pow__(self, n):
     return cartesian_product(*((self,)*n))
    def __rpow__(self, n):
     return cartesian_product(*((self,)*n))

class cartesian_product:
    def __init__(self, *args):
     self.elements = args
    def __len__(self):
     return reduce(operator.mul, map(len, self.elements))
    def __getitem__(self, n):
     return [e[i] for e, i  in zip(self.elements,self.get_indices(n))]
    def get_indices(self, n):
     sizes = map(len, self.elements)
     tmp = [0]*len(sizes)
     i = -1
     for w in reversed(sizes):
      tmp[i] = n % w
      n //= w
      i -= 1
     return tmp
    def __add__(self, arg):
     return mylist(map(None, self)+mylist(map(None, arg)))
    def __imul__(self, args):
     return mylist(self)*mylist(args)
    def __rmul__(self, args):
     return mylist(args)*mylist(self)
    def __mul__(self, args):
     if isinstance(args, cartesian_product):
      return cartesian_product(*(self.elements+args.elements))
     else:
      return cartesian_product(*(self.elements+(args,)))
    def __iter__(self):
     for i in xrange(len(self)):
      yield self[i]
    def __str__(self):
     return "[" + ",".join(str(i) for i in self) +"]"
    def __repr__(self):
     return "*".join(map(repr, self.elements))
fortran
I don't understand a line of it, can you comment on how it works?
bodacydo
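
Not the author, but the core of it seems to be get_indices: the position n in the product is split into one index per sequence, like the digits of a number where each digit has its own base (the length of that sequence), starting from the last sequence. Here is a stripped-down sketch of just that part, in Python 3 syntax. The standalone functions `get_indices` and `product_item` are my own simplification, not the class above.

```python
from itertools import product


def get_indices(n, sizes):
    """Split position n into one index per sequence, last sequence first."""
    indices = [0] * len(sizes)
    for position in range(len(sizes) - 1, -1, -1):
        # The remainder is the index into this sequence; the quotient carries on.
        n, indices[position] = divmod(n, sizes[position])
    return indices


def product_item(n, *sequences):
    """Return the n-th element of the Cartesian product without building it."""
    sizes = [len(s) for s in sequences]
    return [s[i] for s, i in zip(sequences, get_indices(n, sizes))]


letters, digits = "ab", [1, 2, 3]
print(get_indices(4, [2, 3]))             # [1, 1]
print(product_item(4, letters, digits))   # ['b', 2]
print(list(product(letters, digits))[4])  # ('b', 2)
```

The rest of the class is mostly operator overloading, so that mylist * mylist builds a cartesian_product and indexing into it goes through this calculation.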