ansaurus

Question

Answer 1

+2 A:

Hardlink identical files under the current directory (on Unix, this means they share physical storage, which can save a lot of space):

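import os
import hashlib

dupes = {}  # content hash -> first file seen with that content

for path, dirs, files in os.walk(os.getcwd()):
    for name in files:
        filename = os.path.join(path, name)
        # Read in binary mode so the hash covers the exact bytes on disk.
        with open(filename, 'rb') as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        if digest in dupes:
            print('linking "%s" -> "%s"' % (dupes[digest], filename))
            # Keep a backup until the link succeeds.
            os.rename(filename, filename + '.bak')
            try:
                os.link(dupes[digest], filename)
                os.unlink(filename + '.bak')
            except OSError:
                # Linking failed; restore the original file.
                os.rename(filename + '.bak', filename)
        else:
            dupes[digest] = filename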

sysrqb 2009-03-28 01:10:48

+1, though this code can be improved by using a unique temporary filename instead of blindly assuming `filename.bak` doesn't exist.

Stephan202 2009-07-12 13:47:31
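One way to act on the comment above is to let `tempfile.mkstemp` pick the temporary name. The following is only a sketch, written for Python 3 (the thread's code is Python 2); the function name `hardlink_duplicates` is made up for illustration, and there is still a small window between freeing the temporary name and creating the link.

```python
import hashlib
import os
import tempfile


def hardlink_duplicates(root):
    """Replace duplicate files under root with hard links; return how many were linked."""
    seen = {}      # content digest -> first path seen with that content
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            original = seen.setdefault(digest, path)
            if original == path or os.path.samefile(original, path):
                continue   # first occurrence, or already linked to it
            # Reserve a unique name in the same directory, then free it so
            # that os.link can create the new link under that name.
            fd, tmp = tempfile.mkstemp(dir=dirpath)
            os.close(fd)
            os.unlink(tmp)
            os.link(original, tmp)
            os.replace(tmp, path)   # swap the link into place
            linked += 1
    return linked
```

Because the link is created under a fresh name and then renamed over the duplicate, no existing `.bak` file can be clobbered.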

Answer 2

+3 A:

I like this one to zip everything up in a directory. Hotkey it for instabackups!

import os
import zipfile

z = zipfile.ZipFile('my-archive.zip', 'w', zipfile.ZIP_DEFLATED)
startdir = "/home/johnf"
for dirpath, dirnames, filenames in os.walk(startdir):
  for filename in filenames:
    z.write(os.path.join(dirpath, filename))
z.close()

John Feminella 2009-03-28 01:13:34

What's wrong with `zip -r my-archive.zip directory/`?

sysrqb 2009-03-28 01:14:52

That's not a Python snippet. :) (Also, you can include special logic in the snippet that might be complicated to do with shell commands.)

John Feminella 2009-03-28 01:18:44
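As an illustration of the "special logic" mentioned above, here is a hedged Python 3 sketch that skips some files and stores paths relative to the start directory. The function name `zip_directory` and the default `skip_suffixes` are hypothetical choices, not part of the original snippet.

```python
import os
import zipfile


def zip_directory(startdir, archive_path, skip_suffixes=('.pyc', '.tmp')):
    """Zip everything under startdir except files ending in skip_suffixes."""
    archive_abs = os.path.abspath(archive_path)
    names = []
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as z:
        for dirpath, _dirnames, filenames in os.walk(startdir):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                if filename.endswith(skip_suffixes):
                    continue   # the "special logic": leave these out
                if os.path.abspath(full) == archive_abs:
                    continue   # do not add the archive to itself
                # Store paths relative to startdir rather than absolute ones.
                arcname = os.path.relpath(full, startdir)
                z.write(full, arcname)
                names.append(arcname)
    return names
```

The `with` block closes the archive even if a file cannot be read.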

+1 (although I prefer `tarfile`)

David X 2010-08-04 18:17:21

Answer 3

+14 A:

The only 'trick' I know that really wowed me when I learned it is enumerate. It allows you to have access to the indexes of the elements within a for loop.

>>> l = ['a','b','c','d','e','f']
>>> for (index,value) in enumerate(l):
...     print index, value
... 
0 a
1 b
2 c
3 d
4 e
5 f

theycallmemorty 2009-03-28 02:16:45

It's quite amusing how Java/C programmers always get their indexes with for(n=0;n<l.length;n++)

Unknown 2009-03-29 06:00:13

No need to put index, value in parentheses. Also, the above comment is naive/ignorant.

FogleBird 2009-07-12 13:50:30
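A short Python 3 sketch of the points above: the parentheses are optional, and `enumerate` also accepts a `start` argument. The variable names are only for illustration.

```python
letters = ['a', 'b', 'c']

# No parentheses are needed around the loop targets.
plain = []
for index, value in enumerate(letters):
    plain.append((index, value))

# An optional start argument changes the first index (here: count from 1).
numbered = ['%d: %s' % (i, v) for i, v in enumerate(letters, start=1)]

print(plain)
print(numbered)
```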

Answer 4

+10 A:

Initializing a 2D list

While this can be done safely to initialize a list:

lst = [0] * 3

The same trick won’t work for a 2D list (list of lists):

>>> lst_2d = [[0] * 3] * 3
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [5, 0, 0], [5, 0, 0]]

The * operator does not copy its operand: it repeats references to the same inner list, so all three rows are one and the same object. The correct way to do this is:

>>> lst_2d = [[0] * 3 for i in xrange(3)]
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [0, 0, 0], [0, 0, 0]]

Eli Bendersky 2009-03-28 08:35:01

Faster way: http://stackoverflow.com/questions/2332919

gnibbler 2010-02-25 09:56:28
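The aliasing can be checked directly with `is`. A minimal Python 3 sketch (using `range` where the thread's Python 2 code uses `xrange`):

```python
shared = [[0] * 3] * 3                       # three references to one row
independent = [[0] * 3 for _ in range(3)]    # three separate rows

shared_is_aliased = shared[0] is shared[1]
independent_is_aliased = independent[0] is independent[1]

shared[0][0] = 5
independent[0][0] = 5

print(shared_is_aliased, shared)
print(independent_is_aliased, independent)
```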

Answer 5

+2 A:

To find out if a line is empty (i.e. either has length 0 or contains only whitespace), use the string method strip in a condition, as follows:

if not line.strip():    # if line is empty
    continue            # skip it

Eli Bendersky 2009-03-28 08:35:49
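The same test works as a filter in a list comprehension. A small sketch with made-up sample text:

```python
text = "first\n\n   \t\nsecond\n"

# Keep only lines that still contain something after stripping whitespace.
non_blank = [line for line in text.splitlines() if line.strip()]

print(non_blank)
```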

Answer 6

+2 A:

Suppose you have a list of items, and you want a dictionary with these items as the keys. Use fromkeys:

>>> items = ['a', 'b', 'c', 'd']
>>> idict = dict().fromkeys(items, 0)
>>> idict
{'a': 0, 'c': 0, 'b': 0, 'd': 0}
>>>

The second argument of fromkeys is the value assigned to every newly created key.

Eli Bendersky 2009-03-28 08:36:40

fromkeys is a static method. You should do "dict.fromkeys(items, 0)". Your code creates and throws away an empty dictionary.

Andrew Dalke 2009-03-28 19:39:19

@Andrew Dalke, I believe `dict.fromkeys` is a class method. Reason: `dict.fromkeys` returns a dictionary, hence it _should_ get the class as its first argument. Think about when you've subclassed `dict`: `MyDict.fromkeys` should give an instance of `MyDict`.

jeffjose 2010-03-14 18:41:20

@jeffjose: You are correct. I did the test you suggested and looked at the code since I was curious how that was done.

Andrew Dalke 2010-03-17 14:53:15
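The test described in these comments can be sketched as follows; `MyDict` is just the placeholder subclass name used above.

```python
class MyDict(dict):
    """A trivial dict subclass used only to observe what fromkeys returns."""


plain = dict.fromkeys(['a', 'b'], 0)
custom = MyDict.fromkeys(['a', 'b'], 0)

print(type(plain).__name__, type(custom).__name__)
```

Calling it on the class, as suggested, also avoids creating a throwaway empty dictionary first.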

Answer 7

+8 A:

zip(*iterable) transposes an iterable.

>>> a=[[1,2,3],[4,5,6]]
>>> zip(*a)
[(1, 4), (2, 5), (3, 6)]

It's also useful with dicts.

>>> d={"a":1,"b":2,"c":3}
>>> zip(*d.iteritems())
[('a', 'c', 'b'), (1, 3, 2)]

AKX 2009-03-28 18:11:56
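For readers on Python 3, where `zip` returns an iterator and `iteritems` no longer exists, a roughly equivalent sketch (assuming Python 3.7+ so that dict order follows insertion order):

```python
rows = [[1, 2, 3], [4, 5, 6]]
columns = list(zip(*rows))                     # transpose
restored = [list(r) for r in zip(*columns)]    # transposing twice restores the rows

d = {"a": 1, "b": 2, "c": 3}
keys, values = zip(*d.items())                 # split a dict into two parallel tuples

print(columns)
print(keys, values)
```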

Answer 8

+1 A:

For list comprehensions that need the current and next element:

[fun(curr, nxt)
 for curr, nxt
 in zip(lst, lst[1:] + [None])
 if condition(curr, nxt)]

For a circular list, use zip(lst, lst[1:] + lst[:1]).

For previous, current: zip([None] + lst[:-1], lst); circular: zip(lst[-1:] + lst[:-1], lst).

The shifted lists are built with + because append and extend modify a list in place and return None.

vartec 2009-03-28 21:32:02

A slight adjustment of the `pairwise` recipe does the same, and works for all iterables: http://docs.python.org/3.0/library/itertools.html#recipes

Stephan202 2009-07-12 13:51:37
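A sketch of what that adjustment might look like in Python 3. `pairwise` follows the itertools recipe; `pairwise_padded` is a hypothetical name for the variant that also pairs the last element with a fill value, which is one guess at the adjustment the comment has in mind.

```python
from itertools import tee, zip_longest


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), ... as in the itertools recipes."""
    a, b = tee(iterable)
    next(b, None)          # advance the second copy by one element
    return zip(a, b)


def pairwise_padded(iterable, fill=None):
    """Like pairwise, but also yields (last, fill) for the final element."""
    a, b = tee(iterable)
    next(b, None)
    return zip_longest(a, b, fillvalue=fill)


print(list(pairwise(iter([1, 2, 3]))))
print(list(pairwise_padded(iter('abc'))))
```

Unlike slicing, this also works on generators and other one-shot iterables.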

Answer 9

+4 A:

Often much faster than copy.deepcopy for nested lists and dictionaries (as long as everything in them can be pickled):

import cPickle

deepcopy = lambda x: cPickle.loads(cPickle.dumps(x))

vartec 2009-03-28 21:36:56

I've always been leery of this technique, although it seems like it should work about as fast as anything else I could think of. Do Pythonistas consider this a good way to get deep copies? (FWIW, I use this technique anyway.)

TokenMacGuy 2009-07-12 15:37:59
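A hedged Python 3 sketch of the same idea, using `pickle` (Python 3 has no separate `cPickle` module). The helper name `pickle_copy` is made up here, and no speed claim is implied; timing it against `copy.deepcopy` on your own data is the only reliable check.

```python
import copy
import pickle


def pickle_copy(obj):
    """Deep-copy obj by round-tripping it through pickle."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


data = {'a': [1, 2, {'b': (3, 4)}]}
clone = pickle_copy(data)
reference = copy.deepcopy(data)

clone['a'][2]['b'] = 'changed'   # must not affect the original

print(data)
print(clone)
```

One caveat worth knowing: the round trip only works for picklable objects, so a structure containing, say, a lambda or an open file will fail here.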

Answer 10

+5 A:

To flatten a list of lists, such as

[['a', 'b'], ['c'], ['d', 'e', 'f']]

into

['a', 'b', 'c', 'd', 'e', 'f']

use

[inner
    for outer in the_list
        for inner in outer]

George V. Reilly 2009-03-29 01:47:41
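For comparison, a small sketch of the nested comprehension next to `itertools.chain.from_iterable`, which flattens one level in the same way:

```python
from itertools import chain

the_list = [['a', 'b'], ['c'], ['d', 'e', 'f']]

flat_comprehension = [inner for outer in the_list for inner in outer]
flat_chain = list(chain.from_iterable(the_list))

print(flat_comprehension)
print(flat_chain)
```

Both flatten exactly one level; deeper nesting is left untouched.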

Answer 11

+13 A:

I like using any and a generator:

if any(pred(x.item) for x in sequence):
    ...

instead of code written like this:

found = False
for x in sequence:
    if pred(x.item):
        found = True
if found:
    ...

I first learned of this technique from a Peter Norvig article.

Jacob Gabrielson 2009-03-29 05:57:20

+1 For the reference to Norvig's sudoku article. It's very nice.

Stephan202 2009-07-12 13:39:52

There's also all() to check that all items are True.

FogleBird 2009-07-12 13:51:22
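A small sketch showing that `any` with a generator stops at the first match, which the explicit loop above does not do unless it breaks. The `is_even` helper and the sample numbers are illustrative only.

```python
calls = []


def is_even(n):
    """Record each call so the short-circuiting is visible."""
    calls.append(n)
    return n % 2 == 0


numbers = [1, 3, 4, 7]
found = any(is_even(n) for n in numbers)   # stops after reaching 4
all_positive = all(n > 0 for n in numbers)

print(found, calls, all_positive)
```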

Answer 12

+1 A:

For Python 2.4 or earlier:

for x,y in someIterator:
  listDict.setdefault(x,[]).append(y)

In Python 2.5+ there is an alternative using defaultdict.

vartec 2009-03-29 10:46:58
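A minimal sketch of the `defaultdict` alternative next to `setdefault`, with made-up sample pairs:

```python
from collections import defaultdict

pairs = [('a', 1), ('b', 2), ('a', 3)]

# setdefault version, as in the snippet above
by_setdefault = {}
for x, y in pairs:
    by_setdefault.setdefault(x, []).append(y)

# defaultdict version: missing keys get a fresh empty list automatically
by_defaultdict = defaultdict(list)
for x, y in pairs:
    by_defaultdict[x].append(y)

print(by_setdefault)
print(dict(by_defaultdict))
```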

Answer 13

+1 A:

A custom list that returns a Cartesian product when multiplied by another list. The good thing is that the Cartesian product is indexable, unlike that of itertools.product (but the multiplicands must be sequences, not iterators).

import operator

class mylist(list):
    def __getitem__(self, args):
     if type(args) is tuple:
      return [list.__getitem__(self, i) for i in args]
     else:
      return list.__getitem__(self, args)
    def __mul__(self, args):
     seqattrs = ("__getitem__", "__iter__", "__len__")
     if all(hasattr(args, i) for i in seqattrs):
      return cartesian_product(self, args)
     else:
      return list.__mul__(self, args)
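    def __imul__(self, args):
     return self.__mul__(args)
    def __rmul__(self, args):
     seqattrs = ("__getitem__", "__iter__", "__len__")
     if all(hasattr(args, i) for i in seqattrs):
      return cartesian_product(args, self)
     else:
      return list.__rmul__(self, args)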
    def __pow__(self, n):
     return cartesian_product(*((self,)*n))
    def __rpow__(self, n):
     return cartesian_product(*((self,)*n))

class cartesian_product:
    def __init__(self, *args):
     self.elements = args
    def __len__(self):
     return reduce(operator.mul, map(len, self.elements))
    def __getitem__(self, n):
     return [e[i] for e, i  in zip(self.elements,self.get_indices(n))]
    def get_indices(self, n):
     sizes = map(len, self.elements)
     tmp = [0]*len(sizes)
     i = -1
     for w in reversed(sizes):
      tmp[i] = n % w
      n //= w
      i -= 1
     return tmp
    def __add__(self, arg):
     return mylist(map(None, self)+mylist(map(None, arg)))
    def __imul__(self, args):
     return mylist(self)*mylist(args)
    def __rmul__(self, args):
     return mylist(args)*mylist(self)
    def __mul__(self, args):
     if isinstance(args, cartesian_product):
      return cartesian_product(*(self.elements+args.elements))
     else:
      return cartesian_product(*(self.elements+(args,)))
    def __iter__(self):
     for i in xrange(len(self)):
      yield self[i]
    def __str__(self):
     return "[" + ",".join(str(i) for i in self) +"]"
    def __repr__(self):
     return "*".join(map(repr, self.elements))

fortran 2009-07-12 13:23:42

I don't understand a line of it, can you comment on how it works?

bodacydo 2010-03-23 14:45:17
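The core idea behind `get_indices` can be sketched on its own: treat the position n as a mixed-radix number whose digits are the indices into each sequence. The following is a Python 3 illustration under that reading, not the author's code; `product_item` is a hypothetical name.

```python
from itertools import product


def product_item(sequences, n):
    """Return the n-th element of the Cartesian product without building it."""
    indices = []
    # Peel off one "digit" per sequence, starting from the last one,
    # because the last sequence varies fastest.
    for seq in reversed(sequences):
        n, i = divmod(n, len(seq))
        indices.append(i)
    indices.reverse()
    return [seq[i] for seq, i in zip(sequences, indices)]


seqs = [['a', 'b'], [1, 2, 3], 'xy']
print(product_item(seqs, 7))
print(list(product(*seqs))[7])
```

The ordering matches `itertools.product`, so the two lines print the same combination, one as a list and one as a tuple.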


Short (and useful) python snippets
