ansaurus

Question

Answer 1

+1 A:

You can try something like this:

class mylist(list):

    def __sub__(self, b):
        result = self[:]
        b = b[:]
        while b:
            try:
                result.remove(b.pop())
            except ValueError:
                raise Exception("Not all elements found during subtraction")
        return result


a = mylist([0, 1, 2, 1, 0] )
b = mylist([0, 1, 1])

>>> a - b
[2, 0]

You have to define what [1, 2, 3] - [5, 6] should output, though. I guess you want [1, 2, 3], which is why I ignore the ValueError.

Edit: Now I see you wanted an exception if a does not contain all elements, added it instead of passing the ValueError.

truppo 2010-01-15 10:03:39

Why are you subclassing list?

Devin Jeanpierre 2010-01-15 10:06:45

The OP states that "This should throw an exception if a does not contain all elements in b," so the `ValueError` shouldn't be silenced.

Pär Wieslander 2010-01-15 10:07:15

@Devin: because the title of this question is "Subtracting two lists in Python"?

truppo 2010-01-15 10:11:31

Apart from ignoring the exception (I actually want the exception), that seems pretty nice, though I wonder about its performance. remove is O(n), I suspect. Subclassing list itself is a nice way to keep things readable without cluttering the code too much; I hadn't even thought of that.

wich 2010-01-15 10:11:57

remove is O(n), making it potentially quadratic. It could be faster if you changed your data structure: why are you using a list rather than a dict (mapping elements to counts)? As for subclassing list, it doesn't *particularly* remove clutter. Really, how different are sub(a, b) and a - b? The difficulty is that you have to use mylists everywhere instead of lists, which might be painful to track down. Otherwise, it's generally just bad style. In more complex cases (e.g. overriding `__getitem__`), behavior is wonky because code is shared in C, not Python, so a lot more work is involved.

Devin Jeanpierre 2010-01-15 10:31:51
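
The dict-of-counts idea from the comment above could look roughly like the sketch below. The function name `subtract_counted` and the error message are made up for illustration, and elements are assumed to be hashable. It makes one pass over each list, keeps the order of `a`, and raises if `b` contains something that `a` lacks.

```python
def subtract_counted(a, b):
    """Return a copy of a with one occurrence removed per item in b.

    Hypothetical helper: raises ValueError if b holds an item more
    often than a does. Items must be hashable.
    """
    # Count how many times each item still has to be removed.
    to_remove = {}
    for item in b:
        to_remove[item] = to_remove.get(item, 0) + 1

    result = []
    for item in a:
        if to_remove.get(item, 0) > 0:
            to_remove[item] -= 1      # drop this occurrence
        else:
            result.append(item)       # keep it, preserving order

    # Anything left over was in b but not (often enough) in a.
    leftover = [item for item, count in to_remove.items() if count > 0]
    if leftover:
        raise ValueError("items not found during subtraction: %r" % (leftover,))
    return result
```

With the lists from the answer, `subtract_counted([0, 1, 2, 1, 0], [0, 1, 1])` gives `[2, 0]`, and `subtract_counted([1, 2, 3], [5, 6])` raises `ValueError`.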

Answer 2

+4 A:

I'm not sure what the objection to a for loop is: there is no multiset in Python so you can't use a builtin container to help you out.

Seems to me anything on one line (if possible) will probably be hellishly complex to understand. Go for readability and KISS. Python is not C :)

jkp 2010-01-15 10:06:50

Answer 3

+11 A:

I know "for" is not what you want, but it's simple and clear, why bother?

for x in b:
  a.remove(x)

Dyno Fu 2010-01-15 10:11:40

I don't want to destroy the original list, which would add more inline code to do this.

wich 2010-01-15 10:14:04

Heh: yes, and if it *really* must look like a list comprehension: [a.remove(x) for x in b] :p

jkp 2010-01-15 10:14:47

@jkp that would still mutilate the original list and not have a sensible return value

wich 2010-01-15 10:17:18

It will be three lines in total if you add `c = list(a)` before the loop and then remove items from `c`. In my opinion this is probably as clear and readable as it gets.

Pär Wieslander 2010-01-15 10:18:02
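
A minimal sketch of the three-line variant described in the comment above, wrapped in a function whose name (`subtract_copy`) is only illustrative. Copying first leaves the original list untouched, and `list.remove` already raises `ValueError` when an element is missing.

```python
def subtract_copy(a, b):
    """Hypothetical wrapper around the copy-then-remove loop."""
    c = list(a)          # work on a copy so a is not modified
    for x in b:
        c.remove(x)      # raises ValueError if x is not in c
    return c


a = [0, 1, 2, 1, 0]
b = [0, 1, 1]
c = subtract_copy(a, b)  # c is [2, 0]; a is unchanged
```

This is still O(len(a) * len(b)) in the worst case, because each `remove` scans the list.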

@jkp actually, that list comprehension returns `[None, None, None]`

Kimvais 2010-01-15 11:35:28

(Voting up a hopefully-facetious suggestion to misuse list comprehensions?)

Glenn Maynard 2010-01-15 11:44:44

But that's horribly inefficient for large lists, isn't it?

vit 2010-01-15 11:46:06

@vit Yes, efficiency is still a problem here, this'll get up to O(n^2)

wich 2010-01-15 11:59:36

@Kimvais: it does, but `a` will be `[2, 0]`.

SilentGhost 2010-01-15 12:05:24

@SilentGhost: D'oh. (face, meet palm)

Kimvais 2010-01-15 12:41:33
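
To make the exchange above concrete, here is a small sketch of what the list-comprehension version does. The variable name `result` is only for illustration.

```python
a = [0, 1, 2, 1, 0]
b = [0, 1, 1]

# list.remove() returns None, so the comprehension collects one None
# per element of b, while a itself is modified as a side effect.
result = [a.remove(x) for x in b]

# result is [None, None, None]; a is now [2, 0]
```

So the subtraction happens, but only by mutating `a`, and the comprehension's own value is useless.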

Answer 4

+1 A:

To use a list comprehension:

[i for i in a if not i in b or b.remove(i)]

would do the trick. It would change b in the process though. But I agree with jkp and Dyno Fu that using a for loop would be better.

Perhaps someone can create a better example that uses list comprehension but still is KISS?

BlackShift 2010-01-15 10:19:34

To avoid destroying b, one can add `c = list(b)` and use c in place of b, but it's still not as nice as Dyno Fu's answer.

BlackShift 2010-01-15 10:22:34
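
A sketch of the comprehension with the copy applied, so that `b` survives. The name `diff` is made up for illustration. Note that, unlike the `remove` loop, this version silently ignores elements of `b` that are not in `a`.

```python
a = [0, 1, 2, 1, 0]
b = [0, 1, 1]

c = list(b)  # scratch copy of b that the comprehension is allowed to consume
# Keep i if it is not (or no longer) in c; otherwise c.remove(i) runs,
# returns None (falsy), and i is dropped.
diff = [i for i in a if i not in c or c.remove(i)]

# diff is [2, 0]; a and b are unchanged
```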

Answer 5

+1 A:

To prove jkp's point that 'anything on one line will probably be hellishly complex to understand', I created a one-liner. Please do not mod me down, because I understand this is not a solution that you should actually use. It is just for demonstration purposes.

The idea is to add the values in a one by one, as long as the number of times you have added that value is smaller than the number of times the value occurs in a minus the number of times it occurs in b:

[ value for counter,value in enumerate(a) if a.count(value) >= b.count(value) + a[counter:].count(value) ]

The horror! But perhaps someone can improve on it? Is it even bug free?

Edit: Seeing Devin Jeanpierre's comment about using a dictionary data structure, I came up with this one-liner:

sum([ [value]*count for value,count in {value:a.count(value)-b.count(value) for value in set(a)}.items() ], [])

Better, but still unreadable.

BlackShift 2010-01-15 10:32:24
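
On the "is it even bug free?" question, one way to gain some confidence is to compare the first one-liner against the plain `remove` loop on random inputs. The sketch below does that under the assumption that every element of `b` really is available in `a` (the case where no exception is expected). The helper names `reference`, `one_liner` and `check` are made up for illustration.

```python
import random


def reference(a, b):
    """Copy-then-remove loop, used as the expected behaviour."""
    c = list(a)
    for x in b:
        c.remove(x)
    return c


def one_liner(a, b):
    """The first one-liner from the answer, wrapped in a function."""
    return [value for counter, value in enumerate(a)
            if a.count(value) >= b.count(value) + a[counter:].count(value)]


def check(trials=200, seed=0):
    """Return True if both versions agree on all random test cases."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(5) for _ in range(rng.randrange(10))]
        # Sampling positions of a guarantees b is a sub-multiset of a.
        b = rng.sample(a, rng.randrange(len(a) + 1))
        if one_liner(a, b) != reference(a, b):
            return False
    return True
```

Both versions drop the first occurrences of each subtracted value, so they agree on such inputs. When `b` contains elements missing from `a`, the one-liner simply ignores them instead of raising.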

Answer 6

+3 A:

Python 2.7 and 3.1 add the collections.Counter class, a dictionary subclass that maps elements to the number of occurrences of each element. This can be used as a multiset.

According to the docs you should be able to do something like this (untested, since I do not have either version installed).

from collections import Counter
a = Counter([0, 1, 2, 1])
b = Counter([0, 1, 1])

print(a - b)  # ignores items in b missing in a

# check every element in b is also in a
# a[key] returns 0 if key not in a, instead of raising an exception
assert all(a[key] >= b[key] for key in b)

Edit:

Since you are stuck with 2.5, you could try importing it and define your own version if that fails. That way you will be sure to get the latest version if it is available, and fall back to a working version if not. You will also benefit from speed improvements if it gets converted to a C implementation in the future.

i.e.

try:
    from collections import Counter
except ImportError:
    class Counter(dict):
        ...

You can find the current Python source here.

Dave Kirby 2010-01-15 11:44:40

Unfortunately I'm stuck with 2.5

wich 2010-01-15 12:01:44

It should be `a[key] >= b[key]` instead of `a[key] > b[key]`

J.F. Sebastian 2010-05-08 16:19:36

It should be `Counter([0,1,1])` instead of `Counter(0,1,1)`.

J.F. Sebastian 2010-05-08 16:24:28
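
Putting the corrections from the comments together, a Counter-based subtraction that also raises when `b` is not contained in `a` might look like the sketch below. It assumes a Python version where `collections.Counter` exists; the function name `subtract_multiset` and the error message are made up for illustration.

```python
from collections import Counter


def subtract_multiset(a, b):
    """Hypothetical helper: multiset difference of two lists as a list.

    Raises ValueError if b contains an element more often than a does.
    The order of the original list is not preserved.
    """
    count_a, count_b = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, so anything left
    # in (count_b - count_a) is missing from a.
    missing = count_b - count_a
    if missing:
        raise ValueError("not all elements found: %r" % (dict(missing),))
    return list((count_a - count_b).elements())
```

For example, `sorted(subtract_multiset([0, 1, 2, 1, 0], [0, 1, 1]))` is `[0, 2]`, while `subtract_multiset([1, 2, 3], [5, 6])` raises `ValueError`.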

Answer 7

+1 A:

I attempted to find a more elegant solution, but the best I could do was basically the same thing that Dyno Fu said:

from copy import copy

def subtract_lists(a, b):
    """
    >>> a = [0, 1, 2, 1, 0]
    >>> b = [0, 1, 1]
    >>> subtract_lists(a, b)
    [2, 0]

    >>> import random
    >>> size = 10000
    >>> a = [random.randrange(100) for _ in range(size)]
    >>> b = [random.randrange(100) for _ in range(size)]
    >>> c = subtract_lists(a, b)
    >>> assert all((x in a) for x in c)
    """
    a = copy(a)
    for x in b:
        if x in a:
            a.remove(x)
    return a

Christian Oudard 2010-01-19 20:34:22

Answer 8

+3 A:

Python 2.7+ and 3.1+ have collections.Counter (a.k.a. multiset). The documentation links to Recipe 576611: Counter class for Python 2.5:

from operator import itemgetter
from heapq import nlargest
from itertools import repeat, ifilter

class Counter(dict):
    '''Dict subclass for counting hashable objects.  Sometimes called a bag
    or multiset.  Elements are stored as dictionary keys and their counts
    are stored as dictionary values.

    >>> Counter('zyzygy')
    Counter({'y': 3, 'z': 2, 'g': 1})

    '''

    def __init__(self, iterable=None, **kwds):
        '''Create a new, empty Counter object.  And if given, count elements
        from an input iterable.  Or, initialize the count from another mapping
        of elements to their counts.

        >>> c = Counter()                           # a new, empty counter
        >>> c = Counter('gallahad')                 # a new counter from an iterable
        >>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
        >>> c = Counter(a=4, b=2)                   # a new counter from keyword args

        '''        
        self.update(iterable, **kwds)

    def __missing__(self, key):
        return 0

    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.

        >>> Counter('abracadabra').most_common(3)
        [('a', 5), ('r', 2), ('b', 2)]

        '''        
        if n is None:
            return sorted(self.iteritems(), key=itemgetter(1), reverse=True)
        return nlargest(n, self.iteritems(), key=itemgetter(1))

    def elements(self):
        '''Iterator over elements repeating each as many times as its count.

        >>> c = Counter('ABCABC')
        >>> sorted(c.elements())
        ['A', 'A', 'B', 'B', 'C', 'C']

        If an element's count has been set to zero or is a negative number,
        elements() will ignore it.

        '''
        for elem, count in self.iteritems():
            for _ in repeat(None, count):
                yield elem

    # Override dict methods where the meaning changes for Counter objects.

    @classmethod
    def fromkeys(cls, iterable, v=None):
        raise NotImplementedError(
            'Counter.fromkeys() is undefined.  Use Counter(iterable) instead.')

    def update(self, iterable=None, **kwds):
        '''Like dict.update() but add counts instead of replacing them.

        Source can be an iterable, a dictionary, or another Counter instance.

        >>> c = Counter('which')
        >>> c.update('witch')           # add elements from another iterable
        >>> d = Counter('watch')
        >>> c.update(d)                 # add elements from another counter
        >>> c['h']                      # four 'h' in which, witch, and watch
        4

        '''        
        if iterable is not None:
            if hasattr(iterable, 'iteritems'):
                if self:
                    self_get = self.get
                    for elem, count in iterable.iteritems():
                        self[elem] = self_get(elem, 0) + count
                else:
                    dict.update(self, iterable) # fast path when counter is empty
            else:
                self_get = self.get
                for elem in iterable:
                    self[elem] = self_get(elem, 0) + 1
        if kwds:
            self.update(kwds)

    def copy(self):
        'Like dict.copy() but returns a Counter instance instead of a dict.'
        return Counter(self)

    def __delitem__(self, elem):
        'Like dict.__delitem__() but does not raise KeyError for missing values.'
        if elem in self:
            dict.__delitem__(self, elem)

    def __repr__(self):
        if not self:
            return '%s()' % self.__class__.__name__
        items = ', '.join(map('%r: %r'.__mod__, self.most_common()))
        return '%s({%s})' % (self.__class__.__name__, items)

    # Multiset-style mathematical operations discussed in:
    #       Knuth TAOCP Volume II section 4.6.3 exercise 19
    #       and at http://en.wikipedia.org/wiki/Multiset
    #
    # Outputs guaranteed to only include positive counts.
    #
    # To strip negative and zero counts, add-in an empty counter:
    #       c += Counter()

    def __add__(self, other):
        '''Add counts from two counters.

        >>> Counter('abbb') + Counter('bcc')
        Counter({'b': 4, 'c': 2, 'a': 1})


        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem in set(self) | set(other):
            newcount = self[elem] + other[elem]
            if newcount > 0:
                result[elem] = newcount
        return result

    def __sub__(self, other):
        ''' Subtract count, but keep only results with positive counts.

        >>> Counter('abbbc') - Counter('bccd')
        Counter({'b': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem in set(self) | set(other):
            newcount = self[elem] - other[elem]
            if newcount > 0:
                result[elem] = newcount
        return result

    def __or__(self, other):
        '''Union is the maximum of value in either of the input counters.

        >>> Counter('abbb') | Counter('bcc')
        Counter({'b': 3, 'c': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        _max = max
        result = Counter()
        for elem in set(self) | set(other):
            newcount = _max(self[elem], other[elem])
            if newcount > 0:
                result[elem] = newcount
        return result

    def __and__(self, other):
        ''' Intersection is the minimum of corresponding counts.

        >>> Counter('abbb') & Counter('bcc')
        Counter({'b': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        _min = min
        result = Counter()
        if len(self) < len(other):
            self, other = other, self
        for elem in ifilter(self.__contains__, other):
            newcount = _min(self[elem], other[elem])
            if newcount > 0:
                result[elem] = newcount
        return result


if __name__ == '__main__':
    import doctest
    print doctest.testmod()

Then you can write

a = Counter([0, 1, 2, 1, 0])
b = Counter([0, 1, 1])
c = a - b
print list(c.elements())  # [0, 2]

ephemient 2010-01-19 20:49:18

I wonder how efficient this is. It hinges, of course, on the big-O complexity of the dictionary indexing happening inside the Counter class...

wich 2010-01-19 21:31:21

Your solution doesn't throw an exception if `b` contains elements that are not in `a`.

J.F. Sebastian 2010-05-08 16:07:24

Answer 9

+2 A:

I would do it in an easier way:

a_b = [e for e in a if not e in b ]

pcv 2010-05-08 15:26:36
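
Note that this filter behaves like a set difference rather than the multiset subtraction discussed in the other answers: it drops every occurrence of any value that appears in `b`, no matter how often. A small sketch with the lists used elsewhere in the thread (the name `set_style` is only illustrative):

```python
a = [0, 1, 2, 1, 0]
b = [0, 1, 1]

# Every 0 and every 1 is filtered out, because both values occur in b.
set_style = [e for e in a if e not in b]

# set_style is [2], whereas the multiset subtraction gives [2, 0]
```

It also never raises when `b` contains elements that are missing from `a`.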

Subtracting two lists in Python
