I would avoid subclassing set
, since clearly you can usefully reuse no part of set
's implementation. I would even avoid subclassing collections.Set
, since the latter requires you to supply a __len__
-- a functionality which you appear not to need otherwise, and just can't be done effectively in the general case (it's going to be O(N)
, with, which the kind of size you're talking about, is far too slow). You're unlikely to find some existing implementation that matches your use case well enough to be worth reusing, because your requirements are very specific and even peculiar -- the concept of "random iterating and an occasional duplicate is OK", for example, is a really unusual one.
If your specs are complete (you only need union, intersection, and random iteration, plus occasional additions and removals of single items), implementing a special purpose class that fills those specs is not a crazy undertaking. If you have more specs that you have not explicitly mentioned, it will be trickier, but it's hard to guess without hearing all the specs. So for example, something like:
import random
class AbSet(object):
def __init__(self, predicate, maxitem=1<<32):
# set of all ints, >=0 and <maxitem, satisfying the predicate
self.maxitem = maxitem
self.predicate = predicate
self.added = set()
self.removed = set()
def copy(self):
x = type(self)(self.predicate, self.maxitem)
x.added = set(self.added)
x.removed = set(self.removed)
return x
def __contains__(self, item):
if item in self.removed: return False
if item in self.added: return True
return (0 <= item < self.maxitem) and self.predicate(item)
def __iter__(self):
# random endless iteration
while True:
x = random.randrange(self.maxitem)
if x in self: yield x
def add(self, item):
if item<0 or item>=self.maxitem: raise ValueError
if item not in self:
self.removed.discard(item)
self.added.add(item)
def discard(self, item):
if item<0 or item>=self.maxitem: raise ValueError
if item in self:
self.removed.add(item)
self.added.discard(item)
def union(self, o):
pred = lambda v: self.predicate(v) or o.predicate(v),
x = type(self)(pred, max(self.maxitem, o.maxitem))
toadd = [v for v in (self.added|o.added) if not pred(v)]
torem = [v for v in (self.removed|o.removed) if pred(v)]
x.added = set(toadd)
x.removed = set(torem)
def intersection(self, o):
pred = lambda v: self.predicate(v) and o.predicate(v),
x = type(self)(pred, min(self.maxitem, o.maxitem))
toadd = [v for v in (self.added&o.added) if not pred(v)]
torem = [v for v in (self.removed&o.removed) if pred(v)]
x.added = set(toadd)
x.removed = set(torem)
I'm not entirely certain about the logic determining added and removed upon union and intersection, but I hope this is a good base for you to work from.