tags:

views:

400

answers:

1
+1  Q: 

Subclassing list

Hello!

I want create a DataSet class which is basically a list of samples. But I need to override each insertion operation to the DataSet.

Is there any simple way to do this without writing my own append, extend, iadd etc. ?

UPDATE: I want to add a backpointer to each sample, holding index of the sample in the DataSet. This is needed to the processing algorithm I use. I have a solution, but it seems unelegant - a renumber() function -- it ensures that the backpointers are valid.

+3  A: 

I don't know of a way of doing what you're asking -- overriding mutators without overriding them. With a class decorator, however, you can "automate" the overriding versions (assuming each can be achieved by wrapping the corresponding method in the base class), so it's not too bad...

Suppose for example that what you want to do is add a "modified" flag, true if the data may have been changed since the last call to .save (a method of yours which persist the data and sets self.modified to False).

Then...:

def wrapMethod(cls, n):
    f = getattr(cls, n)
    def wrap(self, *a):
      self.dirty = True
      return f(self, *a)
    return wrap

def wrapListMutators(cls):
  for n in '''__setitem__ __delitem__ __iadd__ __imul__
              append extend insert pop remove reverse sort'''.split():
    f = wrapMethod(cls, n)
    setattr(cls, n, f)
  return cls

@wrapListMutators
class DataSet(list):
  dirty = False
  def save(self): self.dirty = False

This syntax requires Python 2.6 or better, but, in earlier Python versions (ones which only support decorators on def statements, not on class statements; or even very old ones that don't support decorators at all), you just need to change the very last part (the class statement), to:

class DataSet(list):
  dirty = False
  def save(self): self.dirty = False
DataSet = wrapListMutators(DataSet)

IOW, the neat decorator syntax is just a small amount of syntax sugar on top of a normal function call which takes the class as the argument and reassigns it.

Edit: now that you have edited your question to clarify your exact requirements -- maintain on each item a field bp such that, for all i, theset[i].bp == i -- it's easier to weigh the pro and con of various approaches.

You could adapt the approach I sketched, but instead of self.dirty assignment before the call to the wrapped method, have a self.renumber() call after it, i.e.:

def wrapMethod(cls, n):
    f = getattr(cls, n)
    def wrap(self, *a):
      temp = f(self, *a)
      self.renumber()
      return temp
    return wrap

this meets your stated requirements, but in many cases it will do far more work than necessary: for example, when you append an item, this needlessly "renumbers" all existing ones (to the same values they already had). But how could any fully automated approach "know" which items, if any, it must recompute the .bp of, without O(N) effort? At least it must look at each and every one of them (since you don't want to separately code, e.g., append vs insert &c), and that's already O(N).

So this will be acceptable only if it's OK for every single change to the list to be O(N) (basically only if the list always stays small and/or doesn't change often).

A more fruitful idea might be to not maintain .bp values all the time, but only "just in time" when needed. Make bp a (read-only) property, calling a method which checks if the container is "dirty" (where the "dirty" flag in the container is maintained using the automated code I've already given) and only then renumbers the container (and sets its "dirty" attribute to False).

This will work well when the list typically is subject to a burst of changes and only then do you need to access the items' bp for a while, then another bunch of changes, etc. Such bursty alternation between changing and reading is not rare in real-world containers, but only you can know whether it applies in your specific case!

To get performance beyond this I think you need to do some manual coding on top of this general approach to take advantage of frequent special cases. For example, append may be called very often, and the amount of work to do in a special-cases append is really small, so it may well be worth your while to write those two or three lines of code (not setting the dirty bit for that case).

One caveat: no approach will work (indeed your requirement becomes self contradictory) if any item is present twice in the list -- which of course is perfectly possible unless you take precautions to avoid it (you could easily diagnose it in renumber -- by keeping a set of elements already seen and raising an exception on any duplication -- if that's not too late for you; it's harder to diagnose "on the fly", i.e. at the time of a mutation that causes a duplicate, if that's what you require). Maybe you can relax your requirement so that, if an item is present twice, that's OK and the bp can just indicate one of the indices; or make bp into a set of indices where the element is present (that would also offer a smooth approach to the case of getting bp from an element that's not in the list). Etc, etc; I recommend you consider (and document!) all of these corner cases in depth -- correctness before performance!

Alex Martelli
Are you sure, Alex? The docs for UserList say: "Note: This module is available for backward compatibility only. If you are writing code that does not need to work with versions of Python earlier than Python 2.2, please consider subclassing directly from the built-in list type." http://www.python.org/doc/2.5.2/lib/module-UserList.html
Jarret Hardie
@Janet, yep, I had mis-spoken, so I edited the answer, thanks.
Alex Martelli
+1 for the revised version
Jarret Hardie
Thanks for the idea!
Anton Kazennikov
And for the example! :)
Anton Kazennikov
You're welcome! I've just added a lot of discussion of your specific case so I recommend you re-read the answer in its current state (tx for accepting it even in the previous incomplete form tho;-). If much more discussion or code is needed I recommend you open a new question tho, as this is already by far the longest SO answer I've ever given (tx for posing such an interesting problem btw!-).
Alex Martelli
+1 for detailed answer and "correctness before performance" :)
NicDumZ
And thanks again for the elaborate answer and discussion, as well for some more ideas!
Anton Kazennikov