Basically, I have a list like [START, 'foo', 'bar', 'spam', 'eggs', END], and I need the START/END identifiers so I can compare against them later on. Right now, I have it set up like this:

START = object()
END = object()

This works fine, but the markers don't survive pickling. I tried doing it the following way, but it seems like a terrible method of accomplishing this:

class START(object): pass
class END(object): pass

Could anybody share a better means of doing this? Also, the example I have set up above is just an oversimplification of a different problem.
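
The pickling problem can be reproduced in a few lines. This is a minimal sketch assuming Python 3, where a bare object() pickles without error but comes back as a different object, so an identity check against the original marker fails after a round trip. The name `restored` is only for illustration.

```python
import pickle

START = object()
END = object()

items = [START, 'foo', 'bar', 'spam', 'eggs', END]

# Round-trip the whole list through pickle, as if saving and reloading it.
restored = pickle.loads(pickle.dumps(items))

# The ordinary strings survive unchanged...
print(restored[1:-1] == items[1:-1])   # True
# ...but the markers come back as brand-new objects.
print(restored[0] is START)            # False
print(restored[-1] is END)             # False
```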

A: 

I think maybe this would be easier to answer if you were more explicit about what you need this for, but my inclination if faced with a problem like this would be something like:

>>> import os
>>> START = os.urandom(16).hex()  # on Python 2: os.urandom(16).encode('hex')
>>> END = os.urandom(16).hex()

Pros of this approach, as I see it:

  • Your markers are strings (can pickle or otherwise easily serialize, e.g. to JSON or a DB, without any special effort)
  • Very unlikely to collide either accidentally or on purpose
  • Will serialize and deserialize to identical values, even across process restarts, which (I think) would not be the case for object() or an empty class.

Cons(?)

  • Each time they are newly chosen they will be completely different. (Whether this is good or bad depends on details you have not provided, I would think.)
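
As a rough illustration of the points above, here is a sketch assuming Python 3, where `bytes.hex()` produces the hex string. It round-trips a marked list through JSON and shows that a freshly generated marker does not match an earlier one. The `make_marker` helper is a name invented for this example.

```python
import json
import os


def make_marker():
    """Return a random 32-character hex string (16 random bytes)."""
    return os.urandom(16).hex()


START = make_marker()
END = make_marker()

items = [START, 'foo', 'bar', 'spam', 'eggs', END]

# Plain strings survive JSON unchanged, so the markers still compare equal.
restored = json.loads(json.dumps(items))
print(restored[0] == START and restored[-1] == END)  # True

# A fresh call yields a different value, so markers must be generated once
# and stored (or hard-coded) if separate runs need to agree on them.
print(make_marker() == START)  # False, barring a 1-in-2**128 coincidence
```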
Jack Lloyd
Sorry, but this doesn't work if I pickle the result and open it up in a separate process since now each one is different.
Evan Fosmark
I *really* don't like this solution (I know the chance of collision is small, but if it happens you are really stuffed!), but you did list the limitations, so I am resisting my urge to downvote
Casebash
@Casebash Does 1/2^64 seem an unreasonably high probability to you, or do you have reason to believe /dev/*random is broken?
Jack Lloyd
+1  A: 

If your list didn't contain strings, I'd just use "start" and "end", since CPython typically interns short literals like these, which makes the comparison effectively O(1).

If you do need strings, but not tuples, the complete cheapskate method is:

[("START",), 'foo', 'bar', 'spam', 'eggs', ("END",)]
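
A short sketch of why the one-element tuples work: a tuple never compares equal to a string, so a genuine 'START' string in the data cannot be mistaken for the marker, and tuples of strings are pickled by value, so == still holds after a round trip. The `payload` function is a hypothetical helper for this example.

```python
import pickle

START = ("START",)
END = ("END",)


def payload(items):
    """Return the items strictly between the START and END markers."""
    begin = items.index(START)
    end = items.index(END)
    return items[begin + 1:end]


# The data deliberately contains the plain string 'START' as an ordinary item.
items = [START, 'foo', 'START', 'bar', END]
print(payload(items))  # ['foo', 'START', 'bar']

# The markers are compared with ==, which survives pickling.
restored = pickle.loads(pickle.dumps(items))
print(payload(restored))  # ['foo', 'START', 'bar']
```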

P.S. I was sure your list contained numbers before, not strings, but I can't see any revisions, so I must have imagined it.

Casebash
+6  A: 

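If you want an object that's guaranteed to be unique and also guaranteed to be restored to exactly the same identity when pickled and unpickled right back, top-level functions and classes are fine, because pickle stores them by reference to their name. Class instances, and (if you care about is rather than ==) lists and other mutables, are just as unique within a process, but unpickling builds a new object, so they only suit markers that are never pickled. I.e., any of: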

# work for == as well as is
class START(object): pass
def START(): pass
class Whatever(object): pass
START = Whatever()

# if you don't care for "accidental" == and only check with `is`
START = []
START = {}
START = set()

None of these is terrible, and within a single process none has any special advantage (depending on whether you care about == or just is). Probably def wins by dint of generality, conciseness, and lighter weight.
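
A quick way to check how each candidate behaves under pickle is sketched below. It assumes Python 3 and that the definitions live at the top level of a script or importable module, since pickle looks classes and functions up by name. `same_after_pickle`, `INSTANCE` and `LIST` are names made up for the example.

```python
import pickle


class START(object):
    pass


def END():
    pass


class Whatever(object):
    pass


INSTANCE = Whatever()
LIST = []


def same_after_pickle(obj):
    """Return True if unpickling gives back the very same object."""
    return pickle.loads(pickle.dumps(obj)) is obj


# Classes and functions are pickled as a reference to their module-level
# name, so unpickling looks the name up and returns the original object.
print(same_after_pickle(START))     # True
print(same_after_pickle(END))       # True

# Instances and lists are pickled by value and rebuilt as new objects.
print(same_after_pickle(INSTANCE))  # False
print(same_after_pickle(LIST))      # False
```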

Alex Martelli
+1  A: 

Actually, I like your solution.

A while back I was hacking on a Python module, and I wanted to have a special magical value that could not appear anywhere else. I spent some time thinking about it and the best I came up with is the same trick you used: declare a class, and use the class object as the special magical value.

When you are checking for the sentinel, you should of course use the is operator, for object identity:

for x in my_list:
    if x is START:
        pass  # handle start of list
    elif x is END:
        pass  # handle end of list
    else:
        pass  # handle item from list
steveha
If you declare your own class, == will only check identity by default.
Casebash
True, but I wouldn't recommend it. If you write `==` when you really mean `is`, you aren't clearly expressing your intent. And I'm not sure how likely it is, but it is possible someone could write a class whose `__cmp__()` method returns 0 when it compares to your sentinel class. (I just wrote one as a proof of concept...) If you use `is` you know exactly what it will do.
steveha
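
The kind of proof of concept mentioned in the comment above might look like the following sketch. It assumes Python 3, where `__cmp__` no longer exists and `__eq__` plays the same role. `EqualsEverything` is a hypothetical class written for this example, not code from the thread.

```python
class START(object):
    pass


class EqualsEverything(object):
    """Hypothetical troublemaker: claims to be equal to any object."""

    def __eq__(self, other):
        return True

    def __hash__(self):
        return 0


impostor = EqualsEverything()

# == can be fooled in either direction, because Python falls back to the
# impostor's __eq__ when the sentinel class has no opinion.
print(impostor == START)  # True
print(START == impostor)  # True

# `is` compares identity only, so it cannot be overridden.
print(impostor is START)  # False
```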
+2  A: 

You can define a Symbol class for handling START and END.

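class Symbol(object):
    """A named marker; two Symbols are equal when their values are equal."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Symbol) and other.value == self.value

    def __ne__(self, other):
        # Python 2 does not derive != from __eq__
        return not self == other

    def __hash__(self):
        # Python 3 drops the default hash once __eq__ is defined
        return hash(self.value)

    def __repr__(self):
        return "<sym: %r>" % self.value

    def __str__(self):
        return str(self.value)

START = Symbol("START")
END = Symbol("END")

# test pickle: the restored Symbols compare equal by value
import pickle
assert START == pickle.loads(pickle.dumps(START))
assert END == pickle.loads(pickle.dumps(END))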
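
One consequence worth spelling out is that a Symbol compares by value, so code using it has to rely on == rather than is. The sketch below uses a trimmed copy of the class, with a `__hash__` so instances stay hashable on Python 3, and assumes it runs as a top-level script so pickle can find the class by name.

```python
import pickle


class Symbol(object):
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Symbol) and other.value == self.value

    def __hash__(self):
        return hash(("Symbol", self.value))

    def __repr__(self):
        return "<sym: %r>" % (self.value,)


START = Symbol("START")
END = Symbol("END")

# The data contains the plain string 'START' as an ordinary item.
items = [START, 'START', 'foo', END]
restored = pickle.loads(pickle.dumps(items))

print(restored[0] == START)   # True: equal by value after unpickling
print(restored[0] is START)   # False: it is a new instance
print(restored[1] == START)   # False: a plain string is never a Symbol
print(restored.index(END))    # 3
```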
Anand Chitipothu
I was actually doing it this way for a while.
Evan Fosmark
Then why do you want to switch to some other approach? In fact some other languages have built-in support for symbols. In Ruby, you can write `:start` and `:end` for start and stop symbols.
Anand Chitipothu
In Python, strings are intended to be used as symbols. Except sometimes we also want to use strings as data, so we need these other approaches.
Casebash