views:

1788

answers:

6

This is what I normally do in order to ascertain that the input is a list/tuple - but not a str. Because many times I stumbled upon bugs where a function passes a str object by mistake, and the target function does for x in lst assuming that lst is actually a list or tuple.

assert isinstance(lst, (list, tuple))

My question is: is there a better way of achieving this?

+8  A: 

I think

assert not isinstance(lst, basestring)

Is actually what you want, otherwise you'll miss out on a lot of things which act like lists, but aren't subclasses of list or tuple.

Nick Craig-Wood
Yes, this is the correct answer. In Python 3, `basestring` is gone, and you just check for `isinstance(lst, str)`.
steveha
What other types act like lists other than `list` and `tuple`?
Sridhar Ratnakumar
There are lots of things you can iterate like lists, eg `set`, generator expressions, iterators. There are exotic things like `mmap`, less exotic things like `array` which act pretty much like lists, and probably lots more I've forgotten.
Nick Craig-Wood
My only objection to this answer is that: what if - in future - there is yet another sequence type that would pass the "for x in seq" syntax?
Sridhar Ratnakumar
A: 

Also check out the types module:

import types

assert isinstance(x, types.ListType)
assert not isinstance(x, types.StringTypes)

and so on...

Eli Bendersky
+1  A: 

The str object doesn't have an __iter__ attribute

>>> hasattr('', '__iter__')
False

so you can do a check

assert hasattr(x, '__iter__')

and this will also raise a nice AssertionError for any other non-iterable object too.

Edit: As Tim mentions in the comments, this will only work in python 2.x, not 3.x

Moe
Careful: In Python 3 `hasattr('','__iter__')` returns `True`. And of course that makes sense since you can iterate over a string.
Tim Pietzcker
Really? I didn't know that. I always thought this was an elegant solution to the problem, oh well.
Moe
This test didn't work on pyodbc.Row. It has no __iter__() but it more or less behaves like a list (it even defines "__setitem__"). You can iterate its elements just fine. The len() function works and you can index its elements. I am struggling to find the right combination that catches all list types, but excludes strings. I think I will settle for a check on "__getitem__" and "__len__" while explicitly excluding basestring.
haridsv
+3  A: 

Remember that in Python we want to use "duck typing". So, anything that acts like a list can be treated as a list. So, don't check for the type of a list, just see if it acts like a list.

But strings act like a list too, and often that is not what we want. There are times when it is even a problem! So, check explicitly for a string, but then use duck typing.

Here is a function I wrote for fun. It is a special version of repr() that prints any sequence in angle brackets ('<', '>').

def srepr(arg):
    if isinstance(arg, basestring):
        return repr(arg)
    try:
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    except TypeError:
        return repr(arg)

This is clean and elegant, overall. But what's that isinstance() check doing there? That's kind of a hack. But it is essential.

This function calls itself recursively on anything that acts like a list. If we didn't handle the string specially, then it would be treated like a list, and split up one character at a time. But then the recursive call would try to treat each character as a list -- and it would work! Even a one-character string works as a list! The function would keep on calling itself recursively until stack overflow.

Functions like this one, that depend on each recursive call breaking down the work to be done, have to special-case strings--because you can't break down a string below the level of a one-character string, and even a one-character string acts like a list.

Note: the try/except is the cleanest way to express our intentions. But if this code were somehow time-critical, we might want to replace it with some sort of test to see if arg is a sequence. Rather than testing the type, we should probably test behaviors. If it has a .index() method, it's a string, so don't consider it a sequence; otherwise, if it is indexable or iterable, it's a sequence:

def is_sequence(arg):
    return (not hasattr(arg, "strip") and
            hasattr(arg, "__getitem__") or
            hasattr(arg, "__iter__"))

def srepr(arg):
    if is_sequence(arg):
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    return repr(arg)

EDIT: I originally wrote the above with a check for __getslice__() but I noticed that in the collections module documentation, the interesting method is __getitem__(); this makes sense, that's how you index an object. That seems more fundamental than __getslice__() so I changed the above.

steveha
A: 

I tend to do this (if I really, really had to):

for i in some_var:
   if type(i) == type(list()):
       #do something with a list
   elif type(i) == type(tuple()):
       #do something with a tuple
   elif type(i) == type(str()):
       #here's your string
DrBloodmoney
You almost never should do this. What happens if I `some_var` is an instance of a class that is a subclass of `list()`? Your code won't have any idea what to do with it, even though it will work perfectly under the "do something with a list" code. And you rarely need to care about the difference between a list and a tuple. Sorry, -1.
steveha
+1  A: 

Generally speaking, the fact that a function which iterates over an object works on strings as well as tuples and lists is more feature than bug. You certainly can use isinstance or duck typing to check an argument, but why should you?

That sounds like a rhetorical question, but it isn't. The answer to "why should I check the argument's type?" is probably going to suggest a solution to the real problem, not the perceived problem. Why is it a bug when a string is passed to the function? Also: if it's a bug when a string is passed to this function, is it also a bug if some other non-list/tuple iterable is passed to it? Why, or why not?

I think that the most common answer to the question is likely to be that developers who write f("abc") are expecting the function to behave as though they'd written f(["abc"]). There are probably circumstances where it makes more sense to protect developers from themselves than it does to support the use case of iterating across the characters in a string. But I'd think long and hard about it first.

Robert Rossney