ansaurus

Question

Why are collections not handled uniformly in Python?

Answer 1

+4 A:

add and append are different. Sets are unordered and contain unique elements, while append suggests that the item is always added, and specifically at the end.

Sets and lists can both be treated as iterables; that is their common semantics, and your algorithms are free to rely on it.

If you have an algorithm that depends on some sort of addition, you simply can't depend on sets, tuples, lists, dicts, and strings behaving the same.
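A minimal sketch of that difference, using only built-in types. The names `items`, `unique`, and `total` are made up for illustration:

```python
# Inserting a value that is already present: the list grows, the set does not.
items = [1, 2]
unique = {1, 2}

items.append(2)   # always adds, and always at the end
unique.add(2)     # no effect: 2 is already a member

print(items)        # [1, 2, 2]
print(len(unique))  # 2


def total(iterable):
    # Relies only on iteration, the semantics both containers share.
    return sum(iterable)


print(total(items), total(unique))  # 5 3
```

Code that only reads from the container works on either type; code that inserts has to know which behavior it wants.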

Ivo van der Wijk 2010-09-14 09:13:12

+1: `add` to a set may not have any effect. `append` to a list always has an effect. And lists have `extend` -- what does that mean for a set? `union` maybe? The semantics are utterly different.

S.Lott 2010-09-14 10:28:25

The same is true for sets in C++, yet they have a unifying concept of adding elements (see edit of my question).

Space_C0wb0y 2010-09-14 10:54:10

STL offers *many* container types, it seems. I can imagine you want some simplification there. As far as I can tell, your question is only relevant for lists and sets when it comes to Python. If you really need this, you can always write your own wrapper that wraps lists and sets and behaves the way you want it to.

Ivo van der Wijk 2010-09-14 11:23:26

Answer 2

+4 A:

The direct answer: it's a design flaw.

You should be able to insert into any container where generic insertion makes sense (e.g. excluding dict) with the same method name. There should be a consistent, generic name for insertion, e.g. add, corresponding to set.add and list.append, so you can add to a container without having to care as much about what you're inserting into.

Using different names for this operation in different types is a gratuitous inconsistency, and sets a poor base standard: the library should encourage user containers to use a consistent API, rather than providing largely incompatible APIs for each basic container.

That said, it's not often a practical problem in this case: most of the time, when a function's result is a collection of items, you can implement the function as a generator. Generators let the caller build a list or a set from the same function (so the function itself never has to choose), and they support other forms of iteration as well:

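def foo():
    # A generator yields items without committing to a container type.
    yield 1
    yield 2
    yield 3

s = set(foo())                     # the caller picks the container
l = list(foo())
results1 = [i*2 for i in foo()]    # list comprehension
results2 = (i*2 for i in foo())    # lazy generator expression
for r in foo():
    print(r)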

Glenn Maynard 2010-09-14 09:18:07

+1 for the generator.

Space_C0wb0y 2010-09-14 10:54:55

Well, Python wants to be explicit, so just try to come up with a verb that says "preserves order" (for lists) and "does not preserve order" (for sets), at the same time! Pretty impossible ;-P But as you said, there actually is a uniform way to construct sets and lists and it's much more pythonic than calling `add` or `append`. +1 for that.

THC4k 2010-09-14 13:13:32

Perhaps it's not so much a design flaw as a different way of thinking. In the C++ standard library a function that produces values accepts an output iterator. In Python a function that produces values is made into a generator. If you want your Python to act more like the C++ standard library you can create an OutputIterator class that accepts various container types in its constructor and figures out which method to call on each container. You would use it like this: `my_func(iter, OutputIterator(my_container))`
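One way such a class might look, as a rough sketch only. `OutputIterator` is not part of the standard library; the `write` method name and the `doubled` producer are assumptions chosen for illustration:

```python
class OutputIterator(object):
    """Hypothetical adapter giving lists and sets one write() method."""

    def __init__(self, container):
        # Pick whichever insertion method the container provides.
        for name in ("append", "add"):
            method = getattr(container, name, None)
            if callable(method):
                self._insert = method
                break
        else:
            raise TypeError(
                "no known insertion method on %s" % type(container).__name__)

    def write(self, value):
        self._insert(value)


def doubled(values, out):
    # Producer in the C++ style: pushes results into an output sink.
    for v in values:
        out.write(v * 2)


as_list, as_set = [], set()
doubled([1, 2, 2], OutputIterator(as_list))
doubled([1, 2, 2], OutputIterator(as_set))
print(as_list)         # [2, 4, 4]
print(sorted(as_set))  # [2, 4]
```

The producer never names a container type, but the adapter still has to guess which method each container expects, which is the inconsistency the question is about.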

Steven Rumbalski 2010-09-14 18:00:43

@THC4k: I don't agree that "insert into this object" functions need to specify where they insert, so long as they don't imply behavior that isn't there; clearly sets should not have an `append` method. Another example of this sort of inconsistency: I should be able to drop any queue container into a class, using a simple queue or the builtin priority queue (heapq). Yet, the methods in heapq are completely ungeneric: `heappush` and `heappop`. These should be simply `push` and `pop`, allowing replacing the queue without caring about how it orders objects.
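A sketch of what that interchangeability could look like. `FifoQueue`, `PriorityQueue`, and `drain` are hypothetical names; only `heapq.heappush`, `heapq.heappop`, and `collections.deque` come from the standard library:

```python
import heapq
from collections import deque


class FifoQueue(object):
    """Hypothetical plain queue: pop() returns items in insertion order."""

    def __init__(self):
        self._items = deque()

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.popleft()


class PriorityQueue(object):
    """Hypothetical wrapper over heapq: pop() returns the smallest item."""

    def __init__(self):
        self._heap = []

    def push(self, item):
        heapq.heappush(self._heap, item)

    def pop(self):
        return heapq.heappop(self._heap)


def drain(queue, items):
    # Works with either class because it relies only on push() and pop().
    for item in items:
        queue.push(item)
    return [queue.pop() for _ in items]


print(drain(FifoQueue(), [3, 1, 2]))      # [3, 1, 2]
print(drain(PriorityQueue(), [3, 1, 2]))  # [1, 2, 3]
```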

Glenn Maynard 2010-09-14 20:01:13

This is all in line with a fairly fundamental concept used in Python: duck typing. In general, you should be able to operate on objects without specifically caring what they are; if you're given a container you don't care--unless you have a specific reason to--whether it's a list or set or user-provided linked list, binary tree, BSP-tree or anything else. Using different names for these methods unnecessarily breaks from this design philosophy and forces you to care about what you're working with.

Glenn Maynard 2010-09-14 20:01:50

Answer 3

+1 A:

The actual reason is probably just related to Python history.

The built-in set type wasn't built in until Python 2.4, and was based on the sets module, which itself wasn't in the standard library until Python 2.3. Changing the semantics of the set type could have broken a host of existing code that relied on the original sets module, and language designers generally shy away from breaking existing code without a major version release.

You can blame the original module author if you like, but keep in mind that user-defined types and built-in types necessarily lived in different universes until Python 2.2, which meant you couldn't directly extend a built-in type, and probably allowed module authors to feel OK about not maintaining consistent collection semantics.

Triptych 2010-09-14 10:43:04
