Sets and lists are handled differently in Python, and there seems to be no uniform way to work with both. For example, adding an item to a set is done using the add method, and for the list it is done using the append method. I am aware that there are different semantics behind this, but there are also common semantics there, and often an algorithm that works with some collection cares more about the commonalities than the differences. The C++ STL shows that this can work, so why is there no such concept in Python?

Edit: In C++ I can use an output_iterator to store values in an (almost) arbitrary type of collection, including lists and sets. I can write an algorithm that takes such an iterator as an argument and writes elements to it. The algorithm is then completely agnostic to the kind of container (or other device, maybe a file) that backs the iterator. If the backing container is a set that ignores duplicates, then that is the decision of the caller. My specific problem is that several times now I have used, for instance, a list for a certain task and later decided that a set is more appropriate. Now I have to change the append to add in several places in my code. I am just wondering why Python has no concept for such cases.

+4  A: 

add and append are different. Sets are unordered and contain unique elements, while append suggests the item is always added, and that this is done specifically at the end.

Sets and lists can both be treated as iterables; that is their common semantics, and your algorithms can freely rely on it.

If you have an algorithm that depends on some sort of addition, you simply can't depend on sets, tuples, lists, dicts, strings behaving the same.

Ivo van der Wijk
+1: `add` to a set may not have any effect. `append` to a list always has an effect. And lists have `extend` -- what does that mean for a set? `union` maybe? The semantics are utterly different.
S.Lott
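The difference described in the comment above can be seen in a few lines. This is a minimal sketch using only built-in methods; the variable names are illustrative, and `set.update` is shown as the closest in-place counterpart to `list.extend`, not as an exact equivalent.

```python
# Illustrative only: compare how sets and lists react to the same additions.
s = set([1, 2])
l = [1, 2]

s.add(2)       # 2 is already present, so the set is unchanged
l.append(2)    # append always grows the list by one element

s.update([2, 3])   # in-place union: only the new element 3 is added
l.extend([2, 3])   # extend appends every element, duplicates included

print(sorted(s))   # [1, 2, 3]
print(l)           # [1, 2, 2, 2, 3]
```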
The same is true for sets in C++, yet they have a unifying concept of adding elements (see edit of my question).
Space_C0wb0y
STL offers *many* container types, it seems. I can imagine you want some simplification there. As far as I can tell, your question is only relevant for lists and sets when it comes to Python. If you really need this, you can always write your own wrapper that wraps lists and sets and behaves the way you want it to.
Ivo van der Wijk
+4  A: 

The direct answer: it's a design flaw.

You should be able to insert into any container where generic insertion makes sense (eg. excluding dict) with the same method name. There should be a consistent, generic name for insertion, eg. add, corresponding to set.add and list.append, so you can add to a container without having to care as much about what you're inserting into.

Using different names for this operation in different types is a gratuitous inconsistency, and sets a poor base standard: the library should encourage user containers to use a consistent API, rather than providing largely incompatible APIs for each basic container.

That said, it's not often a practical problem in this case: most of the time, when a function's result is a list of items, you can implement it as a generator. Generators let the function produce its results the same way regardless of whether the caller collects them into a set or a list, or iterates over them in some other way:

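def foo():
    # Yield the results one at a time; the caller picks the container.
    yield 1
    yield 2
    yield 3

s = set(foo())                     # collect into a set
l = list(foo())                    # collect into a list
results1 = [i*2 for i in foo()]    # list comprehension
results2 = (i*2 for i in foo())    # lazy generator expression
for r in foo():
    print(r)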
Glenn Maynard
+1 for the generator.
Space_C0wb0y
Well, Python wants to be explicit, so just try to come up with a verb that says "preserves order" (for lists) and "does not preserve order" (for sets), at the same time! Pretty impossible ;-P But as you said, there actually is a uniform way to construct sets and lists and it's much more pythonic than calling `add` or `append`. +1 for that.
THC4k
Perhaps it's not so much a design flaw as a different way of thinking. In the C++ standard library a function that produces values accepts an output iterator. In Python a function that produces values is made into a generator. If you want your Python to act more like the C++ standard library you can create an OutputIterator class that accepts various container types in its constructor and figures out which method to call on each container. You would use it like this: `my_func(iter, OutputIterator(my_container))`
Steven Rumbalski
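One way the OutputIterator idea from the comment above might look is sketched below. Everything here is hypothetical: `OutputIterator`, its `put` method, and the `squares` function are made-up names for illustration, and choosing the method with `hasattr` is only one possible approach.

```python
class OutputIterator(object):
    """Hypothetical adapter; not part of the standard library."""

    def __init__(self, container):
        # Pick the insertion method once, based on what the container offers.
        if hasattr(container, "append"):
            self._put = container.append
        elif hasattr(container, "add"):
            self._put = container.add
        else:
            raise TypeError("no append() or add() on %s" % type(container).__name__)

    def put(self, item):
        self._put(item)


def squares(values, out):
    # Hypothetical algorithm: writes results without knowing the container type.
    for v in values:
        out.put(v * v)


as_list = []
as_set = set()
squares([1, -1, 2], OutputIterator(as_list))
squares([1, -1, 2], OutputIterator(as_set))
print(as_list)          # [1, 1, 4]
print(sorted(as_set))   # [1, 4]
```

Whether duplicates are kept is decided by the caller's choice of container, which mirrors the C++ behaviour described in the question.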
@THC4k: I don't agree that "insert into this object" functions need to specify where they insert, so long as they don't imply behavior that isn't there; clearly sets should not have an `append` method. Another example of this sort of inconsistency: I should be able to drop any queue container into a class, using a simple queue or the standard library's heap-based priority queue (heapq). Yet, the functions in heapq are completely ungeneric: `heappush` and `heappop`. These should be simply `push` and `pop`, allowing replacing the queue without caring about how it orders objects.
Glenn Maynard
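The interchangeable-queue idea in the comment above could be approximated with thin wrappers. This is a sketch only: `FifoQueue`, `PriorityQueue`, and `drain` are hypothetical names, while `heapq.heappush`, `heapq.heappop`, and `collections.deque` are the real standard-library pieces they rely on.

```python
import heapq
from collections import deque


class FifoQueue(object):
    """Hypothetical wrapper: first in, first out."""

    def __init__(self):
        self._items = deque()

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.popleft()


class PriorityQueue(object):
    """Hypothetical wrapper: smallest item first, backed by heapq."""

    def __init__(self):
        self._heap = []

    def push(self, item):
        heapq.heappush(self._heap, item)

    def pop(self):
        return heapq.heappop(self._heap)


def drain(queue, items):
    # Works with either wrapper because both expose push() and pop().
    for item in items:
        queue.push(item)
    return [queue.pop() for _ in items]


print(drain(FifoQueue(), [3, 1, 2]))       # [3, 1, 2]
print(drain(PriorityQueue(), [3, 1, 2]))   # [1, 2, 3]
```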
This is all in line with a fairly fundamental concept used in Python: duck typing. In general, you should be able to operate on objects without specifically caring what they are; if you're given a container you don't care--unless you have a specific reason to--whether it's a list or set or user-provided linked list, binary tree, BSP-tree or anything else. Using different names for these methods unnecessarily breaks from this design philosophy and forces you to care about what you're working with.
Glenn Maynard
+1  A: 

The actual reason is probably just related to Python history.

The set type wasn't built in until Python 2.4, and was based on the sets module, which itself wasn't in the standard library until Python 2.3. Changing the semantics of the set type could have broken a host of existing code that relied on the original sets module, and language designers generally shy away from breaking existing code without a major version release.

You can blame the original module author if you like, but keep in mind that user-defined types and built-in types necessarily lived in different universes until Python 2.2, which meant you couldn't directly extend a built-in type, and probably allowed module authors to feel OK about not maintaining consistent collection semantics.

Triptych