tags:

views:

477

answers:

6

I have to port a C++ STL application to Python. I am a Python newbie, but have been programming for over a decade. I have a great deal of experience with the STL, and find that it keeps me hooked to using C++. I have been searching for the following items on Google these past several days:

  1. Python STL (in hope of leveraging my years of STL experience)
  2. Python linked lists
  3. Python advanced list usage
  4. Python list optimization
  5. Python ordered sets

And have found posts about the above topic, tutorials on Python lists that are decidedly NOT advanced, or dead ends. I am really surprised at my lack of success, I think I am just burned out from overworking and entering bad search terms!

(MY QUESTION) Can I get a Python STL wrapper, or an interface to Python lists that works like the STL? If not can someone point me to a truly advanced tutorial or paper on managing very large sorted collections of non trivial objects?

P.S. I can easily implement workarounds for one or two uses, but if management wants to port more code, I want to be ready to replace any STL code I find with equivalent Python code immediately. And YES I HAVE MEASURED AND DO NEED TO HAVE TOTALLY OPTIMAL CODE! I CANT JUST DO REDUNDANT SORTS AND SEARCHES!

(ADDENDUM) Thanks for the replies, I have checked out some of the references and am pleased. In response to some of the comments here:

1 - It is being ported to python because managements says so, I would just as soon leave it alone - if it aint broke, why fix it?

2 - Advanced list usage with non trivial objects, what I mean by that is: Many different ways to order and compare objects, not by one cmp method. I want to splice, sort, merge, search, insert, erase, and combine the lists extensively. I want lists of list iterators, I want to avoid copying.

3 - I now know that built in lists are actually arrays, and I should be looking for a different python class. I think this was the root of my confusion.

4 - Of course I am learning to do things in the Python way, but I also have deadlines. The STL code I am porting is working right, I would like to change it as little as possible, because that would introduce bugs.

Thanks to everyone for their input, I really appreciate it.

+13  A: 

If I were you I would take the time to learn how to properly use the various data structures available in Python instead of looking for things that are similar to what you know from C++.

It's not like you're looking for something fancy, just working with some data structures. In that case I would refer you to Python's documentation on the subject.

Doing this the 'Python' way would help you and more importantly future maintainers who will wonder why you try to program C++ in Python.

Just to whet your appetite, there's also no reason to prefer STL's style to Python (and for the record, I'm also a C++ programmer who knows STL throughly), consider the most trivial example of constructing a list and traversing it:

The Pythonic way:

mylist = [1, 2, 3, 4]

for value in mylist:
    # playaround with value

The STL way (I made this up, to resemble STL) in Python:

mylist = [1, 2, 3, 4]
mylistiter = mylist.begin()

while mylistiter != mylist.end():
    value = mylistiter.item()
    mylistiter.next()
Idan K
Wouldn't this be better as "for(mylistiter=mylist.begin(); mylistiter != mylist.end(); ++mylistiter)" ? Whenever you use a while loop, there is always the risk of omitting or skipping the increment statement.
Paul McGuire
Your for statement isn't valid Python. The whole point of my example was to demonstrate how STL-style iteration might look in Python.
Idan K
The STL way is to not explicitly iterate.
camh
Sorry, I misread what you were trying to show.
Paul McGuire
I appreciate what you are saying, but I have to deal with deadlines, and the safest way to port the code is to stick with STL idioms, as that will not introduce new bugs. I am a real Python newb, and when I have more experience with it I will gladly start doing things the Python Way(TM). Until then, I don't care about being a bad Python programmer, I just want to deliver functioning, stable, fast code, on time.
@Dustin,camh - the code I made up was merely to illustrate a point, I didn't want to complicate things with an example of for_each and lambdas. Besides, there are times when writing a functor for a simple for-loop is overkill.
Idan K
unknown: If you want functioning, stable, fast, and on time, then why are you listening to someone's arbitrary command to port to Python for unstated (but implied as irrelevant) reasons?
Roger Pate
+2  A: 

For linked-list-like operations people usually use collections.deque.

What operations do you need to perform fast? Bisection? Insertion?

liori
+1  A: 

Python STL (in hope of leveraging my years of STL experience) - Start with the collections ABC's to learn what Python has. http://docs.python.org/library/collections.html

Python linked lists. Python lists have all the features you would want from a linked list.

Python advanced list usage. What does this mean?

Python list optimization. What does this mean?

Python ordered sets. You have several choices here; you could invent your own "ordered set" as a list that discards duplicates. You can subclass the heapq and add methods that discard duplicates: http://docs.python.org/library/heapq.html.

In many cases, however, the cost of maintaing an ordered set is actually excessive because it must only be ordered once at the end of the algorithm. In other cases, the "ordered set" really is a heapq -- you never needed the set-like features and only needed the ordering.

Non-Trivial. (I'm guessing at what you meant by "non-trivial"). All Python objects are equivalent. There's no "trivial" vs. "non-trivial" objects. They're all first-class objects and can all have "non-trivial" complexity without any real work. This is not C++ where there are primitive (non-object) values floating around. Everything's an object in Python.

Management Expectations. For the most part the C++ brain-cramping doesn't exist in Python. Use the obvious Python classes the obvious way and you'll have much less code. The reduction in code volume is the big win. Often, the management reason for converting C++ to Python is to get rid of the C++ complexity.

Python code will be much simpler, making it much more reliable and much easier to maintain.

While it's generally true that Python is slower than C++, it's also true that picking the right algorithm and data structure can have dramatic improvements on performance. In one benchmark, someone found that Python was actually faster than C because the C program had such a poorly chosen data structure.

It's possible that your C++ has a really poor algorithm and you will see comparable performance from Python.

It's also possible that your C++ program is I/O bound, or has other limitations that will leave the Python running at a comparable speed.

S.Lott
Overall I agree with you, but it is true that C++ does have some well tuned containers and algorithms. You have more opportunity of making the best choice. That said, if that's Unknown's motivation in this then I really think the move to Python is a mistake.
Phil Nash
Python lists do not have all the features you might want from a linked list. Inserting an element in the middle of a Python list is O(*n*).
Jason Orendorff
And C++, like Python, treats the built-in types and user-defined types very much the same. Everything's an object in C++ too.
Jason Orendorff
@Jason Orendorff: C++ has non-object types. int, long, double, char are not objects.
S.Lott
Thanks, but the C++ program works great, thats just the point. If I can make a close translation then I know the Python code will also work great. If I try to do things the python way(TM) then I will likely introduce bugs, especially because I started learning python last week!
My experience in translation from platform to platform (only 30 years, you may have more) is that your approach cannot really work. A close transliteration from one language to the other can almost never be made to work. All you're doing is capturing the limitations of C++ in Python. For example, C++ requires memory management (i.e., DELETEs); Python does not even have this concept.
S.Lott
+1  A: 

The design of Python is quite intentionally "you can use just a few data structures (arrays and hash tables) for whatever you want to do, and if that isn't fast enough there's always C".

Python's standard library doesn't have a sorted-list data structure like std::set. You can download a red/black tree implementation or roll your own. (For small data sets, just using a list and periodically sorting it is a totally normal thing to do in Python.)

Rolling your own linked list is very easy.

Jason Orendorff
+10  A: 

Python's "lists" are not linked lists -- they're like Java ArrayLists or C++'s std::vectors, i.e., in lower-level terms, a resizable compact array of pointers.

A good "advanced tutorial" on such subjects is Hettinger's Core Python containers: under the hood presentation (the video at the URL is of the presentation at an Italian conference, but it's in English; another, shorter presentation of essentially the same talk is here).

So the performance characteristics of Python lists are essentially those of C++'s std::vector: Python's .append, like C++'s push_back, is O(1), but insertion or removal "in the middle" is O(N). Consequently, keeping a list sorted (as can be easily done with the help of functions in Python's standard library module bisect) is costly (if items arrive and/or depart randomly, each insertion and removal is O(N), just like similarly maintaining order in an std::vector would be. For some purposes, such as priority queues, you may get away with a "heap queue", also easy to maintain with the help of functions in Python's standard library module heapq -- but of course that doesn't afford the same range of uses as a completely sorted list (or vector) would.

So for purposes for which in C++ you'd use a std::set (and rely on its being ordered, i.e., a hashset wouldn't do -- Python's sets are hash-based, not ordered) you may be better off avoiding Python builtin containers in favor of something like this module (if you need to keep things pure-Python), or this one (which offers AVL trees, not RB ones, but is coded as a C-implemented Python extension and so may offer better performance) if C-coded extensions are OK.

If you do end up using your own module (be it pure Python, or C-coded), you may, if you wish, give it an STL-like veneer/interface (with .begin, .end, iterator objects that are advanced by incrementing rather than, as per normal Python behavior, by calling their next methods, ...), although it will never perform as well as "going with the grain" of the language would (the for statement is optimized to use normal Python iterators, i.e., one with next methods, and it will be faster than wrapping a somewhat awkward while around non-Python-standard, STL-like iterators).

To give an STL-like veneer to any Python built-in container, you'll incur substantial wrapping overhead, so the performance hit may be considerable. If you, as you say, "DO NEED TO HAVE TOTALLY OPTIMAL CODE", using such a veneer just for "syntax convenience" purposes would therefore seem to be a very bad choice.

Boost Python, the Python extension package that wraps the powerful C++ Boost library, might perhaps serve your purposes best.

Alex Martelli
Thanks for the useful info, I found the above links very helpful. The reason I want to apply a STL wrapper is to maintain the codes stability, ie; not introduce bugs.
So I hope you can make good use of Boost Python, as otherwise I believe using your desired interface and still get top performance may be incompatible goals.
Alex Martelli
+2  A: 

I would say that your issues go beyond just STL porting. Since the list, dict, and set data structures, which are bolted on to C++ via the STL, are native to core Python, then their usage is incorporated into common Python code idioms. If you want to give Google another shot, try looking for references for "Python for C++ Programmers". One of your hits will be this presentation by Alex Martelli. It's a little dated, from way back in ought-three, but there is a side-by-side comparison of some basic Python code that reads through a text file, and how it would look using STL.

From there, I would recommend that you read up on these Python features:

  • iterators
  • generators
  • list and generator comprehensions

And these builtin functions:

  • zip
  • map

Once you are familiar with these, then you will be able to construct your own translation/mapping between STL usage and Python builtin data structures.

As others have said, if you are looking for a "plug-and-chug" formula to convert STL C++ code to Python, you will just end up with bad Python. Such a brute force approach will never result in the power, elegance, and brevity of a single-line list comprehension. (I had this very experience when introducing Python to one of our managers, who was familiar with Java and C++ iterators. When I showed him this code:

numParams = 1000
paramRequests = [ ("EqptEmulator/ProcChamberI/Sensors", 
                   "ChamberIData%d"%(i%250)) for i in range(numParams) ]
record.internalArray = [ParameterRequest(*pr) for pr in paramRequests]

and I explained that these replaced this code (or something like it, this might be a mishmash of C++ and Java APIs, sorry):

std::vector<ParameterRequest> prs = new std::vector<ParameterRequest>();
for (int i = 0; i<1000; ++i) {
    string idstr;
    strstream sstr(idstr);
    sstr << "ChamberIData" << (i%250);
    prs.add(new ParameterRequest("EqptEmulator/ProcChamberI/Sensors", idstr));
}
record.internalArray = new ParameterRequest[prs.size];
prs.toArray(record.internalArray);

One of your instincts from working with C++ will be a reluctance to create new lists from old, but rather to update or filter a list in place. We even see this on many forums from Python developers asking about how to modify a list while iterating over it. In Python, you are much better off building a new list from the old with a list comprehension.

allItems = [... some list of items, perhaps from a database query ...]
validItems = [it for it in allItems if it.isValid()]

As opposed to:

validItems = []
for it in allItems:
    if it.isValid():
        validItems.add(it)

or worse:

# get list of indexes of items to be removed
removeIndexes = []
for i in range(len(allItems)):
    if not allItems[i].isValid():
        removeIndexes.add(i)

# don't forget to remove items in descending order, or later indexes
# will be invalidated by earlier removals
sort(removeIndexes,reverse=True)

# copy list
validItems = allItems[:]

# now remove the items from allItems
for idx in removeIndexes:
    del validItems[i]
Paul McGuire
Thanks for the response, I understand yours and others objections to my approach, but I am doing my best in the face of stubborn management. They said, "hey you, port this to python!" I said, "I don't really know python.", they replied "learn it and port this before february". So I really do wan't a plug and chug replacement for the STL, as that is the safest route for me to port the code without adding bugs. In future versions, after gaining a few months of python experience, I may begin replacing the STL way with the python way(TM), but I don't feel competent to do that yet.