views:

449

answers:

5

Suppose I've developed a general-purpose end user utility written in Python. Previously, I had just one version available which was suitable for Python later than version 2.3 or so. It was sufficient to say, "download Python if you need to, then run this script". There was just one version of the script in source control (I'm using Git) to keep track of.

With Python 3, this is no longer necessarily true. For the foreseeable future, I will need to simultaneously develop two different versions, one suitable for Python 2.x and one suitable for Python 3.x. From a development perspective, I can think of a few options:

  1. Maintain two different scripts in the same branch, making improvements to both simultaneously.
  2. Maintain two separate branches, and merge common changes back and forth as development proceeds.
  3. Maintain just one version of the script, plus check in a patch file that converts the script from one version to the other. When enough changes have been made that the patch no longer applies cleanly, resolve the conflicts and create a new patch.

I am currently leaning toward option 3, as the first two would involve a lot of error-prone tedium. But option 3 seems messy and my source control system is supposed to be managing patches for me.

For distribution packaging, there are more options to choose from:

  1. Offer two different download packages, one suitable for Python 2 and one suitable for Python 3 (the user will have to know to download the correct one for whatever version of Python they have).
  2. Offer one download package, with two different scripts inside (and then the user has to know to run the correct one).
  3. One download package with two version-specific scripts, and a small stub loader that can run in both Python versions, that runs the correct script for the Python version installed.

Again I am currently leaning toward option 3 here, although I haven't tried to develop such a stub loader yet.

Any other ideas?

+2  A: 

For developement, option 3 is too cumbersome. Maintaining two branches is the easiest way although the way to do that will vary between VCSes. Many DVCS will be happier with separate repos (with a common ancestry to help merging) and centralized VCS will probably easier to work with with two branches. Option 1 is possible but you may miss something to merge and a bit more error-prone IMO.

For distribution, I'd use option 3 as well if possible. All 3 options are valid anyway and I have seen variations on these models from times to times.

Keltia
A: 

Whichever option for development is chosen, most potential issues could be alleviated with thorough unit testing to ensure that the two versions produce matching output. That said, option 2 seems most natural to me: applying changes from one source tree to another source tree is a task (most) version control systems were designed for--why not take advantages of the tools they provide to ease this.

For development, it is difficult to say without 'knowing your audience'. Power Python users would probably appreciate not having to download two copies of your software yet for a more general user-base it should probably 'just work'.

Andy
+1  A: 

I would start by migrating to 2.6, which is very close to python 3.0. You might even want to wait for 2.7, which will be even closer to python 3.0.

And then, once you have migrated to 2.6 (or 2.7), I suggest you simply keep just one version of the script, with things like "if PY3K:... else:..." in the rare places where it will be mandatory. Of course it's not the kind of code we developers like to write, but then you don't have to worry about managing multiple scripts or branches or patches or distributions, which will be a nightmare.

Whatever you choose, make sure you have thorough tests with 100% code coverage.

Good luck!

MiniQuark
Python 3.0 is "a new production-ready release", according to http://python.org/download/releases/3.0/ "Python 3000" was the old name before the final release.
Greg Hewgill
Because of the differences in the "print" statement syntax, it's not possible to simply use "if PY3K" statements to select which version. That is why this is a tricky problem.
Greg Hewgill
Hi Greg. I agree with your 1st comment, and I fixed my answer accordingly, thanks. As for your 2nd comment, I think we can work around stuff like that. For the print function, you can use "from __future__ import print_function" (it adds the print function in 2.6 and it's a no-op in 3.0).
MiniQuark
+2  A: 

I don't think I'd take this path at all. It's painful whichever way you look at it. Really, unless there's strong commercial interest in keeping both versions simultaneously, this is more headache than gain.

I think it makes more sense to just keep developing for 2.x for now, at least for a few months, up to a year. At some point in time it will be just time to declare on a final, stable version for 2.x and develop the next ones for 3.x+

For example, I won't switch to 3.x until some of the major frameworks go that way: PyQt, matplotlib, numpy, and some others. And I don't really mind if at some point they stop 2.x support and just start developing for 3.x, because I'll know that in a short time I'll be able to switch to 3.x too.

Eli Bendersky
+8  A: 

The official recommendation says:

For porting existing Python 2.5 or 2.6 source code to Python 3.0, the best strategy is the following:

  1. (Prerequisite:) Start with excellent test coverage.

  2. Port to Python 2.6. This should be no more work than the average port from Python 2.x to Python 2.(x+1). Make sure all your tests pass.

  3. (Still using 2.6:) Turn on the -3 command line switch. This enables warnings about features that will be removed (or change) in 3.0. Run your test suite again, and fix code that you get warnings about until there are no warnings left, and all your tests still pass.

  4. Run the 2to3 source-to-source translator over your source code tree. (See 2to3 - Automated Python 2 to 3 code translation for more on this tool.) Run the result of the translation under Python 3.0. Manually fix up any remaining issues, fixing problems until all tests pass again.

It is not recommended to try to write source code that runs unchanged under both Python 2.6 and 3.0; you’d have to use a very contorted coding style, e.g. avoiding print statements, metaclasses, and much more. If you are maintaining a library that needs to support both Python 2.6 and Python 3.0, the best approach is to modify step 3 above by editing the 2.6 version of the source code and running the 2to3 translator again, rather than editing the 3.0 version of the source code.

Ideally, you would end up with a single version, that is 2.6 compatible and can be translated to 3.0 using 2to3. In practice, you might not be able to achieve this goal completely. So you might need some manual modifications to get it to work under 3.0.

I would maintain these modifications in a branch, like your option 2. However, rather than maintaining the final 3.0-compatible version in this branch, I would consider to apply the manual modifications before the 2to3 translations, and put this modified 2.6 code into your branch. The advantage of this method would be that the difference between this branch and the 2.6 trunk would be rather small, and would only consist of manual changes, not the changes made by 2to3. This way, the separate branches should be easier to maintain and merge, and you should be able to benefit from future improvements in 2to3.

Alternatively, take a bit of a "wait and see" approach. Proceed with your porting only so far as you can go with a single 2.6 version plus 2to3 translation, and postpone the remaining manual modification until you really need a 3.0 version. Maybe by this time, you don't need any manual tweaks anymore...

oefe
+1 this is the official recommendation, and also makes the most sense to me.
Carl Meyer
It sounds like the last sentence of the official recommendation you quoted applies to my situation. It makes sense to leverage 2to3 as much as possible, until such time as the 2.x code can be deprecated completely.
Greg Hewgill