I don't like the "Is too! Is not!" nature of a lot of the responses that this question provoked. But I'm really interested in some of the answers, and I'd like to hear more about them. So I'm asking the question from a different perspective.

It's pretty clear to me, after having built a number of medium-size Python projects by myself, that large-scale development in Python presents problems that I haven't seen before.

For instance: I'm good about implementing unit tests, and so I'm confident that my classes and methods do what they're designed to do. What I'm not confident about is how those methods would work if four or five other developers were using them. I worry that my methods aren't designed properly. I worry that I'm not documenting my APIs properly. I look at the panoply of arguments that zip() takes, and think, man, none of my methods do that.

I also get nervous about the visibility of variables inside class definitions, and wonder if I should be using name mangling more. I get nervous about name mangling. I love that functions are first-class objects, but I worry that injudicious application of that particular feature can result in code graphs that are impervious to analysis.

At the same time, I know that there are teams that are being successful with large-scale development in Python. There are clearly ways to work around the things I'm worried about.

So: what are they? Those of you who have done large-scale development in Python: what have you learned? What do you do to keep your code manageable? What kinds of tools and techniques have been helpful?

(Oh, and in case this isn't obvious, the question isn't "How do you build large-scale systems in Python?" I'm interested in system scalability too, but this question's focusing on scaling development.)

+6  A: 

"What I'm not confident about is how those methods would work if four or five other developers were using them."

Aren't we all.

Python isn't the central issue. It's other developers "getting" the architectural vision. All of your concerns are true for all languages and all teams.

Your concerns don't have "general" (do this and it will always work) answers. Indeed, many of your concerns have no answers. They're just universal play-well-with-others concerns.

Here's my experience. There are 11 kinds of developers -- those who want to be architects, those who are capable of being architects, and all the rest. The architect wannabes might get your architecture; whether they do or don't, they will pitch their own variation. The capable architects will read what you wrote about the architecture, ask questions, and generally be helpful -- then they'll get promoted and you'll never see them again. The rest will not read your descriptions, and will not grok it.

Generally, you simply have to interact with everyone to help guide their thinking. There are no circumstances under which software can be cast into the world to sink or swim on its own merits. Either you guide developers, or it founders and is forgotten.

You need simple diagrams showing how things fit together. Management-speed diagrams: few colors, bold lines, short words.

You need simple copy-and-paste code samples showing how things work.

Look at the great open source projects. Leadership guiding the other developers. Someone with "check-in" authority who vets the proposed changes.

Look at the Python library documentation -- emulate that.


In response to the comment: none of the following are Python issues.

  • "What I'm not confident about is how those methods would work if four or five other developers were using them." Yep. They don't pay attention; see above.
  • "I worry that my methods aren't designed properly." Right. Can't win this. You're excoriated for writing too much as well as too little.
  • "I worry that I'm not documenting my APIs properly." Copy the Python library examples. ["But they're not consistent." Right, they're not. So pick your fave and follow that.]
  • "I look at the panoply of arguments that zip() takes, and think, man, none of my methods do that." Not sure what this means.

    zip takes any number of iterables as arguments, which doesn't seem to be a "panoply" to me. But perhaps the issue is "my functions don't take a variable number of iterables." Good. Simpler is better.

  • "I...get nervous about the visibility of variables inside class definitions,"

    Threat Scenario: people will read the code and use internal variables improperly.

    If your API makes a modicum of sense -- and works as advertised -- no one will read your code.

    "Sensible": does it pass the 'can I explain it with my hands in my pockets' test? If you can explain it without resorting to diagrams or code samples, it's sensible. When you add code samples, no one will waste time digging into your code to maliciously do bad things with your variables.

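On the zip() point, a quick sketch of what its signature amounts to, next to a deliberately boring function of your own. The sample data and the label() helper are made up for illustration.

```python
names = ["ada", "grace", "alan"]
years = [1815, 1906, 1912]
fields = ["math", "compilers", "logic"]

# zip accepts any number of iterables as positional arguments.
pairs = list(zip(names, years))
triples = list(zip(names, years, fields))


# Hypothetical function of your own: it takes exactly what it needs.
def label(name, year):
    return "%s (%d)" % (name, year)


labels = [label(n, y) for n, y in pairs]
```

A function with two plain parameters is easier to document and harder to misuse than one that accepts anything.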
The following are (approximately) Python issues.

  • "[I] wonder if I should be using name mangling more." Don't.

    Threat Scenario: people will read code and use methods improperly.

    Implicit assumption: people are actually reading code. They're not. They're cutting and pasting the sample code.

    If your documentation makes a modicum of sense -- and works as advertised -- no one will read the code and find the private methods. They'll only read the code in detail to resolve problems.

  • "I love that functions are first-class objects," So does everyone else.

    Threat Scenario: "code graphs that are impervious to analysis". I suppose this can happen. It isn't likely, however. You have alternatives to function-as-object. First, a function is essentially the same as an instance of a callable class (one that defines __call__). When you need statefulness, that's a potential upgrade. Better yet, you can always define a class with a properly named method.

Example:

# original_function is whatever plain function you were passing around before.
class OnceWasAFunction(object):
    def evaluate(self, *args, **kw):
        return original_function(*args, **kw)

You've replaced the anonymous method (__call__) with a named method. Same basic behavior, less complex code graph.
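To make the three options concrete, here is a minimal side-by-side sketch. The names scale, CallableScaler and Scaler are invented for this example; they only illustrate the progression from plain function to callable class to named method.

```python
def scale(value, factor=2):
    """Plain function: the simplest form, no state."""
    return value * factor


class CallableScaler(object):
    """Callable class: holds state, invoked like a function."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor


class Scaler(object):
    """Same state, but the behavior has a name you can search for."""

    def __init__(self, factor):
        self.factor = factor

    def evaluate(self, value):
        return value * self.factor


triple = CallableScaler(3)
named = Scaler(3)
results = (scale(5, 3), triple(5), named.evaluate(5))
```

All three produce the same value. The difference is that a search for ".evaluate(" finds every call site of the last one, while calls to triple(...) look like any other function call.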

"Yes, but how can someone be sure no one's breaking the rules?" That's what I said above -- interaction, pictures, code samples.

Your API unit tests will allow you to fix the implementation and prove that the API didn't change. Further, your API unit tests are a rich place to mine for code samples.
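A small sketch of what that can look like with the standard unittest module. The parse_version function is a hypothetical public API invented for this example; the point is that the tests pin the documented behavior and read like usage samples.

```python
import unittest


def parse_version(text):
    """Hypothetical public API: turn '1.2.3' into (1, 2, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


class ParseVersionApiTest(unittest.TestCase):
    """Pins the documented behavior; each test doubles as a code sample."""

    def test_returns_tuple_of_ints(self):
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

    def test_ignores_surrounding_whitespace(self):
        self.assertEqual(parse_version(" 2.0 \n"), (2, 0))

    def test_rejects_non_numeric_parts(self):
        with self.assertRaises(ValueError):
            parse_version("1.x")


def run_api_tests():
    """Run the suite programmatically and report whether it passed."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ParseVersionApiTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

If you later rewrite the body of parse_version, these tests passing unchanged is your evidence that the API itself did not move.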

S.Lott
Large projects have failure modes that don't arise from the programming language they're using. You're right: I'm not interested in those. But large projects *also* have failure modes that *do* arise from the programming language. What are Python's, and what do we do about them?
Robert Rossney
+1 for use of binary.
Pete
A: 

To answer a specific concern you mentioned:

"I also get nervous about the visibility of variables inside class definitions, and wonder if I should be using name mangling more. I get nervous about name mangling."

Python's data hiding is based on a "gentleman's agreement" not to poke about in other people's stuff. So

  • Marking something "private" by prepending it with an underscore (e.g. _foo) is perfectly sufficient to hide it as much as it should usually be hidden in Python; and
  • Name mangling still won't truly hide your variable from a client that wishes to access it (e.g. Class.__foo is mangled to _Class__foo for non-Class code, but it's still accessible via that name).

Don't sweat it much. Client code will understand the convention that a leading underscore means "keep out", and if they violate that it's on their shoulders to do so carefully.
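A short demonstration of both points. The Account class and its attribute names are invented for illustration.

```python
class Account(object):
    def __init__(self, balance):
        self._balance = balance    # single underscore: "internal, keep out"
        self.__audit_log = []      # double underscore: name-mangled

    def deposit(self, amount):
        self._balance += amount
        self.__audit_log.append(amount)
        return self._balance


acct = Account(100)
acct.deposit(50)

# The underscore convention does not stop anyone; it only signals intent.
peeked_balance = acct._balance

# The mangled attribute is not stored under the name written in the class...
has_plain_name = hasattr(acct, "__audit_log")

# ...but it is still reachable under the mangled name.
mangled_log = acct._Account__audit_log
```

So mangling mostly guards against accidental name clashes in subclasses; it is not access control.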

Michael Gundlach
+1  A: 

Though you've asked a few specific questions about code, most of your concerns seem to be less about "how should we write our code" and more about "how can we make sure we're all communicating and staying on the same page". And, really, that's the trick to scalable development.

Currently I work on a team of eight developers managing a decent-sized (hundred thousand LoC or so) base of Python code. We do our best to ensure that code is well-tested, well-commented and well-documented (including both docstrings in code and notes on our internal wiki and such), but I've found the biggest key is simply ensuring that there's a culture of openness and communication in the office: being able to walk over to someone's desk and ask a question or talk about some piece of functionality is far and away more important than anything that'll ever be written down.

Once that's in place, everything else suddenly seems a lot easier: you can get everyone using version-control tools, deployment tools, automated testing tools, documentation generators, whatever specific things you need to keep your project flowing smoothly. But without that base culture of people talking to each other, it'll be pretty much impossible.

James Bennett