Edit: Since this question was asked, a lot of improvement has happened in the standard Python scientific libraries (which was the targeted area). For example, the numpy project has made a big effort to improve the docstrings. One can still argue whether it would have been possible to address these issues continuously right from the start.


I have this somewhat heretical question: Why do so many Python libraries have messy code and don't follow standard best practices? Or do you think that this observation is absolutely not true? How does the situation compare to other languages? I am interested in your take on this.

Some reasons why I have the impression that quality is lacking:

  • The docstrings are often completely missing or incomplete, even for the public API. It is painful when a method takes *args and **kwargs but does not document which values can be given.

  • Bad Python coding practices, like adding new attributes outside of __init__. Things like this make the code hard to read (or to maintain).

  • Hardly any libraries follow the PEP8 coding conventions. Sometimes the conventions are not even consistent in a single file.

  • The overall design is messy, with no clear API. It seems that not nearly enough refactoring is done.

  • Poor unittest coverage.
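
To make the first two points concrete, here is a minimal sketch of what I would like to see. The `Plotter` class, its `draw` method and the keyword names are hypothetical and not taken from any real library. The docstring spells out which keyword arguments are accepted, unknown ones are rejected, and every attribute is declared in __init__:

```python
class Plotter:
    """Hypothetical example class; not part of any real library."""

    # Keyword arguments that draw() accepts, with their defaults.
    _DRAW_DEFAULTS = {"color": "black", "linewidth": 1.0}

    def __init__(self, title):
        # Declare every attribute here, so readers see the full state at a glance.
        self.title = title
        self.last_style = None  # set by draw(), but declared up front

    def draw(self, *points, **kwargs):
        """Record the style used to draw ``points`` and return it.

        Positional arguments:
            *points: (x, y) tuples to draw.

        Keyword arguments:
            color (str): line color, default "black".
            linewidth (float): line width, default 1.0.

        Raises:
            TypeError: if an unknown keyword argument is given.
        """
        unknown = set(kwargs) - set(self._DRAW_DEFAULTS)
        if unknown:
            raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")
        # Merge caller overrides on top of the documented defaults.
        self.last_style = {**self._DRAW_DEFAULTS, **kwargs}
        return self.last_style
```

With this, a misspelled keyword such as `colour="red"` fails immediately with a TypeError instead of being silently ignored.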

Don't get me wrong, I absolutely love Python and its ecosystem. And even though I struggled with these libraries they generally get the job done and I am grateful for that. But I also think that in the end tons of developer time are wasted because of these issues. Maybe that is because Python gives you so much freedom that it is very easy to write bad code.

A: 

nikow: I can only answer for myself. Most of my Python (and PHP or Ruby, all dynamic "scripting" languages) work is done just for me. I always release it on my personal site in case anyone else finds it useful, but I never go through any documentation or QA process, because as long as it works for me I'm happy.

thr
If I try to read my code after 2 months of not using it, often it feels like it was done by someone other than me. Therefore I think it's a good idea to document code.
Georg
@gs - yes, it is useful to document. But it is even more useful to go and create new code in that time. Although it requires a little extra thinking later, a person really never loses his way of thinking. Just has to check it out a little more later.
ldigas
+4  A: 

PEP8 is a style guide, not a style requirement. It even states that you should "know when to be inconsistent".

Personally, I don't like some of the guidelines in it. I prefer tabs to spaces (but still abhor the mixing of the two).

And, to be brutal, I don't often look at the source code of libraries I'm using unless it's absolutely necessary. I'd much prefer the documentation for said library to be adequate enough that I never have to look at the source (bugs notwithstanding, of course).

Seriously, why do you really care what the source code for matplotlib looks like, as long as it does what it is intended to do?

I agree with you on the missing docstrings (assuming they're public elements rather than internal ones) but that's not the only way to document a library.

paxdiablo
I'd care about the intelligibility of the code in MatPlotLib if I was trying to help the project. And the idea of "no one will look at it" shouldn't be an excuse for writing crap code...
Rodrigo
What's the ratio of people who *use* a given library to those who *code* it? Pretty high, I suspect. If you have to read the code then fine, it should be understandable. But do you read the Linux kernel source whenever you want to fork or select? I don't. And crap code is subjective...
paxdiablo
When I'm using jQuery, I use the minified version, despite its unreadability :-).
paxdiablo
A: 

Well, they are open source. As such they will also evolve over time, if they are good enough.

That's one of the many beauties of open source.

Often there is little sense in writing a lot of documentation and "good" code if you don't know whether the project will live on. That would just be a waste of time.

Edit: Of course writing good code would never hurt the first time around though... But maybe just "getting the job done" is good enough in many cases. I think that otherwise we wouldn't enjoy the vast amount of options when it comes to OSS.

I think that if enough people act a specific way there might be some explanation to it. They are not just randomly doing so to offend you.

Subtwo
+1  A: 

PEP8 is just that, a convention, not a requirement. It would be really sad if all Python programmers had to adhere to a common set of rules; we lose enthusiasm over the slightest of issues.

As far as missing docstrings are concerned - yes, they can help when using interactive help - but I generally don't mind as long as there's some documentation. I try not to read the source code of the libraries I use; I tend to start modifying (rewriting) them.

sykora
+19  A: 

Regarding documentation, it's not just Python. If there is one single factor that is preventing the wider adoption of OSS it is, IMHO, the truly dreadful level of documentation of most OSS projects. This starts at the code level and extends to the user docs. Can I just say to anyone working on OSS:

a) Comment your code! There is no such thing as self documenting code!

b) Spend at least 25% of the project time budget on end-user documentation.

And I do know vaguely what I'm talking about - I have a couple of OSS projects of my own, I've contributed to several others and I use OSS almost exclusively. And yesterday I spent over 4 hours trying to build a major OSS project (no names, no pack drill), and failing because of the crappy, self-contradictory documentation.

anon
Maybe "Start with documentation and unittests" is also a good thing, but maybe hard to achieve if you are not really behind it (and know the tools in and out).
MrTopf
nikow, the cause is the same reason neil's answer has the most votes when saying "Comment your code! There is no such thing as self documenting code!" ... basically, it's the thinking that: it's OK, we all make a mess, but don't panic, we will add comments ... sigh
eglasius
The annoying thing is that I find it really easy to include comments when you write code for the first time. But to add them later you have to read through the whole code again, which takes 10x as long. And you might even confuse bugs and features, because you just don't remember.
nikow
Comments written during initial development often get obviated later, leading to the very inconsistencies complained about.
Gregg Lind
@Gregg: That is certainly a problem if the comments are not updated (e.g. when they are redundant and no one bothers). But I think docstrings are vital to document in short what the code is supposed to do. Why not start development with docstrings? Polishing them can still be left for later.
nikow
+1 I've never understood how reading 100 lines of code, no matter how well-written, is easier than reading 2 sentences written in English. One of the biggest problems with Python is the abundance of abandonware projects. You start using some popular (at the time) library, and a year later the developers have grown bored. In theory, this is fine; it's OSS, so someone else can step up. In practice, the code is so poorly documented that it's just not practical. Honestly, if you aren't going to comment and document your code, please don't release it in the first place.
DNS
+6  A: 

Instead the authors each seem to follow their own glorious convention. And sometimes the conventions are not even consistent with the same file of a library

Welcome to the wonderful code of the real world!

FWIW Python code I have met is no better or worse than that in any other language.

(Well, better than the average PHP project, obviously, but that's not really fair.)

bobince
+5  A: 

This is because Python is not backed by the corporate world like Java or .NET.

If I want my Java library to be promoted by Sun I will follow their guidelines. This is not the case with Python. I write my code, people find it better and it has to evolve on its own.

Also, most Python developers come from C++, C, Java, .NET etc., and they start writing production code right from the first day, thanks to the easiness of Python. And the vicious cycle continues.

It took even me a month to come to PEP8 and refactor my code.

Xolve
Seriously? You expect Sun to promote your Java library because you followed their guidelines?
ykaganovich
No. But if you don't follow their guidelines, surely they won't promote it. Enterprises put a lot of effort and marketing into promoting uniformity.
Xolve
Sun does not promote third-party libraries regardless of whether the latter follow their guidelines. People still follow the guidelines because it's the Right Thing To Do (for consistency, not because Sun says so).
ykaganovich
So, people should do it with Python too!
Rodrigo
Also, let's just check out the docs of Python and Java. It took Python a long time to specify a coding standard. This may be because Guido van Rossum wanted to see what the community came up with.
Xolve
With Java, Sun specifies the coding standard explicitly in the docs, and it is given in all Java books. With Python we have to look into a PEP (e.g. PEP8, PEP257 etc.) to find out.
Xolve
+4  A: 

As for matplotlib, there is a project to improve its "pythoness" - http://www.scipy.org/PyLab

The thing about scientific libraries is that they are written by scientists, not by professional software developers. Moreover, those scientists are used to writing Fortran. The question is -- would you rather have working code or beautiful code?

vartec
What about some info about the link ? Maybe explaining why you left it there ? How does it fit the topic ?
Martin
A: 

Quality of code * number of comments * time = constant

Pick two !

I never had any problem using matplotlib; can't say I looked at the code much - it is a fine library. Does what it is supposed to do (for free !)

ldigas
So, for a fixed code quality, the more time you spend, the fewer comments you end up writing?
John Fouhy
I guess it should be 1/time. But I don't think that quality of code and the number of comments are independent. The OpenSSH/Debian fiasco proved this the hard way :)
nikow
+6  A: 

PEP 8 has changed over time. Some modules follow older recommendations. You can see that with PIL, which uses modules like "Image" where the module contains a single main class, instead of the recommended lowercase for module names, and in C extensions which use the "c" prefix, rather than the more modern "_" prefix.

Some of the libraries are developed by people who are strongly influenced by traditions in other fields, like Java and C++. These people more often use CamelCase instead of the PEP 8 recommended lowercase_with_underscores.

The answers here wouldn't be complete without reference to Sturgeon's Law: "Ninety percent of everything is crap."

Andrew Dalke
If someone tries to program in Python, shouldn't they follow its guidelines?
Rodrigo
Generally, yes. But if the guidelines change, what obligation is there to follow? Do you create parallel APIs so dependent code doesn't break?
Andrew Dalke
+5  A: 

Ninety percent of [Python libraries] are crud, but ninety percent of everything is crud

-- Sturgeon's law (paraphrased)

TokenMacGuy
+4  A: 

I believe that Python suffers from being foisted too eagerly on people who are not programmers (by schooling or trade) as a solution for "need some programming done? Here, try this easy and mature tool".

This is similar to how PHP became such a huge success while having so many libraries with abysmal code quality (even if, granted, the average Python code quality is better than for PHP). The average PHP user, like the average Python user, has not much programming experience or skill and very little incentive to improve in this regard. They set out to achieve something, and maybe they thought it worthy enough to be shared with the community in the form of a library, but most often, once the job is done, they have no interest in bettering the code or themselves (in programming skills, I mean).

The solution might be for Python library repositories (such as PyPI) to have stricter rules about accepting contributed packages - handle this with a review process whose purpose is to ensure quality - the same way that major Linux distributions have a review process before adding a package to their repositories.

Guss
+5  A: 

It sounds like you have found that code quality does not meet the expectations you were set up to have, perhaps by school, best-practices books or senior developers.

After having worked at several companies, I found myself regularly advised to do unit tests, document code, use version/source control (all good advice that I have taken) then finding that the givers of that advice rarely follow the advice themselves.

I would say that you do have the right impression that sometimes the code quality is low, but only based on your expectations. Certainly numpy and others are quite useful packages, even if not coded to the standard you were set up to expect.

Standards are opinions, and if you are of the opinion that standards are low, then you can try to help make those standards better by contributing, or accept them as they are and be sure to write code that serves as an example to the juniors you will find yourself in charge of one day.

Chris Cameron
+1  A: 

Regarding comparison with other languages, I think that language design plays a big part here. For example, in a statically typed language like Java, even if the library is missing good documentation, you can still deduce much of the functionality from the method signatures. No *args to contend with.
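
As a rough sketch of the same contrast inside Python itself (both functions below are made up for illustration, and the optional annotations are an assumption about how one might write such code today), compare what introspection reveals for a catch-all signature and for an explicit one:

```python
import inspect


def connect_opaque(*args, **kwargs):
    # Hypothetical catch-all: the signature says nothing about valid inputs.
    return args, kwargs


def connect_explicit(host: str, port: int = 5432, *, timeout: float = 10.0) -> tuple:
    # Hypothetical explicit version: names, defaults and annotations are visible.
    return host, port, timeout


# inspect.signature() reports what a caller can learn without reading the body.
opaque_sig = str(inspect.signature(connect_opaque))
explicit_sig = str(inspect.signature(connect_explicit))
print(opaque_sig)
print(explicit_sig)
```

The first signature only shows `(*args, **kwargs)`, while the second lists every parameter with its default, which is roughly the information a Java method signature gives you for free.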

ykaganovich
+8  A: 

The first thing you need to realize is that Python did not spring, fully formed, from the head of Guido sometime around version 2.x. It's grown over the course of the past twenty years.

In fact, a number of the things you mention (unittest, for example, and PEP-8), didn't even exist when some of the standard libraries were first written.

You'll probably notice that the older the library you're looking at, the more likely it is to diverge from the current "best practices"--often because it predates widespread adoption of those practices. More recent libraries are more likely to conform to current practices.

Additionally, there is often a good reason for not bringing them up to date. Imagine you have several tens of thousands of lines of code written against the current Python libraries. Now, the maintainer of one of those libraries decides to change the libraries to make the class and function names conform to PEP-8. Now everyone who has working code has to revisit huge amounts of it, lest the renaming break things.
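
One common way to soften such a rename, sketched here with made-up function names (`fetchData` and `fetch_data` are hypothetical), is to keep the old name as a thin alias that emits a DeprecationWarning. It still means maintaining two names for a while, which is part of the cost:

```python
import warnings


def fetch_data(source):
    """PEP 8 style name; the real implementation lives here."""
    return f"data from {source}"


def fetchData(source):
    """Old camelCase name, kept so existing callers do not break."""
    warnings.warn(
        "fetchData() is deprecated; use fetch_data() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return fetch_data(source)
```

Existing code calling `fetchData(...)` keeps working and gets a warning, while new code can use the PEP 8 name from the start.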

That's not to say there aren't things that can improve in Python libraries--there are! But there's always a trade-off between perfection and getting things done. That's one reason they say "Practicality beats purity."

Tim Lesher
+1  A: 

How about a collection of examples of good software doc ?
Good examples might lead to overall improvement a bit faster than random walk.
The collection could be split into categories such as:
inline doc / help page / tutorial / reference manual, web page / paper, pictures / none.
Each entry should have a few words on why the reviewer finds it good.
(Where: a corner of stackoverflow ?)

Denis
Yes, such a collection would be interesting. One could open this as a new question: Which Python libraries have the highest quality standard?
nikow