
Greetings. I have been looking at Literate Programming for a while now, and I do like the idea behind it: you basically write a little paper about your code. You write down as much as possible about the design decisions, the code likely surrounding the module, the inner workings of the module, the assumptions and conclusions resulting from the design decisions, and potential extensions, and all of this can be typeset nicely using TeX. Granted, there is one obvious catch: it is documentation, so it must be kept up to date. That should not be too bad, though, because every change should have a justification, and you can write that down.

However, how does Literate Programming scale to a larger degree? Overall, Literate Programming is still just text. Very human-readable text, of course, but still text, and thus it is hard to follow for large systems. For example, I reworked large parts of my compiler to use >> and some magic to chain compile steps together, because code like "x.register_follower(y); y.register_follower(z); y.register_follower(a);..." got really unwieldy. Changing that to x >> y >> z >> a made it a bit better, even though this is at its breaking point, too.
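For readers who haven't seen the trick: in Python (an assumption; the question doesn't say which language the compiler is in), this kind of chaining can be had by overloading `__rshift__` so that `x >> y` calls `x.register_follower(y)` and returns `y`. A minimal sketch, where `Step`, `feed`, and the sample transforms are hypothetical names:

```python
class Step:
    """One compile step; forwards its result to every registered follower."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform  # callable applied to each incoming value
        self.followers = []

    def register_follower(self, other):
        self.followers.append(other)

    def __rshift__(self, other):
        # `x >> y` registers y as x's follower and returns y, so the
        # left-associative `x >> y >> z` wires x -> y, then y -> z.
        self.register_follower(other)
        return other

    def feed(self, value):
        """Run this step, push the result downstream, collect the sinks' outputs."""
        result = self.transform(value)
        if not self.followers:
            return [result]
        outputs = []
        for follower in self.followers:
            outputs.extend(follower.feed(result))
        return outputs


lex = Step("lex", str.split)
count = Step("count", len)
double = Step("double", lambda n: n * 2)
lex >> count >> double
print(lex.feed("a b c"))  # [6]
```

Because `>>` returns its right operand, a linear chain reads as one expression, but a fan-out such as `y.register_follower(z); y.register_follower(a)` still needs a separate `y >> a` statement in this sketch.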

So, how does Literate Programming scale to larger systems? Does anyone try to do that?

My thought would be to use LP to specify components that communicate with each other using event streams, and to chain all of these together using a subset of graphviz. This would be a fairly natural extension to LP, as you can extract documentation (a dataflow diagram) from the network and also generate code from it really well. What do you think of it?
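A rough sketch of the wiring half of that idea, assuming Python and a deliberately tiny subset of graphviz's DOT language (only `a -> b -> c;` edge statements). The same edge list drives both event propagation and the regenerated diagram; `parse_edges`, `run`, `to_dot`, and the component names are hypothetical:

```python
import re

# Assumed DOT subset: only edge statements such as "a -> b -> c;".
# Lines like "digraph name {", "}", comments and attributes are ignored.
EDGE_LINE = re.compile(r"^\s*(\w+(?:\s*->\s*\w+)+)\s*;?\s*$")


def parse_edges(dot_text):
    """Return (source, target) pairs in the order they appear."""
    edges = []
    for line in dot_text.splitlines():
        match = EDGE_LINE.match(line)
        if match:
            names = [n.strip() for n in match.group(1).split("->")]
            edges.extend(zip(names, names[1:]))
    return edges


def run(components, edges, start, event):
    """Push one event from `start` along the edges; return what the sinks produce."""
    followers = {}
    for src, dst in edges:
        followers.setdefault(src, []).append(dst)

    def push(name, value):
        result = components[name](value)
        targets = followers.get(name, [])
        if not targets:
            return [result]
        out = []
        for target in targets:
            out.extend(push(target, result))
        return out

    return push(start, event)


def to_dot(edges, name="pipeline"):
    """Regenerate the dataflow diagram from the same edge list."""
    body = "\n".join(f"  {src} -> {dst};" for src, dst in edges)
    return f"digraph {name} {{\n{body}\n}}"


spec = """
digraph compiler {
  lex -> parse -> emit;
  parse -> check;
}
"""
edges = parse_edges(spec)
components = {"lex": str.split, "parse": len,
              "emit": lambda n: f"emit {n}", "check": lambda n: n > 0}
print(run(components, edges, "lex", "a b"))  # ['emit 2', True]
print(to_dot(edges, "compiler"))
```

Keeping the graph as data is what makes both outputs cheap here; a real system would need typed streams and error handling that this sketch leaves out.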

-- Tetha.

+1  A: 

Literate Programming was developed in an era where long variable and function names were simply not possible. Because of this, code really wasn't that readable.

Obviously, a lot has happened since then.

In today's world, the code itself is the documentation, hence the term "self-documenting code." The realization is that no set of comments or external documentation can ever stay in sync with the underlying code. So the goal of many of today's programmers is to write code in such a way that it is readable to others.

Chris Lively
Also missing back then: namespaces and classes. I think Literate Programming is an artifact of its time that is not relevant any more.
Nathan Sanders
All of these nice new (or not, look up the inception of smalltalk) features mean you need *less* documentation, not no documentation. Anywhere you had to think deeply before you began, you probably need to explain what you decided.
dmckee
@dmckee: Agreed. This whole "the code is its own documentation" is a dangerous trend, people tend to forget that documentation isn't just about what is the code doing (which is obvious from reading it), but *why*, which is the critical question, and should be documented when not obvious.
Adam Bellaire
Given sufficiently complicated algorithms, even the what may not be obvious, no matter how clear your code is.
Tetha
@Adam: I think the point is that systems are highly organic structures which may change radically over their lifetime. With that in mind, even the why will change as the code changes. I do agree that non-obvious code should have short inline comments to identify what it's doing.
Chris Lively
Yeah, like, no. So when you're inheriting a colleague's code, it's easier to just sit and read the code than to have the colleague come and give you a 'guided tour'? The 'human' explanation will often be in a completely different order, and will provide the big picture that your code can't.
Benjol
+4  A: 

Excellent question. The motivation for literate programming will never go away, but I think it should be treated as fluid. It means "give the reader a break, and educate them to what you're trying to do". I don't think it means "make your code really wordy".

That said, the reader will have to put some effort into it, depending on what they already know. Presumably the code is worth understanding, and nothing comes for free.

I also think it means more than just making readable code. Most likely the reason someone is reading the code is because they need to make a change. You should anticipate the possible changes that might be needed, and tell them how to do it if necessary.

Mike Dunlavey
+5  A: 

The book "Physically Based Rendering" (pbrt.org) is the best example of large-scale literate programming that I'm aware of. The book implements a complete rendering system, and both the book text and the raytracer code are generated from the same "source".
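In literate tools going back to Knuth's WEB, the two outputs come from two operations: tangle extracts compilable code and weave produces the typeset document. A minimal tangle sketch in Python follows; the noweb-like chunk syntax (`<<name>>=` opens a chunk, a lone `@` returns to prose, `<<name>>` alone on a code line is a reference) is an assumption for illustration, not pbrt's actual tooling:

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # "<<name>>=" opens a code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # "<<name>>" alone on a line


def collect_chunks(source):
    """Map each chunk name to its code lines; prose outside chunks is skipped."""
    chunks = {}
    current = None
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])  # a repeated name appends to the chunk
        elif line.strip() == "@":
            current = None  # back to documentation
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Recursively expand references, carrying the reference's indentation."""
    out = []
    for line in chunks[name]:
        match = CHUNK_REF.match(line)
        if match:
            out.extend(tangle(chunks, match.group(2), indent + match.group(1)))
        else:
            out.append(indent + line)
    return out


doc = """\
A ray tracer needs a ray type.

<<main.py>>=
<<ray class>>
print(Ray(0, 1).at(2))
@

A ray stores an origin and a direction.

<<ray class>>=
class Ray:
    def __init__(self, o, d):
        self.o, self.d = o, d
    <<ray methods>>
@

<<ray methods>>=
def at(self, t):
    return self.o + t * self.d
@
"""

print("\n".join(tangle(collect_chunks(doc), "main.py")))
```

Expanding references with their indentation is what lets the document present `Ray` in small pieces, in whatever order suits the explanation, while the tangled output is ordinary Python.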

In practice, I've found that just using a system like Doxygen and really digging in and making use of all of its features is better than full-blown "literate" programming, except for things like this, i.e. textbooks, educational materials.

+2  A: 

pbrt is a physically based ray tracer written in the literate style for the education of computer science graduates (and me); it is a moderately large-scale system. As a non-specialist programmer, I find this level of documentation pretty essential for understanding what the program does and why it does it.

I also have access to a research renderer written in Java, which is well written but largely undocumented apart from a few SIGGRAPH papers. It is also reasonably understandable, but I have access to the authors too.

I've also used ImageJ quite a lot and looked under the hood at the underlying Java; it's pretty difficult to follow without an idea of the underlying philosophy.

In sum, my view is that literate programming is great if someone can find the time to do it well and this is likely to be in educational settings. It's difficult to see it being done in commercial code production. I'm skeptical of the idea that code can be entirely self-documenting.

Ian Hopkinson
+2  A: 

I did some literate programming with WEB some 15 years ago. More recently I tried extracting code from a wiki and generating documentation from a Squeak Smalltalk environment.

The bottom-up part can be handled relatively well by generating documents from TDD/BDD frameworks, but LP focuses on explaining the code to the reader.
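A toy Python illustration of generating a document bottom-up from tests: test names and docstrings become a plain-text spec. `spec_from_tests` and the test functions are hypothetical, and real BDD tools do considerably more. Note that the output follows the order of the tests, not a story written for a reader:

```python
import inspect


def spec_from_tests(namespace):
    """Render test functions as a plain-text spec, one bullet per test.

    Assumption: tests are plain functions named test_*; the bullet text is
    the function name with underscores turned into spaces, followed by the
    docstring (if any) indented underneath.
    """
    lines = []
    for name, obj in namespace.items():
        if name.startswith("test_") and inspect.isfunction(obj):
            lines.append("- " + name[len("test_"):].replace("_", " "))
            if obj.__doc__:
                for doc_line in inspect.cleandoc(obj.__doc__).splitlines():
                    lines.append("    " + doc_line)
    return "\n".join(lines)


def test_empty_input_produces_no_tokens():
    """The lexer returns an empty list for an empty string."""
    assert "".split() == []


def test_whitespace_separates_tokens():
    assert "a b".split() == ["a", "b"]


print(spec_from_tests(dict(globals())))
```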

There are a few issues:

  • the story to tell is different for different stakeholders/readers;
  • the project structure in most environments is not the structure needed for story-telling;
  • support for successive refinement/disclosure is missing;
  • in addition to text, support for pictures is needed;
  • from the comments in the source control system one can derive how the system was built. The story should be how the system could have been built (with perfect hindsight).

For LP to work for larger systems, you need better IDE support than a wiki or an object browser.

Stephan Eggermont
+1 for actually having done some literate programming
Larry Watanabe
+1  A: 

The idea behind literate programming is emphasis on the documentation, with code sprinkled through the documentation, rather than comments sprinkled through code.
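To make "code sprinkled through the documentation" concrete, here is a minimal weave-style sketch in Python. The prose passes through untouched and each code chunk is rendered as an indented, labelled block inside it. The noweb-like chunk syntax and the `weave` function are assumptions for illustration:

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")  # "<<name>>=" opens a code chunk


def weave(source):
    """Render prose as-is and each code chunk as a labelled, indented block."""
    out = []
    in_code = False
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            in_code = True
            out.append(f"[{match.group(1)}]")  # chunk label shown to the reader
        elif in_code and line.strip() == "@":
            in_code = False  # "@" ends the chunk; prose resumes
        elif in_code:
            out.append("    " + line)
        else:
            out.append(line)
    return "\n".join(out)


src = """\
Tokens are separated by whitespace, so the lexer is one call.
<<lexer>>=
def lex(text):
    return text.split()
@
Nothing else is needed."""

print(weave(src))
```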

This is an essentially different philosophy, and differences like longer variable names, namespaces, and classes don't affect the philosophy. Literate programming advocates meaningful variable names.

It scales up to larger systems, because the amount of documentation grows linearly with the size of the code, so the basic ratio of documentation to code stays the same.

Larry Watanabe
+2  A: 

"Overall, Literate Programming is still just text"

False.

Diagrams are fine.

"My thought would be to use LP to specify components that communicate with each other using event streams"

That's just architecture, and that's fine.

"you can extract a documentation -- a dataflow diagram -- from the net and also generate code from it really well. What do you think of it?"

Data flow diagrams aren't really all that helpful for generating detailed code. They're a handy summary, not a precise source of information.

A good writing tool (like LaTeX) can encode the diagram in the document. You could probably figure out a way to generate the diagram from other parts of the documentation.

Bottom Line

In the long run, you're better off generating the diagram as a summary of the text.

Why?

Diagrams intentionally elide details. A diagram is a summary or an overview. But as a source for code, diagrams are terrible. In order to provide all the details, the diagrams become very cluttered.

But a diagrammatic summary of some other LP markup will work out fine.
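As a sketch of deriving the diagram from the text rather than the other way round: assuming Python and noweb-like chunks where `<<name>>` inside a chunk means "this chunk uses that one", the references alone are enough to emit a DOT summary graph. `chunk_graph` is a hypothetical helper:

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")  # "<<name>>=" opens a code chunk
CHUNK_REF = re.compile(r"<<(.+?)>>")        # any "<<name>>" reference inside code


def chunk_graph(source):
    """Emit a DOT digraph with an edge A -> B whenever chunk A references chunk B."""
    edges = []
    current = None
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
        elif line.strip() == "@":
            current = None
        elif current is not None:
            for ref in CHUNK_REF.findall(line):
                if (current, ref) not in edges:
                    edges.append((current, ref))
    body = "\n".join(f'  "{a}" -> "{b}";' for a, b in edges)
    return "digraph chunks {\n" + body + "\n}"


src = """\
The compiler is a pipeline of a lexer and a parser.
<<compiler>>=
<<lexer>>
<<parser>>
@
The parser needs the token type.
<<parser>>=
<<token type>>
@
"""

print(chunk_graph(src))
```

The diagram stays a summary: it shows which pieces depend on which, while the details remain in the prose and code it was derived from.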

S.Lott