Hi,

I am writing a small academic research project about extremely long functions. Obviously, I am not looking for examples of bad programming, but for examples of functions 100, 200 or 600 lines long that make sense.

I will be investigating the Linux kernel source using a script written for a Master's degree at the Hebrew University, which measures parameters such as number of lines of code, function complexity (measured by MCC) and other goodies. By the way, it's a neat study about code analysis, and recommended reading.

I would like to know whether you can think of any good reason why a function should be exceptionally long. I'll be looking into C, but examples and arguments from any language would be of great use.

+13  A: 

Lots of values in a switch statement?

smcameron
I was going to say the same thing - I've seen HUGE functions written as giant switch statements in a state machine for a controller
n8wrl
I've seen huge switch statements in old Windows programs. The question is, did those programs need to be written that way? The answer is no.
anon
What's wrong with a huge switch statement? It's simple, even if long.
David Thornley
I've seen huge switch statements in old Linux programs. And new Linux programs. I don't think it's a Windows-only problem ;-) nor is it always a problem, as David pointed out.
unforgiven3
@David Perhaps you have never seen a "real" Windows program with while loops nested inside switches nested inside switches nested inside while loops nested inside switches. All in the same function, of course. This programming style was encouraged by the first Petzold book, from which a generation of Windows programmers learned bad habits.
anon
I've seen lots of them at the top of the kernel's longest-function list, but shouldn't long switch statements be transformed into dispatch tables?
Adam Matan
If it doesn't fit on a single screen, it's much harder to grasp. I once saw a function of nested switches that was 16000 lines. It was incomprehensible.
Rob K
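The dispatch-table alternative raised in the comments above could look roughly like the following. This is only a minimal sketch: the opcodes, the handler names and the `dispatch` function are all hypothetical, made up for illustration, and are not taken from the kernel or from any answer here.

```c
#include <stddef.h>

/* Hypothetical opcodes, for illustration only. */
enum opcode { OP_START, OP_STOP, OP_RESET, OP_COUNT };

typedef int (*handler_fn)(int arg);

static int handle_start(int arg) { return arg + 1; }
static int handle_stop(int arg)  { return arg - 1; }
static int handle_reset(int arg) { (void)arg; return 0; }

/* One table entry per opcode replaces one case label per opcode. */
static const handler_fn dispatch_table[OP_COUNT] = {
    [OP_START] = handle_start,
    [OP_STOP]  = handle_stop,
    [OP_RESET] = handle_reset,
};

/* Returns the handler's result, or -1 for an unknown opcode. */
int dispatch(int op, int arg)
{
    if (op < 0 || op >= OP_COUNT || dispatch_table[op] == NULL)
        return -1;
    return dispatch_table[op](arg);
}
```

A call such as `dispatch(OP_START, 41)` goes through `handle_start`, and an out-of-range opcode falls into the -1 branch. Whether this reads better than a long switch is exactly the trade-off being argued in this thread: the dispatching function becomes short, but the logic is now spread over many small handlers.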
+3  A: 

Read the chapter in McConnell's Code Complete about subroutines; it has guidelines and pointers on when you should break things into functions. If you have some algorithm where those rules don't apply, that may be a good reason for having a long function.

whatsisname
Can you think of a concrete example, which I can link to?
Adam Matan
+1  A: 

Sometimes I find myself writing a flat file (for use by third parties) which entails headers, trailers, and detail records that are all linked. It's easier to have a long function for the purpose of computing summaries than it is to devise some scheme to pass values back and forth through lots of small functions.

hova
+17  A: 

I may catch flak for this, but readability. A highly serial, but independent execution that could be broken up into N function calls (of functions that are used nowhere else) doesn't really benefit from decomposition. Unless you count meeting an arbitrary maximum on function length as a benefit.

I'd rather scroll through N function sized blocks of code in order than navigate the whole file, hitting N functions.

patros
I won't give you too much flack, but I'll point out that if the function is task based I like to see TaskA(); TaskB(); ... TaskN(); even if those task functions aren't called from anywhere else.
Nathan Koop
Nitpicking - it's flak (in the sense of AA fire); a "flack" is a PR person.
anon
Ah, right you are. As long as I don't catch any flac's I'll be ok.
patros
The benefit is just that if the chunks really are independent, then you can *prove* it by actually breaking them up. Neither you nor any maintainers can store state in variables that last the length of the function, are modified in lots of different places, and have only very complicated invariants. Once a function gets long, a big-scope local variable has many of the nasty characteristics of globals. By splitting it into functions, you can't modify anything you didn't visibly pass a reference to. Of course if decomposition succeeds, you've mostly proved that you needn't have bothered ;-)
Steve Jessop
... and you can achieve 50% of the benefits just by wrapping each chunk in braces and not declaring too much in the outermost scope of the function.
Steve Jessop
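The brace-wrapping idea from the comment above might be sketched like this in C. The function name `max_dominates` and its three stages are hypothetical, invented purely to show the scoping; only the two variables that later stages really need live in the outermost scope.

```c
#include <stddef.h>

/* Hypothetical three-stage routine, for illustration only.
 * Returns 1 if twice the maximum is at least the sum, else 0. */
int max_dominates(const int *values, size_t count)
{
    long total = 0;   /* shared on purpose: stage 3 reads it */
    int max = 0;      /* shared on purpose: stage 3 reads it */

    {   /* Stage 1: sum. 'i' is not visible outside these braces. */
        size_t i;
        for (i = 0; i < count; ++i)
            total += values[i];
    }

    {   /* Stage 2: maximum. This 'i' is a separate variable. */
        size_t i;
        for (i = 0; i < count; ++i)
            if (i == 0 || values[i] > max)
                max = values[i];
    }

    {   /* Stage 3: combine the two results. */
        long doubled = 2L * max;
        return doubled >= total;
    }
}
```

The compiler now rejects any attempt by stage 3 to touch a loop counter from stage 1, which gives part of the isolation that separate functions would, without leaving the long function.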
I can also see it to my satisfaction by reading the code, and by placing variable declarations sensibly. Once code is clean and in place it's much easier to see when someone does something stupid by looking at the diff. I tend to reject as reasoning "because someone might do something bad if you let them". Code, like anything, should not be a nanny state. If you don't trust someone to write good code, don't hire them. People make mistakes of course, but that's what code reviews and QA are for.
patros
I disagree with you, patros. I had a function that was about 1000 lines long. It handled 3 different stages of processing that were done only in that one function, but it was impossible to tell where each stage began and where it ended, or to get an idea of the whole flow. I split it into about 10 functions that the main function called, and lo and behold, I was able to get a good picture of the flow, even though these functions weren't called anywhere else. Think of it as collapsing headers in a text document.
Nathan Fellman
@patros: sure, I'm not saying that you will find your code easier to handle if you follow some specific convention. I'm saying that other people will, for the advantage outlined, which is that they don't have to read multiple pages of code to see your function's serial/independent structure. Personally I'm fairly ambivalent on the matter: unless style guide dictates otherwise, I write much longer functions than some people would like. I don't think juggling 5 or 6 variables is a problem.
Steve Jessop
@Nathan Would it have been impossible to break the one function into 10 chunks using whitespace and comments? Obviously both methods have their merits and drawbacks, and for me the most important factor is going to be based on your editor or IDE. I tend not to do it in C, where I mostly use vim. In C++ or Java, I'm more likely to since the IDEs I use support jumping to functions, and keep a stack of everything you've visited so far.
patros
+4  A: 

Functions can get longer over time, especially if they are modified by many sets of developers.

Case in point: I recently (~1yr or 2 ago) refactored some legacy image processing code from 2001 or so that contained a few several-thousand-line-functions. Not a few several-thousand-line-files - a few several-thousand-line-functions.

Over the years, a great deal of functionality was added to them without anyone really putting in the effort to refactor them properly.

unforgiven3
So that's bad programming per se, IMHO - forced, perhaps, by the circumstances.
Adam Matan
Right, but the question was "Why are some functions extremely long?" - that's my answer :-)
unforgiven3
A: 

XML parsing code often has reams of escape character processing in one setup function.

Rich Seller
+9  A: 

Anything generated from other sources, i.e. a finite state machine from a parser generator or similar. If it's not intended for human consumption, aesthetic or maintainability concerns are irrelevant.

Adam Wright
+2  A: 

Generated code can contain very, very long functions.

FerranB
+1  A: 

One point that I think has a bearing is that different languages and tools have different lexical scoping associated with functions.

For example, Java allows you to suppress warnings with an annotation. It may be desirable to limit the scope of the annotation and so you keep the function short for that purpose. In another language, breaking that section out into its own function might be completely arbitrary.

Controversial: In JavaScript, I tend to only create functions for the purpose of reusing code. If a snippet is only executed in one place, I find it burdensome to jump around the file(s) following the spaghetti of function references. I think closures facilitate and therefore reinforce longer [parent] functions. Since JS is an interpreted language and the actual code gets sent over the wire, it's good to keep the length of the code small--creating matching declarations and references doesn't help (this could be considered a premature optimization). A function has to get pretty long in JS before I decide to chop it up for the express purpose of "keeping functions short".

Again in JS, sometimes the entire 'class' is technically a function with many enclosed sub-functions but there are tools to help deal with it.

On the other hand in JS, variables have scope for the length of the function and so that's a factor that may limit the length of a given function.

steamer25
+1  A: 

The very long functions I come across are not written in C, so you'll have to decide whether this applies to your research or not. What I have in mind are some PowerBuilder functions that are several hundred lines long, for the following reasons:

  • They were written over 10 years ago, by people who at that time did not have coding standards in mind.
  • The development environment makes it a bit harder to create functions. Hardly a good excuse, but it's one of those little things that sometimes discourages you from working properly, and I guess someone just got lazy.
  • The functions have evolved over time, adding both code and complexity.
  • The functions contain huge loops, each iteration possibly handling a different kind of data in a different way. Using tens(!) of local variables, some member variables and some globals, they have become extremely complex.
  • Being that old and ugly, no one dares to refactor them into smaller parts. With so many special cases handled in them, breaking them apart is asking for trouble.

This is yet another place where obvious bad programming practices meet reality. While any first year CS student could say those beasts are bad, no one would spend any money on making them look prettier (given that at least for now, they still deliver).

eran
A: 

The functions I deal with (not write) become long because they are expanded and expanded, and no one spends the time to refactor them. People just keep adding logic to the functions with no thought to the big picture.

I deal with a lot of cut-n-paste development...

So, for the paper, one aspect to look at is a poor maintenance plan/cycle, etc.

Frank V
A: 

The only ones I've recently coded are where making them smaller doesn't achieve much or would make the code less readable. The notion that a function over a certain length is somehow intrinsically bad is simply blind dogma. Like any blindly applied dogma, it relieves the follower of the need to actually think about what applies in any given case...

Recent examples...

Parsing and validating a config file with a simple name=value structure into an array, converting each value as I find it. This is one massive switch statement, one case per config option. Why? I could have split it into lots of calls to trivial 5- or 6-line functions. That would add about 20 private members to my class, none of which are reused anywhere else. Factoring it into smaller chunks just didn't add enough value to be worth it, so it's been the same ever since the prototype. If I want another option, I add another case.

Another case is the client and server communication code in the same app, and its client. Lots of calls to read/write any of which can fail, in which case I bail and return false. So this function is basically linear, and has bail points (if failed, return) after almost every call. Again, nothing to gain by making it smaller and no way to really make it any smaller.
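To make the "linear with bail points" shape concrete, here is a minimal sketch in C. The wire format (a 4-byte magic, a version byte, a length byte, then the payload), the magic string "MSG1" and the name `read_message` are all assumptions invented for this example; they are not the actual protocol of the application described above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wire format: 4-byte magic "MSG1", 1-byte version,
 * 1-byte length, then 'length' payload bytes.
 * Returns true and a NUL-terminated payload on success. */
bool read_message(FILE *in, char *payload, size_t payload_size)
{
    unsigned char magic[4];
    int version;
    int length;

    if (fread(magic, 1, sizeof magic, in) != sizeof magic)
        return false;                       /* short read */
    if (memcmp(magic, "MSG1", sizeof magic) != 0)
        return false;                       /* wrong magic */

    version = fgetc(in);
    if (version == EOF)
        return false;
    if (version != 1)
        return false;                       /* unsupported version */

    length = fgetc(in);
    if (length == EOF)
        return false;
    if ((size_t)length >= payload_size)
        return false;                       /* no room for payload + NUL */

    if (fread(payload, 1, (size_t)length, in) != (size_t)length)
        return false;                       /* truncated payload */
    payload[length] = '\0';

    return true;
}
```

Every step depends on the one before it and has exactly one failure exit, so the function grows by one read-and-check pair per protocol field. Splitting it would mostly move the same `if (...) return false;` pairs into helpers that are called once.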

I should also add that most of my functions are a couple of "screenfuls" and I strive in more complex areas to keep it to one "screenful", simply because I can look at the whole function at once. It's ok for functions that are basically linear in nature and don't have lots of complex looping or conditions going on so the flow is simple. As a final note I prefer to apply cost-benefit reasoning when deciding which code to refactor, and prioritise accordingly. Helps avoid the perpetually half-finished project.

+1  A: 

By far the most common I see/write are long switch statements or if/else semi-switch statements for types that can't be used in this language's switch statements (already mentioned a few times). Generated code is an interesting case, but I'm focusing on human-written code here. Looking at my current project, the only truly long function not included above (296 LOC/650 LOT) is some Cowboy Code I'm using as an early evaluation of the output of a code generator I plan to use in the future. I'll definitely be refactoring it, which removes it from this list.

Many years ago, I was working on some scientific computing software that had a long function in it. The method used a large number of local variables and refactoring the method kept resulting in a measurable difference per profiling. Even a 1% improvement in this section of code saved hours of computation time, so the function stayed long. I've learned a great deal since then, so I can't speak to how I'd handle the situation today.

280Z28
A: 

A few ideas not explicitly mentioned yet:

  • repetitive tasks, e.g. the function reads a database table with 190 columns and has to output them as a flat file (assuming that columns need to be treated individually, so a simple loop over all columns won't do). Of course you could create 19 functions, each outputting 10 columns, but that wouldn't make the program any better.
  • complicated, verbose APIs, like Oracle's OCI. When seemingly simple actions require large amounts of code, it's hard to break it down into small functions that make any sense.
ammoQ
+1  A: 

Speed:

  • Calling a function means pushing to the stack, then jumping, then storing on the stack again, then jumping again. If you pass parameters to the function, you usually have several more pushes.

Consider a loop:

for...
   func1

Inside a loop, all those pushes and jumps can be a factor.

This was largely solved by the introduction of inline functions in C99 (and unofficially before that), but code that was written earlier, or written with compatibility in mind, may have been kept long for that reason.

Inline also has its flaws; some are described on the Inline Functions link.

Edit:

As an example of how a call to a function can make a program slower:

#include <stdio.h>

static void
do_printf(void)
{
        printf("hi");
}

int
main(void)
{
        int i=0;
        /* Each iteration pays for a call to do_printf on top of printf. */
        for(i=0;i<1000;++i)
                do_printf();
        return 0;
}

This produces (GCC 4.2.4):

        .
        .
        jmp     .L4
.L5:
        call    do_printf
        addl    $1, -8(%ebp)
.L4:
        cmpl    $999, -8(%ebp)
        jle     .L5

        .
        .
do_printf:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    $.LC0, (%esp)
        call    printf
        leave
        ret

against:

         int
 main()
 {
         int i=0;
         for(i=0;i<1000;++i)
                 printf("hi");
 }

or against:

static inline void __attribute__((always_inline)) //This is GCC specific!
do_printf()
{
        printf("hi");
}

Both produce (GCC 4.2.4):

        jmp     .L2
.L3:
        movl    $.LC0, (%esp)
        call    printf
        addl    $1, -8(%ebp)
.L2:
        cmpl    $999, -8(%ebp)
        jle     .L3

Which is faster, since the extra call, stack-frame setup and return are gone from the loop body.

Liran Orevi
Thanks and Toda Raba. I think that the major problem with inline functions is their huge list of arguments, in many cases (no pun intended).
Adam Matan