ansaurus

Question

Why are some functions extremely long? (ideas needed for academic research!)

Answer 1

+13 A:

Lots of values in a switch statement?

smcameron 2009-07-14 19:54:24

I was going to say the same thing - I've seen HUGE functions written as giant switch statements in a state machine for a controller.

n8wrl 2009-07-14 19:55:36

I've seen huge switch statements in old Windows programs. The question is, did those programs need to be written that way? The answer is no.

anon 2009-07-14 19:57:39

What's wrong with a huge switch statement? It's simple, even if long.

David Thornley 2009-07-14 20:03:18

I've seen huge switch statements in old Linux programs. And new Linux programs. I don't think it's a Windows-only problem ;-) nor is it always a problem, as David pointed out.

unforgiven3 2009-07-14 20:12:28

@David Perhaps you have never seen a "real" Windows program with while loops nested inside switches nested inside switches nested inside while loops nested inside switches. All in the same function, of course. This programming style was encouraged by the first Petzold book, from which a generation of Windows programmers learned bad habits.

anon 2009-07-14 20:17:12

I've seen lots of them at the top of the kernel's longest-function list, but shouldn't long switch statements be transformed into dispatch tables?

Adam Matan 2009-07-14 20:22:02

If it doesn't fit on a single screen, it's much harder to grasp. I once saw a function of nested switches that was 16000 lines. It was incomprehensible.

Rob K 2009-07-14 20:26:07

Answer 2

+3 A:

Read the chapter in McConnell's Code Complete about subroutines; it has guidelines and pointers on when you should break things into functions. If you have some algorithm where those rules don't apply, that may be a good reason for having a long function.

whatsisname 2009-07-14 19:56:44

Can you think of a concrete example, which I can link to?

Adam Matan 2009-07-15 06:12:34

Answer 3

+1 A:

Sometimes I find myself writing a flat file (for use by third parties) which entails headers, trailers, and detail records that are all linked. It's easier to have a long function for the purpose of computing summaries than it is to devise some scheme to pass values back and forth through lots of small functions.

hova 2009-07-14 20:04:01

Answer 4

+17 A:

I may catch flak for this, but readability. A highly serial, but independent execution that could be broken up into N function calls (of functions that are used nowhere else) doesn't really benefit from decomposition. Unless you count meeting an arbitrary maximum on function length as a benefit.

I'd rather scroll through N function-sized blocks of code in order than navigate the whole file, hitting N functions.

patros 2009-07-14 20:06:11

I won't give you too much flack, but I'll point out that if the function is task based I like to see TaskA(); TaskB(); ... TaskN(); even if those task functions aren't called from anywhere else.

Nathan Koop 2009-07-14 20:09:24

Nitpicking - it's flak (in the sense of AA fire); a "flack" is a PR person.

anon 2009-07-14 20:23:20

Ah, right you are. As long as I don't catch any flacs I'll be ok.

patros 2009-07-14 20:39:47

The benefit is just that if the chunks really are independent, then you can *prove* it by actually breaking them up. Neither you nor any maintainers can store state in variables that last the length of the function, are modified in lots of different places, and have only very complicated invariants. Once a function gets long, a big-scope local variable has many of the nasty characteristics of globals. By splitting it into functions, you can't modify anything you didn't visibly pass a reference to. Of course if decomposition succeeds, you've mostly proved that you needn't have bothered ;-)

Steve Jessop 2009-07-15 00:03:16

... and you can achieve 50% of the benefits just by wrapping each chunk in braces and not declaring too much in the outermost scope of the function.

Steve Jessop 2009-07-15 00:04:48

I can also see it to my satisfaction by reading the code, and by placing variable declarations sensibly. Once code is clean and in place it's much easier to see when someone does something stupid by looking at the diff. I tend to reject as reasoning "because someone might do something bad if you let them". Code, like anything, should not be a nanny state. If you don't trust someone to write good code, don't hire them. People make mistakes of course, but that's what code reviews and QA are for.

patros 2009-07-15 03:32:51

I disagree with you, patros. I had a function that was about 1000 lines long. It handled 3 different stages of processing that were done only in that one function, but it was impossible to tell where each stage began and where it ended, or to get an idea of the whole flow. I split it into about 10 functions that the main function called, and lo and behold, I was able to get a good picture of the flow, even though these functions weren't called anywhere else. Think of it as collapsing headers in a text document.

Nathan Fellman 2009-07-15 06:57:46

@patros: sure, I'm not saying that you will find your code easier to handle if you follow some specific convention. I'm saying that other people will, for the advantage outlined, which is that they don't have to read multiple pages of code to see your function's serial/independent structure. Personally I'm fairly ambivalent on the matter: unless a style guide dictates otherwise, I write much longer functions than some people would like. I don't think juggling 5 or 6 variables is a problem.

Steve Jessop 2009-07-15 12:04:57

@Nathan Would it have been impossible to break the one function into 10 chunks using whitespace and comments? Obviously both methods have their merits and drawbacks, and for me the most important factor is going to be based on your editor or IDE. I tend not to do it in C, where I mostly use vim. In C++ or Java, I'm more likely to since the IDEs I use support jumping to functions, and keep a stack of everything you've visited so far.

patros 2009-07-15 14:46:43

Answer 5

+4 A:

Functions can get longer over time, especially if they are modified by many sets of developers.

Case in point: I recently (~1yr or 2 ago) refactored some legacy image processing code from 2001 or so that contained a few several-thousand-line-functions. Not a few several-thousand-line-files - a few several-thousand-line-functions.

Over the years so much functionality was added to them without really putting in the effort to refactor them properly.

unforgiven3 2009-07-14 20:06:21

So that's bad programming per se, IMHO - forced, perhaps, by the circumstances.

Adam Matan 2009-07-14 20:23:59

Right, but the question was "Why are some functions extremely long?" - that's my answer :-)

unforgiven3 2009-07-14 20:25:53

Answer 6

A:

XML parsing code often has reams of escape character processing in one setup function.

Rich Seller 2009-07-14 20:06:29

Answer 7

+9 A:

Anything generated from other sources, i.e. a finite state machine from a parser generator or similar. If it's not intended for human consumption, aesthetic or maintainability concerns are irrelevant.

Adam Wright 2009-07-14 20:08:04

Answer 8

+2 A:

Code generators can produce very, very long functions.

FerranB 2009-07-14 20:10:23

Answer 9

+1 A:

One point that I think has a bearing is that different languages and tools have different lexical scoping associated with functions.

For example, Java allows you to suppress warnings with an annotation. It may be desirable to limit the scope of the annotation and so you keep the function short for that purpose. In another language, breaking that section out into its own function might be completely arbitrary.

Controversial: In JavaScript, I tend to only create functions for the purpose of reusing code. If a snippet is only executed in one place, I find it burdensome to jump around the file(s) following the spaghetti of function references. I think closures facilitate and therefore reinforce longer [parent] functions. Since JS is an interpreted language and the actual code gets sent over the wire, it's good to keep the length of the code small--creating matching declarations and references doesn't help (this could be considered a premature optimization). A function has to get pretty long in JS before I decide to chop it up for the express purpose of "keeping functions short".

Again in JS, sometimes the entire 'class' is technically a function with many enclosed sub-functions but there are tools to help deal with it.

On the other hand in JS, variables have scope for the length of the function and so that's a factor that may limit the length of a given function.

steamer25 2009-07-14 20:15:31

Answer 10

+1 A:

The very long functions I come across are not written in C, so you'll have to decide whether this applies to your research or not. What I have in mind are some PowerBuilder functions that are several hundred lines long, for the following reasons:

They were written over 10 years ago, by people who at that time did not have coding standards in mind.
The development environment makes it a bit harder to create functions. Hardly a good excuse, but it's one of those little things that sometimes discourage you from working properly, and I guess someone just got lazy.
The functions have evolved over time, gaining both code and complexity.
The functions contain huge loops, each iteration possibly handling a different kind of data in a different way. Using tens(!) of local variables, some member variables and some globals, they have become extremely complex.
Being that old and ugly, no one dares to refactor them into smaller parts. With so many special cases handled in them, breaking them apart is asking for trouble.

This is yet another place where obvious bad programming practices meet reality. While any first year CS student could say those beasts are bad, no one would spend any money on making them look prettier (given that at least for now, they still deliver).

eran 2009-07-14 20:39:10

Answer 11

A:

The functions I deal with (not write) become long because they are expanded and expanded and no one spends the time to refactor them. People just keep adding logic to the functions with no thought to the big picture.

I deal with a lot of cut-n-paste development...

So, for the paper, one aspect to look at is poor maintenance plan/cycle, etc.

Frank V 2009-07-14 20:39:25

Answer 12

A:

The only ones I've recently coded are where it doesn't achieve much to make them smaller, or where doing so can make the code less readable. The notion that a function that is over a certain length is somehow intrinsically bad is simply blind dogma. Like any blindly applied dogma, it relieves the follower of the need to actually think about what applies in any given case...

Recent examples...

Parsing, and validating a config file with simple name=value structure into an array, converting each value as I find it, this is one massive switch statement, one case per config option. Why? I could have split into lots of calls to 5/6 line trivial functions. That would add about 20 private members to my class. None of them are reused anywhere else. Factoring it into smaller chunks just didn't add enough value to be worth it, so it's been the same ever since the prototype. If I want another option, add another case.

Another case is the client and server communication code in the same app, and its client. Lots of calls to read/write any of which can fail, in which case I bail and return false. So this function is basically linear, and has bail points (if failed, return) after almost every call. Again, nothing to gain by making it smaller and no way to really make it any smaller.

I should also add that most of my functions are a couple of "screenfuls" and I strive in more complex areas to keep it to one "screenful", simply because I can look at the whole function at once. It's ok for functions that are basically linear in nature and don't have lots of complex looping or conditions going on so the flow is simple. As a final note I prefer to apply cost-benefit reasoning when deciding which code to refactor, and prioritise accordingly. Helps avoid the perpetually half-finished project.

2009-07-14 21:11:46

Answer 13

+1 A:

By far the most common I see/write are long switch statements or if/else semi-switch statements for types that can't be used in this language's switch statements (already mentioned a few times). Generated code is an interesting case, but I'm focusing on human-written code here. Looking at my current project, the only truly long function not included above (296 LOC/650 LOT) is some Cowboy Code I'm using as an early evaluation of the output of a code generator I plan to use in the future. I'll definitely be refactoring it, which removes it from this list.

Many years ago, I was working on some scientific computing software that had a long function in it. The method used a large number of local variables, and every attempt to refactor it produced a measurable slowdown in profiling. Even a 1% improvement in this section of code saved hours of computation time, so the function stayed long. I've learned a great deal since then, so I can't speak to how I'd handle the situation today.

280Z28 2009-07-14 21:23:28

Answer 14

A:

A few ideas not explicitly mentioned yet:

repetitive tasks, e.g. the function reads a database table with 190 columns and has to output them as a flat file (assuming that columns need to be treated individually, so a simple loop over all columns won't do). Of course you could create 19 functions, each outputting 10 columns, but that wouldn't make the program any better.
complicated, verbose APIs, like Oracle's OCI. When seemingly simple actions require large amounts of code, it's hard to break it down into small functions that make any sense.

ammoQ 2009-07-15 06:48:15

Answer 15

+1 A:

Speed:

Calling a function means pushing to the stack, then jumping, then storing on the stack again, then jumping again. If you pass parameters to the function, you usually have several more pushes.

Consider a loop:

for...
   func1

Inside a loop, all those pushes and jumps can be a factor.

This was largely solved by the introduction of inline functions in C99 (and unofficially before that), but code that was written earlier, or written with compatibility in mind, may have been kept long for that reason.

Inline also has its flaws; some are described on the Inline Functions link.

Edit:

As an example of how a call to a function can make a program slower:

#include <stdio.h>

/* Thin wrapper, so that every loop iteration pays for an extra call. */
static void
do_printf(void)
{
        printf("hi");
}

int
main(void)
{
        int i = 0;
        for (i = 0; i < 1000; ++i)
                do_printf();
}

This produces (GCC 4.2.4):

        .
        .
        jmp     .L4
.L5:
        call    do_printf
        addl    $1, -8(%ebp)
.L4:
        cmpl    $999, -8(%ebp)
        jle     .L5

        .
        .
do_printf:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    $.LC0, (%esp)
        call    printf
        leave
        ret

against:

         int
 main()
 {
         int i=0;
         for(i=0;i<1000;++i)
                 printf("hi");
 }

or against:

static inline void __attribute__((always_inline)) //This is GCC specific!
do_printf()
{
        printf("hi");
}

Both produce (GCC 4.2.4):

        jmp     .L2
.L3:
        movl    $.LC0, (%esp)
        call    printf
        addl    $1, -8(%ebp)
.L2:
        cmpl    $999, -8(%ebp)
        jle     .L3

Which is faster, since the extra call, stack-frame setup, and return are gone from every iteration.

Liran Orevi 2009-07-15 20:38:28

Thanks and Toda Raba. I think that the major problem with inline functions is their huge list of arguments, in many cases (no pun intended).

Adam Matan 2009-07-15 21:09:57
