tags:

views:

797

answers:

12

I'm learning a little C over the holiday weekend, and I started to look at other programs written in C. I ended up looking at GNU Netcat, thinking it would be a good example.

I was a bit shocked to see a 600 line main() function. Is this normal? If it is normal is this considered good C coding practices?

+3  A: 

Regardless of the language, I would try and restrict a subroutine method to roughly what's visible in one page of code, and extract functionality to subroutines wherever possible.

600 lines sounds quite long for any implementation. Perhaps there's some overriding reason wrt. passing arguments around and clarity (I've not looked at the example you've posted) but it sounds to be at the far end of what's normally practised, and it should be possible to subdivide this task.

I suspect it's been developed by continual incremental addition of functionality over the years, and nobody has stopped and refactored this to be more readable/maintainable. If there are no unit tests for this (and in my experience main() methods don't often get written tests - for whatever reasons) then there's going to be an understandable reluctance to refactor it.

Brian Agnew
One page is a good practice, which usually tends to be about 25-40 lines on a typical range of monitors.
Goyuix
+19  A: 

There's a quote of an American President (Lincoln?) who was asked how long a man's legs should be. "Long enough to reach from his body to the ground," he said.

Getting back on topic:

Authors of books like "Clean Code" advertise that every function do only one thing (that's grossly simplified by me here), so in theory your main() should maybe call an initialization function and then another function orchestrating the work of the application, and that's all.

In practice, many programmers find a lot of tiny functions irritating. A perhaps more useful metric is that a function should usually fit on one screen, if only to make it easier to see and think about.

If a program is complex and most of its functionality is in main(), someone hasn't done a decent job of breaking the problem down. Essentially you should strive for manageability, understandability and readability. There's usually no good reason for a main() to be huge.

Carl Smotricz
I was just reading Practical C Programming: "A single function should not be longer than two or three pages. If the function gets longer, it can probably be split into two simpler functions. This rule comes about because the human mind can only hold so much in short-term memory. Three pages are about the most that the human mind can wrap itself around in one sitting."
Joseph Kern
Yeah, it'll be hard to get a definite answer that everybody will agree on. But I think most of our sources agree that 600 lines is too long :)
Carl Smotricz
Why is it too long? Browsing at netcat's main(), it does a number of things in a set order with clearly delineated sections. Since there's no repeated code, splitting up the function would merely serve the same purpose as the comments that have been placed before each section of code, except crammed into an identifier instead of a pretty English sentence. I'm only defending netcat's main() to be a devil's advocate... the real point is that it's impossible to apply any length metric globally. Clarity is the foremost concern and it's only sometimes related to length.
Dan Olson
It's too long because it's a major undertaking to wade through those 600 lines and try to understand what they do. There's sample code in another post, and while I can see what it's doing, it's by no means clear-cut. The code is a mess! In a better world, there would be code to scan the input and options, and code to perform the requested actions, and it wouldn't be munged into the knotty mess it is but separated into distinct functions. Why do you think `getopts()` was invented?
Carl Smotricz
Again just playing devil's advocate... don't you think the mess is more a result of the coding style than of specifically the length of the function? If it's about short-term memory and compactness as a way to usher in code clarity, we could accomplish it better with comments and editor folds than by creating a number of smaller functions (thus adding logical indirections). If netcat's actual code were cleaned up we could leave all of the program flow in main() and it'd still be clear. This is the theory at least.
Dan Olson
+2  A: 

By some standards a 600 line function of any sort is a bad idea, but there is no reason main should be treated any differently to any other function.

The only reason I can think of such a situation arising is where a program is developed rapidly, and as it grows no one ever bothers to split it up into more logical units.

Autopulated
+1  A: 

As short as possible. Usually, whenever there's an operation I can assign a name to, I create a new method for it.

Tom R
+3  A: 

Hopefully they are planning on refactoring. This looks very rough.

  443   while (optind < argc) {
  444     const char *get_argv = argv[optind++];
  445     char *q, *parse = strdup(get_argv);
  446     int port_lo = 0, port_hi = 65535;
  447     nc_port_t port_tmp;
  448 
  449     if (!(q = strchr(parse, '-')))    /* simple number? */
  450       q = strchr(parse, ':');  /* try with the other separator */
  451 
  452     if (!q) {
  453       if (netcat_getport(&port_tmp, parse, 0))
  454   netcat_ports_insert(old_flag, port_tmp.num, port_tmp.num);
  455       else
  456   goto got_err;
  457     }
  458     else {     /* could be in the forms: N1-N2, -N2, N1- */
  459       *q++ = 0;
  460       if (*parse) {
  461   if (netcat_getport(&port_tmp, parse, 0))
  462     port_lo = port_tmp.num;
  463   else
  464     goto got_err;
  465       }
  466       if (*q) {
  467   if (netcat_getport(&port_tmp, q, 0))
  468     port_hi = port_tmp.num;
  469   else
  470     goto got_err;
  471       }
  472       if (!*parse && !*q)  /* don't accept the form '-' */
  473   goto got_err;
  474 
  475       netcat_ports_insert(old_flag, port_lo, port_hi);
  476     }
  477 
  478     free(parse);
  479     continue;
  480 
  481  got_err:
  482     free(parse);
  483     ncprint(NCPRINT_ERROR, _("Invalid port specification: %s"), get_argv);
  484     exit(EXIT_FAILURE);
  485   }
Derek Litz
This is a great example of what I would personally put into its own function, or even its own module. Why should I bother the reader with argument parsing code?
Leonardo Herrera
+7  A: 

I often find on certain kinds of applications that main() has hundreds of lines of initialization followed by about 20 lines of top-level loop.

It's my habit not to break functions out until I need to call them twice. This sometimes leads to me writing a 300 line function, but as soon as I see the same block occur twice I break that block out.

As for main, initialization routines are often once so 600 lines does not sound unreasonable.

Joshua
+2  A: 

A 600 line main is a bit of a warning sign. But if you look at it and can't see any way of breaking it up into smaller pieces other than to do this.

void the_first_part_of_main(args...);
void the_second_part_of_main(args...);
...

main()
{
   the_first_part_of_main();
   the_second_part_of_main();
   ...
}

Then you should leave it alone.

John Knoeller
+2  A: 

I would say your routines should be as long/short as necessary to be effective and reliably and automatically tested. A 600-statement routine likely has multiple paths through it and the combinations of routines might get very large very quickly. I try to break down functions into someting that make it easily readable. Functions are either "functional" or "narrative." All the while including unit tests.

No Refunds No Returns
+1  A: 

My personal coding style is to try and only use the the main function for command-line argument parsing, and any big-ticket initialization that the program needs.

Mak Kolybabi
Even those I prefer to break out into routines because even if they are just a simple **if** statement, they will inevitably grow into something bigger.
Goyuix
+2  A: 

Heh, that's horrible but I've seen worse. I've seen large, multi-thousand line fortran programs with no subroutines at all.

I believe the answer is: it should fit in an editor window and it should have low cyclomatic complexity.

If a main program is just a series of function calls or computations, then I suppose it could be as long as necessary and it could have an exemption from the editor window constraint. Even then, I would be a bit surprised that there was not a natural way to extract meaningful discrete methods.

But if it is testing and branching and returning and breaking and continue-ing then it needs to be broken up into individual and individually tested functional components.

DigitalRoss
+1  A: 

Nearly all 600-line functions I have seen were also stupidly written. This doesn't have to be so.

However, in these cases it was just a failure of a programmer to present some zoomed-out view, and give meaningful names to sections - both high-level (like, Initialize()) and low-level (something that takes a common 3-line pattern and hides it under one name, with parameters).

In cases of extreme stupidity, they were optimizing function call performance when it was not required.

Pavel Radzivilovsky
A: 

main(), like any function, should be exactly as big as it needs to be. "As it needs to be" will vary a lot depending on what it needs to do. Having said that, it shouldn't have to be more than a couple of hundred lines in most cases. 600 lines is a bit on the hefty side, and some of that could / should probably be refactored into separate functions.

For an extreme example, one team I was on was tasked with speeding up some code to drive a 3d display. The code was originally written by a wirehead who was obviously taught himself programming using old-school FORTRAN; main() was over five thousand lines of code, with random bits #includeed here and there. Instead of breaking code out into functions, he'd simply branch to a subroutine within main() via goto (somewhere between 13 and 15 gotos, branching both directions seemingly at random). As a first step we simply turned on level 1 optimization; the compiler promptly swallowed up all available memory and swap space and panicked the kernel. The code so brittle that we couldn't make any changes without breaking something. We finally told the customer they had two choices: allow us to rewrite the entire system from scratch or buy faster hardware.

They bought faster hardware.

John Bode