views:

1339

answers:

24

35 lines, 55 lines, 100 lines, 300 lines? When you should start to break it apart? I'm asking because I have a function with 60 lines (including comments) and was thinking about breaking it apart.

long_function(){ ... }

into:

small_function_1(){...}
small_function_2(){...}
small_function_3(){...}

The functions are not going to be used outside of the long_function, making smaller functions means more function calls, etc.

When would you break apart a function into smaller ones? Why?

  1. Methods should do only one logical thing (think about functionality)
  2. You should be able to explain the method in a single sentence
  3. It should fit into the height of your display
  4. Avoid unnecessary overhead (comments that point out the obvious...)
  5. Unit testing is easier for small logical functions
  6. Check if part of the function can be reused by other classes or methods
  7. Avoid excessive inter-class coupling
  8. Avoid deeply nested control structures

Thanks everyone for the answers, edit the list and vote for the correct answer I'll choose that one ;)

I am refactoring now with those ideas in mind :)

+1  A: 

This is partly a matter of taste, but how I determine this is I try to keep my functions roughly only as long as will fit on my screen at one time (at a maximum). The reason being that it's easier to understand what's happening if you can see the whole thing at once.

When I code, it's a mix of writing long functions, then refactoring to pull out bits that could be reused by other functions -and- writing small functions that do discrete tasks as I go.

I don't know that there is any right or wrong answer to this (e.g., you may settle on 67 lines as your max, but there may be times when it makes sense to add a few more).

Andrew Hedges
Well I also like to see my complete function in the screen :) sometimes that means a Monospace 9 font and a big resolution in a black background, I agree is easier to understand it that way.
Movaxes
+1  A: 

The main reason I usually break a function up is either because bits and pieces of it are also ingredients in another nearby function I'm writing, so the common parts get factored out. Also, if it's using a lot of fields or properties out of some other class, there's a good chance that the relevant chunk can be lifted out wholesale and if possible moved into the other class.

If you have a block of code with a comment at the top, consider pulling it out into a function, with the function and argument names illustrating its purpose, and reserving the comment for the code's rationale.

Are you sure there are no pieces in there that would be useful elsewhere? What sort of function is it?

Jeffrey Hantin
The function makes a cache file from a template, based on the url, like post_2009_01_01.html from the url /post/2009/01/01 thanks for your answer
Movaxes
+12  A: 

A function should do only one thing. If you are doing many small things in a function, make each small thing a function and call those functions from the long function.

What you really don't want to do is copy and paste every 10 lines of your long function into short functions (as your example suggests).

yjerem
Movaxes
+23  A: 

There's no real hard and fast rules for it. Generally I like my methods to just "do one thing". So if it's grabbing data, then doing something with that data, then writing it to disk then I'd split out the grabbing and writing into separate methods so my "main" method just contains the "doing something".

That "doing something" could still be quite a few lines though, so I'm not sure a number of lines is the right metric to be using :)

Edit: This is a single line of code I mailed around work last week (to prove a point.. it's not something I make a habit of :)) - I certainly wouldn't want 50-60 of these bad boys in my method :D

return level4 != null ? GetResources().Where(r => (r.Level2 == (int)level2) && (r.Level3 == (int)level3) && (r.Level4 == (int)level4)).ToList() : level3 != null ? GetResources().Where(r => (r.Level2 == (int)level2) && (r.Level3 == (int)level3)).ToList() : level2 != null ? GetResources().Where(r => (r.Level2 == (int)level2)).ToList() : GetAllResourceList();
Steven Robbins
I love a good ternary abuse
johnc
LOL Well I could remove all the whitespace in my method and it would only be one very long line and not a long function.Doing one thing, that probably is the answer than,thanks
Movaxes
@Movaxes That code snippet I posted is a single statement though, not just lots of lines on one line.. there's no semi-colons in there :) I could have expanded GetResources() out each time to make it even more evil :P
Steven Robbins
Yeah I noticed :) I love one liners
Movaxes
Yeah that makes sense. Why not just take your entire source file and put it in one line. I mean then you really get to be a Web 2.0 "ninja" :)
BobbyShaftoe
I remember in magazines of old (I'm talking BBC Micro old) they used to have "10 line programs" that just had several statements on each line, up to the max length the BBC could handle.. they were always a right pain to type in :D
Steven Robbins
A: 

I suspect you'll find a lot of answers on this.

I would probably break it up based on the logical tasks that were being performed within the function. If it looks to you that your short story is turning into a novel, I would suggest find and extract distinct steps.

For example, if you have a function that handles some kind of string input and returns a string result, you might break the function up based on the logic to split your string into parts, the logic to add extra characters and the logic to put it all back together again as a formatted result.

In short, whatever makes your code clean and easy to read (whether that's by simply ensuring your function has good commenting or breaking it up) is the best approach.

Phil.Wheeler
thanks for the answer :)
Movaxes
+3  A: 

60 lines is large but not too long for a function. If it fits on one screen in an editor you can see it all at once. It really depends on what the functions is doing.

Why I may break up a function:

  • It is too long
  • It makes the code by breaking it up using meaningful names for the new function
  • The function is not cohesive
  • Parts of the function are useful in themselves.
  • When it is difficult to come up with a meaningful name for the function (It is probably doing too much)
Good points, I agree, also if you have to name the function DoThisAndThisAndAlsoThis probably it does too much.thanks :)
Movaxes
You are just way out of order with this mate. 60 lines will always be too much. I'd say that if you are closing in on 10 lines you are probably close to the limit.
willcodejavaforfood
+2  A: 

In my opinion the answer is: when it does too much things. Your function should perform only the actions you expect from the name of the function itself. Another thing to consider is if you want to reuse some parts of your functions in others; in this case it could be useful to split it.

A: 

assuming that you are doing one thing, the length will depend on:

  • what you are doing
  • what language you are using
  • how many levels of abstraction you need to deal with in the code

60 lines might be too long or it might be just right. i suspect that it may be too long though.

Ray Tayek
I'm doing some caching in PHP, yeah probably 60 lines is too much, refactoring...
Movaxes
+2  A: 

I usually break functions up by the need to place comments describing the next code block. What previously went into the comments now goes into the new function name. This is no hard rule, but (for me) a nice rule of thumb. I like code speaking for itself better than one that needs comments (as I've learned that comments usually lie)

Olaf
I like to comment my code, mostly not for me but for others, that eliminates lot of questions on where $variable was defined, but I also like the code to be self explanatory. Do comments lie?
Movaxes
yes, because more ofthen than not they are not maintained. At the time of writing they might be correct, but once a bugfix or new feature is introduced, nobody forces comments to be changed according to the new situation. Method names tend to lie far less often than comments IMHO
Olaf
I just came across this answer: http://stackoverflow.com/questions/406760/whats-your-most-controversial-programming-opinion/406812#406812 stating that "Most comments in code are in fact a pernicious form of code duplication". Also - Long line of comments there.
Olaf
+3  A: 

Size approx you screen size (so go get a big pivot widescreen and turn it)... :-)

Joke aside, one logical thing per function.

And the positive thing is that unit testing is really much easier to do with small logical functions that do 1 thing. Big functions that do many things are harder to verify!

/Johan

Johan
Good point about the unit testing :)
Movaxes
A: 

One thing (and that thing should be obvious from the function name), but no more than a screenful of code, regardless. And feel free to increase your font size. And if in doubt, refactor it into two or more functions.

gkrogers
+3  A: 

Rule of thumb: If a function contains code blocks that do something, that is somewhat detached from the rest of code, put it in a seperate function. Example:

function build_address_list_for_zip($zip) {

    $query = "SELECT * FROM ADDRESS WHERE zip = $zip";
    $results = perform_query($query);
    $addresses = array();
    while ($address = fetch_query_result($results)) {
        $addresses[] = $address;
    }

    // now create a nice looking list of
    // addresses for the user
    return $html_content;
}

much nicer:

function fetch_addresses_for_zip($zip) {
    $query = "SELECT * FROM ADDRESS WHERE zip = $zip";
    $results = perform_query($query);
    $addresses = array();
    while ($address = fetch_query_result($results)) {
        $addresses[] = $address;
    }
    return $addresses;
}

function build_address_list_for_zip($zip) {

    $addresses = fetch_addresses_for_zip($zip);

    // now create a nice looking list of
    // addresses for the user
    return $html_content;
}

This approach has two advantages:

  1. Whenever you need to fetch addresses for a certain zip code you can use the readily available function.

  2. When you ever need to read the function build_address_list_for_zip() again you know what the first code block is going to do (it fetches addresses for a certain zip code, at least thats what you can derive from the function name). If you would have left the query code inline, you would first need to analyze that code.

[On the other hand (I will deny I told you this, even under torture): If you read a lot about PHP optimization, you could get the idea to keep the number of functions as small a possible, because function call are very, very expensive in PHP. I don't know about that since I never did any benchmarks. If thats case you would probably be better of not following any of the answers to your question if you application is very "performance sensitive" ;-) ]

EricSchaefer
thanks nice example :) I'll google for function benchmarks in php
Movaxes
+32  A: 

Here is a list of red-flags (in no particular order) that could indicate that a function is too long:

  1. Deeply nested control structures: e.g. for-loops 3 levels deep or even just 2 levels deep with nested if-statements that have complex conditions.

  2. Too many state-defining parameters: By state-defining parameter, I mean a function parameter that guarantees a particular execution path through the function. Get too many of these type of parameters and you have a combinatorial explosion of execution paths (this usually happens in tandem with #1).

  3. Logic that is duplicated in other methods: poor code re-use is a huge contributor to monolithic procedural code. A lot of such logic duplication can be very subtle, but once re-factored, the end result can be a far more elegant design.

  4. Excessive inter-class coupling: this lack of proper encapsulation results in functions being concerned with intimate characteristics of other classes, hence lengthening them.

  5. Unnecessary overhead: Comments that point out the obvious, deeply nested classes, superfluous getters and setters for private nested class variables, and unusually long function/variable names can all create syntactic noise within related functions that will ultimately increase their length.

  6. Your massive developer-grade display isn't big enough to display it: Actually, displays of today are big enough that a function that is anywhere close to its height is probably way too long. But, if it is larger, this is a smoking gun that something is wrong.

  7. You can't immediately determine the function's purpose: Furthermore, once you actually do determine its purpose, if you can't summarize this purpose in a single sentence or happen to have a tremendous headache, this should be a clue.

In conclusion, monolithic functions can have far-reaching consequences and are often a symptom of major design deficiencies. Whenever I encounter code that is an absolute joy to read, it's elegance is immediately apparent. And guess what: the functions are often very short in length.

Ryan Delucchi
Nice post! Very deterministic
Chuck Conway
wow nice post, thanks for the answer :) I am checking my code for each one of those
Movaxes
Good post! IMHO, the right size is below 80x25 with 8-space tabs. Another criterion (8): there are repeating *patterns* in the code. This may be reduced to (3).
jetxee
Jetxee: it is funny you mention that as I almost did. But I dropped it since it is a bit redundant with the other items and probably extends a bit too much into the realm of general architectural design. Then again, this is another reason that long functions is often linked to poor architecture.
Ryan Delucchi
+3  A: 

My personal heuristic is that it's too long if I can't see the whole thing without scrolling.

Ferruccio
... while have set the font size to 5?
EricSchaefer
+2  A: 

I agree a function should only do one thing, but at what level is that one thing.

If your 60 lines is accomplishing one thing (from your programs perspective) and the pieces that make up those 60 lines aren't going to be used by anything else then 60 lines is fine.

There is no real benefit to breaking it up, unless you can break it up into concrete pieces that stand on their own. The metric to use is functionality and not lines of code.

I have worked on many programs where the authors took the only one thing to an extreme level and all it ended up doing was to make it look like someone took a grenade to a function/method and blew it up into dozens of unconnected pieces that are hard to follow.

When pulling out pieces of that function you also need to consider if you will be adding any unnecessary overhead and avoid passing large amounts of data.

I believe the key point is to look for reuseability in that long function and pull those parts out. What you are left with is the function, whether it is 10, 20, or 60 lines long.

bruceatk
A: 

Bear in mind that you can end up re-factoring just for re-factoring's sake, potentially making the code more unreadable than it was in the first place.

A former colleague of mine had a bizarre rule that a function/method must only contain 4 lines of code! He tried to stick to this so rigidly that his method names often became repetitive & meaningless plus the calls became deeply nested & confusing.

So my own mantra has become: if you can't think of a decent function/method name for the chunk of code you are re-factoring, don't bother.

+3  A: 

I think there is a huge caveat to the "do only one thing" mantra on this page. Sometimes doing one thing juggles lots of variables. Don't break up a long function into a bunch of smaller functions if the smaller functions end up having long parameter lists. Doing that just turns a single function into a set of highly coupled functions with no real individual value.

jmucchiello
+1  A: 

There has been some thorough research done on this very topic, if you want the fewest bugs, your code shouldn't be too long. But it also shouldn't be too short.

I disagree that a method should fit on your display in one, but if you are scrolling down by more than a page then the method is too long.

See The Optimal Class Size for Object-Oriented Software for further discussion.

Ian Hickman
thanks for the link, reading :)
Movaxes
A: 

I have written 500 line functions before, however these were just big switch statements for decoding and responding to messages. When the code for a single message got more complex than a single if-then-else, I extracted it out.

In essence, although the function was 500 lines, the independently maintained regions averaged 5 lines.

Joshua
+2  A: 

Take a peek at McCabe's cyclomatic, in which he breaks up his code into a graph where, "Each node in the graph corresponds to a block of code in the program where the flow is sequential and the arcs correspond to branches taken in the program."

Now imagine your code has no functions/methods; its just one huge sprawl of code in the form of a graph.

You want to break this sprawl into methods. Consider that, when you do, there will be a certain number of blocks in each method. Only one block of each method will be visible to all other methods: the first block (we're presuming that you will be able to jump into a method at only one point: the first block). All the other blocks in each method will be information hidden within that method, but each block within a method may potentially jump to any other block within that method.

To determine what size your methods should be in terms of number of blocks per method, one question you might ask yourself is: how many methods should I have to minimise the maximum potential number of dependencies (MPE) between all blocks?

That answer is given by an equation. If r is the number of methods that minimises the MPE of the system, and n is the number of blocks in the system, then the equation is: r = sqrt(n)

And it can be shown that this gives the number of blocks per method to be, also, sqrt(n).

+1  A: 

I normally uses a test driven approach to writing code. In this approach the function size is often related to the granularity of your tests.

If your test is sufficiently focused then it will lead you to write a small focused function to make the test pass.

This also works in the other direction. Functions need to be small enough to test effectively. So when working with legacy code I often find that I break down larger functions in-order to test the different parts of them.

I usually ask myself "what is the responsibility of this function" and if I can't state the responsibility in a clear concise sentence, and then translate that into a small focused test, I wonder if the function is too big.

flamingpenguin
A: 

Extending the spirit of a tweet from Uncle Bob a while back, you know a function is getting too long when you feel the need to put a blank line between two lines of code. The idea being that if you need a blank line to separate the code, its responsibility and scope are separating at that point.

MarkB
A: 

If it has more than three branches, generally this means that a function or method should be broken apart, to encapsulate the branching logic in different methods. Each for loop, if statement, etc is then not seen as a branch in the calling method. Cobertura for Java code (and I'm sure there are other tools for other languages) calculates the number of if, etc. in a function for each function and sums it for the "average cyclomatic complexity". If a function/method only has three branches, it will get a three on that metric, which is very good. Sometimes it is difficult to follow this guideline, namely for validating user input, nevertheless putting branches in different methods aids not only development and maintenance but testing as well, since the inputs to the methods that perform the branching can be analyzed easily to see what inputs need to be added to the test cases in order to cover the branches that were not covered - if all the branches were inside a single method, the inputs would have to be tracked since the start of the method, which hinders testability.

MetroidFan2002
A: 

My idea is that if I have to ask myself if it is too long, it is probably too long. It helps making smaller functions, in this area, because it could help later in the application's life cycle.

sokket