views:

3584

answers:

28

One thing that occasionally drives me crazy is reading another person's functions that span 5 vertical monitor lengths, or .cpp files that are over 2000 lines long. For readability, wouldn't it be better to break a 1000 line function into many smaller sub-functions called by one function? Shouldn't a class implementation not span an inordinately huge number of lines? When should you start breaking functionality up into sub-classes or utility classes?

Is it unreasonable for me to be so put off by excessively large files/functions? And, if I am not in the wrong, how should I approach a colleague to convince them to refactor their code?

Edit: There are a lot of good answers to this question, and I should read up on the Code Complete chapter. As for convincing my colleague(s), it seems like a bit too expensive to get them to refactor existing production code, but I'd settle for all their future code being well refactored.

+5  A: 

No, it's not unreasonable of you. I'm the same way. Unfortunately, there are people with a fetish for omnibus classes that do everything except wash your dishes, and are thousands of lines long (usually management types).

Don't know that there's much you can do about it. In my experience, most programmers willing to create monstrosities like that are usually the type who don't even know what "refactor" means.

DannySmurf
A: 

There's a section in Code Complete that talks about function lengths, and I think 200 comes up as the magic number, but there are some conditionals, I believe. I don't have CC next to me to check, which is highly unusual.

Thomas Owens
+70  A: 

Easy. If the method does more than one definable thing, it's too long. ;-)

Cheekysoft
What is you definition of one thing? Is this an atomic operation or a logical operation?
benPearce
I guess its one *conceptual* thing: a group of operations that achieve one clear definable objective. Obviously, as you go up the call stack the operational scope of each method becomes larger.
Cheekysoft
So how would you determine if the method does more than 1 thing especially further up the call stack.
daub815
@daub815, what do you mean? You simply think to yourself, "Is this method doing one, conceptual, task?" Are you sorting ~and~ connecting to a database ~and~ calculating something? Then you are doing 3 things.
Simucal
I was bever a fan of this one.. what if my company has OperationA, OperationB and OperationC, but they have to be done IN ORDER? Sure I can make my code a little easier to maintain by breaking them into 3 seperate fucntions, and call them all from another... but then that function does all 3...
Neil N
@Neil but each individual one is done one thing and the "master" method is doing one thing as well. "Read a book and write a report" you first need to read it (method 1) the write the report (method 2). The ReadAndWrite method is still a single conceptual thing though,
TofuBeer
It's all about making calls within your method at the appropriate level of abstraction. createEmployeeList() { dbConnecct(); dbGrabData(); dbSort(); )} does "1 thing" and only makes calls to functions one level below its own lvl of abstrctn. Inlining those methods crosses too many layers of abstrctn
Cheekysoft
That is meaningless
Mo Flanagan
It absolutely is one _conceptual_ thing, and what constitutes a "one conceptual thing" is a _judgment call_ made in a certain context by a skilled person. That's the best we can hope for. It's the same with the single responsibility principle. It's ultimately a judgment call.
Gregory Higley
A: 

I think the "textbook" answer for this question is that if you can't see the entire function on one screen, the function is too long. For some complicated algorithms, it might be OK to have a really long function.

For class sizes, I think maybe a better metric is how many public methods the class exposes. I would probably prefer a class that is 5000 lines long, but provides a single, well defined service, over a class that is 500 lines long but has tons of getters and setters.

Outlaw Programmer
+51  A: 

Well, to play Devil's Advocate:

Is a 1000 Line function better or worse than 1000 1-Line function that will result in 4000 lines of code if you keep in mind the extra overhead for function definition?

Generally, functions should be small. Someone - i forgot who - put up some coding conventions, and one of the tips was: The function name should define everything the function does. GetUserName(), IncrementPostCount() etc. Not the how - just the "What", i.e. What the result of the function would be, but that also applies to private functions.

Now, if you have trouble naming your function or if you end up with "AddOrDeleteOrUpdateOrMoveOrPromoteUser()", then that's a sure sign: Your function is too long. Also, if you have uber-generic stuff like ManageDatabase() or CrashWallStreet(), that's a sure sign the funtion should be re-factored.

A 1000 Line function is most likely too long, but there are always situation where you cannot split a function, because all the code essentially belongs to one task, and that task is specified in the function name.

But I found that "If you cannot give a proper name to your function, it should be refactored" rule really good.

Michael Stum
To play Devil's Advocate as well: "Also, if you have uber-generic stuff like ManageDatabase() or CrashWallStreet(), that's a sure sign the funtion should be re-factored." Does this mean having an `int main()` is bad? :P
@user470379 Well, I would have called it EntryPoint() since that clearly describes what it does and since thats what it's called in .net internally :P
Michael Stum
+2  A: 

Ditto what Cheekysoft said - basically, a method/functions needs to have good cohesion. If it is doing just one thing, it will be short by nature.

Refactoring a long method into a bunch of short ones is only part of the idea - the short ones should do more than just divide the work, they should do what they do well and only do that one thing.

Jason Bunting
+13  A: 
Michał Piaskowski
+1  A: 

I guess depending on what you're trying to do should most dictate how long something is going to be. It sounds like this person is unnecessarily making things longer. For starters, showing them HOW to refactor their code would be the best start. "Here, you took 100 lines to do this, but doing it this way, it can be done in 20" or "Using this built in function, you can eliminate the need to do this" or whatever. Code reviews can also be helpful and then you put your work up for constructive criticism, too, so it's not one-sided.

On the other hand, as Joel mentioned in one of the StackOverflow podcasts, sometimes you have to ask yourself from a business standpoint if it's really worth optimizing or not. We'll often write bad or long code and plan to make it better, but decide against because "it's working" and there's other work to be done that's not working. Can you justify spending the time to make something better? For this case, if he's constantly writing too much code, maybe the answer is 'yes'.

Jason Shoulders
+13  A: 

Actually, I think code complete says "more than 1 screen is too long" (for methods at least)

Juan Manuel
What if you always use a very big screen?
dlamblin
There were no such thing as very long screens when Code Complete was written!
MrGumbe
+3  A: 

How to test if lines of code is too great:

  • It breaks the compiler.
  • It sets your brain on fire to read it.

Everything else is subject to interpretation.

MSN

Mat Noguchi
+4  A: 

It should be refactored if...

  • It does more than one thing.
  • It does that one thing more than once, outside of a loop.
  • If you forget the first line, by the time you read the last line.
Brad Gilbert
+2  A: 

I once had the opportunity to help refactor 3 C++ files each with about 10,000 lines into not just separate files, but 3 separate and modules. Each original file contained both C++ class declarations and implementations.

I've also worked with the opposite extreme where a module consisted of hundreds of files and every tiny little C++ class is placed in 3 files: .h, .cpp, and .inl with many of these files containing just a couple of lines or just boilerplate code.

I don't know which extreme was more difficult to maintain but I do know the happy medium is far better.

Brian Ensink
A: 

At best the whole function should fit into screen (not so hard nowadays :) ).

Also it should do one thing and doesn't duplicate any code.

However, there are exception -- function which try to solve all the problems with i.e. many optional parameters is bad.

Grzegorz Gierlik
+5  A: 

Easy. If the method does more than one definable thing, it's too long. ;-)

I actually couldn't disagree more. you could very easily have a method which does one defineable thing, saveTheWorld(), which has many many sub operations, which are in code directly in that method rather than calling endWorldHunger(food), endPolution(), etc. Bad example, but I think you get my point. One definable thing is not a good metric for if a method is too long.

shsteimer
One definable thing within a reasonable level of abstraction. I doubt you'll ever find an objective way to describe "too many lines in a routine".
Arve Systad
A: 

When the diff program stops working on your source files because they exceed 65536 lines of code, that's too many...

OJW
+1  A: 

My rule of thumb: If the function code doesn't fit on one page of print out then it might be time for another function.

jussij
+3  A: 

Some places have very rigid rules like no line more than 40 chars, no function more than 200 lines, but blindly enforcing these kinds of rules is a dangerous thing, and you end up with code that is designed around the style rules rather than common sense.

How about "you'll know it when you see it", if anyone gets the reference :)

seanb
+1  A: 

For me, it kind of depends on the language. In Smalltalk, it is painfree to write small methods. You can write a long one and have parts factored out by OmniBrowser. The overhead per method is just one line. And Smalltalk is not file-based, so instead of getting scrolling nightmares, code becomes easier to find. Coming to some numbers, my smalltalk-methods are usually, and I think these are very usual Smalltalk-numbers, 4-8 lines long, auto-formatted. If you don't do Smalltalk: The auto-formatter breaks about everything into a newline.

In Java, it is not so tempting. First, the overhead per method is not just a little larger. Then, you need to think of an order of your functions, which is far from trivial for me. Mostly, it's something I hate thinking about. Most colleagues like my code, although they generally disagree with my sorting of methods. I wish Eclipse could just sort them alphabetically or something.

So, my Java-methods are longer. 4-8 lines of manually formatted code. That is a lot more, as my Java lines are really crowded, compared to my Smalltalk code.

And I don't buy this "it just belongs together" thing. If a method requires more than 8 lines, you can always group whatever is going on there and give the subparts names.

I am grateful that at my work place, people give me a lot of freedom, as I am clearly the only one with a Smalltalk background there and the only one writing code in this manner.

niko

nes1983
Eclipse can sort methods alphabetically, even on more than one file at a time: Either via Source->Sort Members, or by setting up a "Clean Up" profile that includes this task (and configuring it as a save action).
mihi
+2  A: 

Functions are not a tool to improve readability. That may be a convenient side-effect, but it is not their purpose in code. If there is a functional reason why the function should be split then by all means split it up or encourage your colleague to do the same. If its for readability only that you want to split it, then what you really need are comments and descriptive names to help you mentally step through the code more easily. If code is readable it shouldn't matter how long it gets... using a scroll bar isn't hard, it's just keeping track of variables and such that makes it difficult (at least for me).

jheriko
+12  A: 
leander
+1  A: 

I don't think its a matter of how many lines of code you have versus having someone else siphering thru your code to find something or solve problems. I proper OOP practices have been followed then I wouldn't care how many lines of code you had. It's the redundancy that kills me...

Byrdsong
+3  A: 

My general rule of thumb is that any block is a candidate for a method (this is done on a case-by-case basis, not every block should be a method).

Consider this bit of Java (not compiled incase there are any small issues :-):

void foo(File[] files)
{
    for(final File file : files)
    {
        // open the file
        // process the file
        // close the file
    }
}

I would instead do this:

void foo(File[] files)
{
    for(final File file : files)
    {
        process(file);
    }
}

void process(final File file)
{
    // open the file
    // process the file
    // close the file
}

This leads to easier to test code - you can test the "process" method in isolation without having to provide an array of files. It also tends to make each method "atomic" (doing a single conceptual task).

In practice my methods are rarely over 20 lines long (probably rarely over 10 lines). I also use the Cyclomatic Complexity metric to keep the methods simpler (and thus shorter).

TofuBeer
A: 

It depends. I'm totally with you on trying to keep script lengths reasonable, but have seen SQL statements of more than 100-200 lines written with sound technique; sometimes you have a lot of joins and conditions to include.

If you had asked what's worse: a 300 line function that was well formatted, sensibly constructed, and logical, or a 150 line function that was full of hard-coded values, needless looping structures, poorly named variables, serpentine GOTOs, no formatting, double negative logic, a lack of short circuit operators, and tons of ELSEIFs rather than case statements, well, that would be easier to answer.

Bernard Dy
A: 

Segmentation is key, it is part of modern OOP design.

I was recently talking to a fellow coder who bragged about having code 12 curly braces deep, and all I could think was that it seemed like a pointless exercise.

I am totally with you.

PiPeep
A: 

Most people have a short term memory that can cope with about 7 items. Therefore you methods should be able to be understood by thinking about less then 7 items.

A line of code may be more then 1 item, or many lines of code may be a single item. E.g.

foreach(person in people)
{
}

Is one item

And

for(int i = 0; i < a.length; i++)

Is one item, as most programmer now what it means

But

for(i = 2; i < a.length - 4; w(ref i))

Is many items as it takes a lot of thinking to know what it does

Also a method should do a single thing, if you need to write a comment to split up the code, then put the code in two methods and let the name of the method say what the code does.

(See the Clean Code book for lots of examples of good coding, including writing short methods)


It is clear that line of code is not a good measure of how complex a method is, so I have used the ill-defined concept of “items”. A more scientific metric would include Cyclomatic Complexity etc; however I like have a simple metric that any programmer can work out in his head.

At the end of the day we are looking for.

  • Can you (an experienced programmer) understand what (and why) this method does without having to think
  • Can you (an experienced programmer) understand how this method works with only a few minutes of thinking.
Ian Ringrose
A: 

When I did C++ I ran into longer code like that. The thing is I ran into it in production systems that served millions+ of customers quite well for a lot of profit. I wasn't really in a position to say "you shouldn't do that".

One approach may be to try and learn the function, and ask lots of question. And turn the criticism into praise. "You know around line 520 in FooFactorizer, what's with the inner accumulator list? How does that work". "Oh hey, that accumulator factorizer at 520 is pretty sweet; it could almost be its own function".

Frank Schwieterman
A: 

The function should do one, precise thing. The line of code in a method don't necessarily relate to that. For example, if you create a method called UploadPhotos, it should ONLY upload photos, nothing more. Your method names should also be one single thing you can do- not something like UploadPhotosandAddFilter. I mean, if it takes you 1000 lines in your method to achieve UploadPhotos (in which case you should really think about your implementation or language of choice), then so be it. Just create separate methods for separate tasks.

DMan
A: 

Pragmatic Programmer says: Good is good enough.