views:

316

answers:

8

Are there any broad, overgeneralized and mostly useless rules about how long it will take to understand a program based on the number of LOC (lines of code)?

(I understand any rules will be broad, overgeneralized and mostly useless. That's fine.)

(The language in question is Delphi, but that shouldn't matter because I'm looking for broad, overgeneralized and mostly useless rules.)

+3  A: 

You cannot google this because there will be a different approximate number for each individual person programming in a specific language.

You are trying to write a Drake equation for program writing.

This is what I mean.

About program writers.

  • each person has a different style of writing and commenting code
  • every programming language has different nuances and readability
  • algorithms can be implemented in many ways even in the same language
  • data structures used by different people tend to be quite varied
  • the decision of how code is distributed over source files also changes with personal taste

Moving to the person reading the code.

  • the familiarity of the person with the language matters
  • familiarity with the algorithms and data structure patterns used matters
  • amount of information context that the person can retain at a time matters

Shifting focus to the environment, the things that matter would be:

  • the amount of distraction (both for the programmer and the person trying to read the program)
  • nearness to code release time for the programmer
  • pending activities and motivation on the part of the reader
  • proximity of popular events (vacations, sports events, movie release dates!)
nik
Even obfuscated C code can be understood at a rate of 1 line per day, I'd imagine. So the range of possible values is not nearly as broad as Drake's equation.
Daniel Straight
What I am trying to point out, Daniel, is that the range of possible `parameters` is difficult to identify and generalize. So, while we can give worst-case estimates of the day-per-LOC sort, there will not be any usable formula that gives you a number (besides 42, that is).
nik
Having said that, I admit I exaggerated the mapping to Drake's equation. Later I was a bit dubious about which of the two was broader, but in the end I mostly agreed with myself that I had exaggerated.
nik
Except the range is only as broad as 1 to number_of_lines_you_can_physically_read, and most of those values make no sense. Even the most complicated programs not specifically designed to be obfuscated should be understandable at a rate of 1 line per hour. Even the simplest program (aside from 1000000 print statements) is unlikely to be understood at a rate greater than 3600 lines per hour. All I'm asking is between 1-3600, given average circumstances, what's a sensible estimate? The 100-400 estimate given by Hosam Aly seems to do it. That's usable. 42 isn't.
Daniel Straight
@Daniel, well, sometimes you have to take 42 with some orders of magnitude. In other words, statistics should not be misused. I am looking at the accepted answer and I see no citation around it. Would you have taken my answer if it said "my professor said it takes about 420 loc/hour to understand any program, broadly speaking, regardless of the language, etc"?
nik
+5  A: 

It's not the number of LOC that determines how long it takes to understand a program, it's more the complexity.

If my program had 100,000 lines of print statements, I think it would be pretty easy to understand. However, if I had a program with for-loops nested ten deep, I think it would take far longer to understand.

Cyclomatic complexity can give a ROUGH indication of how hard the code is to understand, and can signal some other warning flags as well about your code.
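To make the contrast concrete, here is a minimal sketch that approximates cyclomatic complexity by counting decision points. It is an illustration only: it analyses Python source rather than Delphi, the function name `approx_cyclomatic_complexity` is made up, and the counting rules are a simplification of what a real metrics tool does.

```python
import ast

# Node types treated as decision points in this simplified count (an assumption,
# real tools use more detailed rules).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches
            complexity += len(node.values) - 1
    return complexity


# Many lines, no branching.
flat = "print(1)\nprint(2)\nprint(3)\n"

# Few lines, several nested decisions.
nested = """
for a in range(3):
    for b in range(3):
        if a and b:
            print(a, b)
"""

print(approx_cyclomatic_complexity(flat))    # 1
print(approx_cyclomatic_complexity(nested))  # 5
```

The straight-line code scores 1 however long it gets, while the short nested snippet already scores 5, which is the point about LOC versus complexity.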

AlbertoPL
Agreed. However, without numbers, this is not helpful. SourceMonitor (metrics program) reports an average complexity of 7.45. So I know it's simpler than a program with complexity 10 and more complicated than one with complexity 5... but that doesn't tell me anything without a reference point.
Daniel Straight
+1  A: 

I'm looking for broad, overgeneralized and mostly useless rules.

Sounds to me like you're just trying to find a way to give management (or someone) an estimate of the time it will take to learn a new codebase. In that case, find a code snippet online, and time how long it takes you to understand it. Divide that by the number of lines in the snippet. Add some padding. Bam! There's your rule.
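A rough sketch of that recipe, in Python. The function names, the 1.5 padding factor and the sample numbers are all made up for illustration, and the extrapolation assumes time grows linearly with size, which may well not hold.

```python
def minutes_per_line(minutes_spent: float, lines_in_snippet: int,
                     padding: float = 1.5) -> float:
    """Padded reading cost per line, derived from one timed snippet."""
    if lines_in_snippet <= 0:
        raise ValueError("snippet must contain at least one line")
    return minutes_spent / lines_in_snippet * padding


def estimate_hours(total_lines: int, rate_minutes_per_line: float) -> float:
    """Extrapolate linearly (an assumption) to a whole codebase."""
    return total_lines * rate_minutes_per_line / 60.0


# Hypothetical measurement: 30 minutes to understand a 120-line snippet.
rate = minutes_per_line(minutes_spent=30, lines_in_snippet=120)
print(f"{rate:.3f} minutes per line")                          # 0.375
print(f"{estimate_hours(10_000, rate):.1f} hours for 10,000 lines")  # 62.5
```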

Welbog
I'm not sure the time varies linearly, or even nearly linearly, with the size of program.
ChrisW
So add more samples and plot the times to see what kind of curve it follows. Might be linear, might be polynomial, might be exponential. Honestly, I bet it's logarithmic: an additional line of code means a lot more when there are only 5 lines than when there are 5000.
Welbog
It's possible there's no correlation at all: for example, that the time depends on how well the program is structured and not on how many lines of code it contains.
ChrisW
Could be. My point is you'll never know unless you measure some sample times on snippets of various sizes and complexities and analyse the results statistically.
Welbog
+2  A: 

I have the theory that it's O(n^2) (because you have to understand each line in conjunction with every other line).

But, as usual when using big-o notation to get an actual numeric value, this answer is broad, overgeneralized and mostly useless.
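One way to read "each line in conjunction with every other line" is as counting unordered pairs of lines, which grows quadratically. A tiny sketch under that reading (the function name `line_pairs` is made up):

```python
def line_pairs(n: int) -> int:
    """Number of unordered pairs among n lines: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    print(n, line_pairs(n))
# 10 45
# 100 4950
# 1000 499500
```

A tenfold increase in lines gives roughly a hundredfold increase in pairs, which is all the quadratic bound really says.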

balpha
true if your program has zero modularization.
flybywire
True anyway: I'm just saying it's *bounded above* by n^2. O(n) is also O(n^2).
balpha
+1  A: 

Code review metrics (not the same thing, but nearly comparable) put the number in the range of approximately 50-100 LoC per hour, for an experienced code reviewer.

This of course also depends on what they're looking for in the review, language, complexity, familiarity, etc. But that might give you a general overgeneralization anyway.

AviD
I think this is somewhat different from the original question. A code reviewer reviews code critically, while a reader reviews code to understand it. I believe the reader has to be significantly faster than the reviewer.
Hosam Aly
Yes, I agree - I even pointed that out. However, it is an actual number for something *similar*, and as he requested, it's a meaningless number to give him something to base it on.
AviD
A: 

Apart from "how complicated is the program?", other variables include things like "how well do you understand it?" and "how well do you understand other things, such as the program's functional specification?"

When I start to work with a new program, I try to understand as little of it as possible! Specifically I try to:

  • Understand the functional specification of the change that someone wants me to make (if nobody wanted me to change the program then I wouldn't need to understand it at all)

  • Find and understand the smallest possible subset of the existing program, which will let me make that change without breaking any other, previous/existing functionality.

ChrisW
+1  A: 

Look at the COCOMO equations. They contain broad, overgeneralized and mostly useless rules based on Source Lines of Code.
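For reference, a minimal sketch of the Basic COCOMO equations in "organic" mode, using the commonly quoted coefficients (2.4, 1.05, 2.5, 0.38). Note that COCOMO estimates the effort to develop software, not the time to understand it, so treating it as a proxy here is an assumption. The function name is made up.

```python
def basic_cocomo_organic(kloc: float) -> tuple[float, float]:
    """Basic COCOMO, organic mode.

    Returns (effort in person-months, schedule in months) for a project
    of `kloc` thousand source lines of code.
    """
    effort = 2.4 * kloc ** 1.05        # person-months
    schedule = 2.5 * effort ** 0.38    # calendar months
    return effort, schedule


for kloc in (1, 10, 50):
    effort, schedule = basic_cocomo_organic(kloc)
    print(f"{kloc:>3} KLOC: {effort:6.1f} person-months, {schedule:4.1f} months")
```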

S.Lott
+3  A: 

Some papers concerning peer code review say that it should be somewhere between 100 and 400 lines of code per hour.
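Turning such a rate into calendar time is simple arithmetic. The sketch below assumes a 40-hour work week and a hypothetical 56,000-line program, chosen only so the numbers are easy to check.

```python
HOURS_PER_WORK_WEEK = 40  # assumption


def work_weeks(total_lines: int, lines_per_hour: float) -> float:
    """Work weeks needed to read total_lines at a constant rate."""
    return total_lines / lines_per_hour / HOURS_PER_WORK_WEEK


total_lines = 56_000  # hypothetical program size
for rate in (100, 400):
    print(f"{rate} LOC/hour -> {work_weeks(total_lines, rate)} work weeks")
# 100 LOC/hour -> 14.0 work weeks
# 400 LOC/hour -> 3.5 work weeks
```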

Hosam Aly
I like it. This gives ballpark figures for understanding the code of between 3.5 and 14 work weeks. I would say that definitely fits the description of broad, overgeneralized and mostly useless. Thank you!
Daniel Straight
whaa? I'm familiar with research citing 50-100 loc per hour, who got up to 400?
AviD
On second thought, I'm talking about security code reviews, and you probably are not... so that might be a substantial difference.
AviD
By "should" I meant that the rate is expected to be between 100 and 400 LOC/hour. I didn't mean "should" as in "you should do it this way." :)
Hosam Aly
hehe, but I meant that most experienced reviewers aren't expected to reach 400 loc/h. Again, these numbers are probably not applicable to security CR, which is what I'm familiar with... Regardless, it's a meaningless number that the OP can use as a base and extrapolate to whatever he needs...
AviD