Every modern source control system can slice and dice the history of a program. There are many tools to statically and dynamically analyze code. What sort of mathematical formula would allow me to integrate the amount of activity in a file along with the number of deployments of that software? We are finding that even if a program passes all of its unit tests, it requires more work than we would expect at upgrade time. A measure of this type should be possible, but sitting down and thinking about even its units has me stumped.

Update: If something gets sent to a test machine I could see marking it less rotten. If something gets sent to all test boxes I could see it getting a fresh marker. If something goes to production I could give it a nod and reduce its bitrot score. If there is a lot of activity within its files and it never gets sent anywhere I would ding the crap out of it. Don't focus on the code; assume that any data I need is at hand.

What kind of commit analysis (commit comments (mentioned below) or time between commits) is fair data to apply?

Update: I think the dimensional analysis could probably just be based on age. Relative measures are a little more difficult. Old code is rotten. The average age of each line of code is still simply a measure of time. Does a larger source module rot faster than a smaller, more complex one?

Update: Code coverage is measured in lines. Code that is executed often must, by definition, be less rotten than code that is never executed. To accurately measure bitrot you would need coverage analysis to act as a damper.
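
To make the updates concrete, here is a rough Python sketch of the kind of score I have in mind; every field name here (last_verified, recent_commits, lines, coverage) is a hypothetical placeholder for whatever the source control and CI tooling actually expose:

from datetime import datetime

def rot_score(file_info, now=None):
    # Higher score = more rotten.
    now = now or datetime.now()
    # Base rot: seconds since the file was last sent to a test box or production.
    age_seconds = (now - file_info.last_verified).total_seconds()
    # Activity that never got deployed anywhere makes things worse.
    churn_penalty = 1.0 + file_info.recent_commits
    # Coverage acts as a damper: well-exercised code rots more slowly.
    damper = 1.0 - 0.9 * file_info.coverage
    return age_seconds * file_info.lines * churn_penalty * damper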

A: 

Inverse proportion of number of unit tests to the total lines of code?

zvolkov
I think time has to be in there somewhere
ojblass
A: 

Think about two possible measures: edit differences, like Hamming or Wagner distance; and information-theoretic entropy.

Charlie Martin
Do all units tend to cancel out in 'distances'?
ojblass
+2  A: 

I disagree with Charlie: minor refactoring of source code can result in very large Hamming distances, and doesn't provide a good measure of the degree to which the code has been logically modified.

I would consider looking at the length of commit comments. For a given programmer, a relatively long commit comment usually indicates that they've made a significant change to the code.

splicer
Commit comments are too subject to habits, such as including incident numbers and so on. You have inspired me to add a little more info to the question.
ojblass
+3  A: 

Very interesting train of thought!

First, what is bitrot? The Software Rot article on Wikipedia collects a few points:

  • Environment change: changes in the runtime
  • Unused code: changes in the usage patterns
  • Rarely updated code: changes through maintenance
  • Refactoring: a way to stem bitrot

By Moore's Law, delta(CPU)/delta(t) is a constant factor of two every 18 to 24 months. Since the environment contains more than the CPU, I would assume that this forms only a very weak lower bound on actual change in the environment. Unit: OPS/$/s, change in Operations Per Second per dollar over time

delta(users)/delta(t) is harder to quantify, but given the frequency with which the phrase "Age of Knowledge" turns up in the news, I'd say that users' expectations grow exponentially too. Looking at the development of $/FLOPS, basic economics tells us that supply is growing faster than demand, giving Moore's Law as an upper bound on user change. I'll use function points ("amount of business functionality an information system provides to a user") as a measure of requirements. Unit: FP/s, change in required Function Points over time

delta(maintenance)/delta(t) depends totally on the organisation and is usually quite high immediately before a release, when quick fixes are pushed through and when integrating big changes. Changes to various measures like SLOC, Cyclomatic Complexity or implemented function points over time can be used as a stand-in here. Another possibility would be bug-churn in the ticketing system, if available. I'll stay with implemented function points over time. Unit = FP/s, change in implemented Function Points over time

delta(refactoring)/delta(t) can be measured as time spent not implementing new features. Unit = 1, time spent refactoring over time

So bitrot would be

             d(env)     d(users)     d(maint)        d(t)
bitrot(t) = -------- * ---------- * ---------- * ----------------
              d(t)        d(t)        d(t)        d(refactoring)

             d(env) * d(users) * d(maint)
          = ------------------------------
                d(t)² * d(refactoring)

with a combined unit of OPS/$/s * FP/s * FP/s = (OPS*FP²) / ($*s³).

This is of course only a very forced pseudo-mathematical notation of what the Wikipedia article already said: bitrot arises from changes in the environment, changes in the users' requirements and changes to the code, while it is mitigated by spending time on refactoring. Every organisation will have to decide for itself how to measure those changes; I only give very general bounds.
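
To make the dimensional bookkeeping concrete, here is a rough Python sketch that evaluates this expression from externally measured rates; the example numbers are made up and only serve to show how the factors combine:

def bitrot_rate(d_env_dt, d_users_dt, d_maint_dt, d_refactoring_dt):
    # d_env_dt:         change in environment capability, OPS/$/s
    # d_users_dt:       change in required function points, FP/s
    # d_maint_dt:       change in implemented function points, FP/s
    # d_refactoring_dt: fraction of time spent refactoring (dimensionless)
    return (d_env_dt * d_users_dt * d_maint_dt) / d_refactoring_dt

# Hypothetical example: steady environment change, modest requirement and
# maintenance churn, 10% of the time spent refactoring.
print(bitrot_rate(1e-9, 2e-7, 1.5e-7, 0.10))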

David Schmitt
Too late in what sense? Going home... I am going to leave this one open for a while because it is of keen interest to me. I hope you have a great weekend.
ojblass
I have my crackpipe out and it has not helped me quite decipher the final formula...
ojblass
"too late": it was 1 AM in the morning when I finished. I'll add a few more notes at the end, now that I'm fit again.
David Schmitt
Are you implying that more complex code is intrinsically more rotten? Is that in fact the case? I am beginning to think code rot should simply have a unit of the average age of each line of code. There is some sort of poetic, simple beauty in it.
ojblass
Rotting, as a process, is only defined over time. So complexity is, by itself, not rot, but software in complex AND changing environments and with complex AND changing requirements will rot faster than software with less change all around.
David Schmitt
Thinking about the human genome as a massively complex piece of software: less critical areas have more lax requirements and can drift over time. What a horrible problem; all I want is a dimension. This is probably a PhD thesis in and of itself.
ojblass
See also http://en.wikipedia.org/wiki/Entropy_(information_theory) . Which would suggest (bits of entropy/second) as a unit of change over time. This of course only transforms the question to how to measure entropy in a system.
David Schmitt
But that clouds the fact that entropy of a source file is probably not the same as entropy in a physical system.
ojblass
+2  A: 

How about the simplest possible answer?

foreach (file in source control){
  file.RotLevel = (Time.Now - file.LastTestedOrDeployed)
}

If a file hasn't been deployed (either to production or to a test machine) for a long time, it may be out of sync with "reality". The environment may have changed, and even if the file has not been changed, it may no longer work. So that seems to me to be a simple and accurate formula. Why make it more complex than that? Involving number of changes seems to add only uncertainty. If a file has been modified recently, does that mean it has been updated to reflect a change in the environment (which makes it "less rotten"), or have new features been added (increasing the risk of errors, and so making it "more rotten")? Modifications to a file could mean anything.

The only unambiguous factor I can think of is "how long has it been since we last verified that the file worked?"
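
For what it's worth, a direct, runnable Python version of that pseudocode (the files iterable and its last_tested_or_deployed timestamps are assumed; they are not part of any real source control API):

from datetime import datetime

def rot_levels(files, now=None):
    # Rot = time (in seconds) since the file was last tested or deployed.
    now = now or datetime.now()
    return {f.path: (now - f.last_tested_or_deployed).total_seconds()
            for f in files}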

jalf
But rot = time isn't going to make me sleep better tonight. Maybe you hit it dead on though... Time Since Last Check / Lines of Code. As time tends towards infinity it is rotten. But as the number of lines of code gets big it makes it less rotten... blech...
ojblass
It's got to be proportional to both of these quantities. Maybe minutes * number of lines. 0 lines = zero rot. 0 time = 0 rot. Alright, it's a start. Now how about adding in info from activity within each module?
ojblass
I don't think activity is a good measure. Like I said, it can mean either more or less rot. (Either it's maintenance, which is good, or it's new features, which is bad (from a rot point of view)). I agree though, the amount of code should probably count as well
jalf
I do have defect-or-feature information attached to the change record.
ojblass
A: 

If you are really interested in digging into this, there is some research out there. I was looking into the concepts from an article that examined the effect of Organizational Structure on Software Quality a while ago. I ended up filing the ideas away in the back of my head, but you might find it enlightening.

D.Shawley
A: 

Since we don't care about code that works, I'd look at the number of changes made to a file (not how big the changes were, just how often a file gets changed) and how many bugs were fixed by those changes plus the number of open bugs which are logged against the file. The result should be a number that gets bigger the more rotten a file is.

Files which change often (config, etc.) while not fixing a bug will not show up because the bug part of the equation will be low. Files with lots of open bugs will show up, as will files where changes often lead to new bugs. The changes*bugfixes number should erode over time (because we don't care about old hotspots).
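
A rough Python sketch of that scoring, assuming the ticketing system can report, per file, its change history with bugs fixed per change and its current open bug count (the half-life is just an arbitrary knob for how fast old hotspots erode):

import math
from datetime import datetime

def hotspot_score(changes, open_bugs, half_life_days=90.0, now=None):
    # changes: list of (commit_time, bugs_fixed) pairs for one file.
    # Each change's bug-fix weight decays with age, so old hotspots fade;
    # open bugs logged against the file add to the score.
    now = now or datetime.now()
    score = 0.0
    for commit_time, bugs_fixed in changes:
        age_days = (now - commit_time).total_seconds() / 86400.0
        score += bugs_fixed * math.exp(-math.log(2) * age_days / half_life_days)
    return score + open_bugs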

Aaron Digulla
A number... with what dimensions?
ojblass
Since we're not talking about something physical, the number doesn't have to have a "dimension" just like PI has no dimension. Or what dimension would you give to "number of recent changes made to a file"?
Aaron Digulla
number of recent changes to a file has a dimension of changes. 4 changes. That is a unit.
ojblass
Well, in that case, my method doesn't add up since it fails the "dimensions check": I add "changes*bugs" and "bugs" which is wrong (unit mismatch).
Aaron Digulla
A: 

I'm reminded of Evidence Based Scheduling. Come up with a set of reasonable metrics to indicate bitrot (both its actual value and how much it was reduced by a particular change). Then determine how accurate they are based on time spent later. Coming up with the numbers and rules for this is probably complicated.

Brian
That was one of the finest off topic articles I have ever read.
ojblass
A: 

My only issue with code coverage and unit tests is that unit tests only test what they were originally designed to test, and they, by definition, are code and prone to the same functional software rot that plagues regular code. (They are only good for what they were written for, and after a while that's not enough.)

But high quality unit tests will obviously provide some protection.

So these are my important factors for software rot:

  1. Number of external data interface points (extDataIntfPts)
  2. Quality of data/error handling, unit tests (codeQuality)
  3. Dependency on underlying implementations such as OS/VM. (osDep)
  4. Number of external implementation interface points, such as plugins. (extIntfPts)
  5. Complexity of code/simple volume of code (linesOfCode)

As a system lives in production, it is exposed to a greater variety of data inputs as the dataset it has collected grows. This by definition exposes the codebase to a greater number of edge cases and sequences.

This can be mitigated by the quality of the data processing, error handling, and unit tests.

There's also the moving targets of the underlying environment that the system operates in. One way to moderate this is to put the application in a VM.

If the system implements plugins, I could see the codebase facing a greater chance of failure as more plugins are developed.

Complex code != elegant code. If it's elegant, it's probably simple. I'm going with the simple point here that the more code there is, the less likely it is that it is well tested, but I suppose it could be turned around.

So, here's my equation:

bitrot = (linesOfCode / codeQuality) * (extDataIntfPts + (osDep / change) + extIntfPts) * numberOfSecondsDeployed

Judging codeQuality would probably involve the metric of what the code coverage in the unit tests is. You could run a static analysis program against it to determine potential bugs, and that would probably be some help as well. I mean, at some point it's really hard to score, because multi-threading code should be weighted a lot heavier than a POJO. And yes, refactoring ought to be figured in, but only where there is evidence of software rot.

In the end, it's a pseudo-science. Here's my contribution to pseudo-science.
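
And a minimal Python sketch of that equation, assuming every factor has already been measured somehow (coverage plus static analysis for codeQuality, a count of environment revisions for change, and so on):

def bitrot(lines_of_code, code_quality, ext_data_intf_pts,
           os_dep, change, ext_intf_pts, seconds_deployed):
    # bitrot = (linesOfCode / codeQuality)
    #          * (extDataIntfPts + osDep/change + extIntfPts)
    #          * numberOfSecondsDeployed
    return (lines_of_code / code_quality) \
        * (ext_data_intf_pts + os_dep / change + ext_intf_pts) \
        * seconds_deployed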

altCognito
+1  A: 

The obvious answer is no. Bitrot does not have any accepted dimensions.

ojblass
A: 

Real bitrot (not software rot) has dimensions of physical volume of storage * time.

Bitrot is caused by radioactive decay of impurities in the storage medium.

Joshua
I think you mean bits / second, because to determine how rotten something is you will have to multiply by time. I dreamed about this for days... but I am better now.
ojblass