views:

876

answers:

25

How would one go about proving to management that a batch reformat of all .java files in a large code base (to place the code in compliance with the company's coding standards) is safe and will not affect functionality.

The answers would have to appease the non-technical and the technical alike.

Edit: 03/12/2010Clarification for the technical among you; reformat = white space onlychanges - no "organizing imports" or "reordering of member variables, methods etc."

Edit: 03/12/2010 Thank you for the numerous responses. I am a surprised that so many of the readers have voted up mrjoltcola's response since it is simply a statement about about being paranoid and in no way proposes an answer to my question. More over, there is even a comment by the same contributor reiterating the question. WizzardOfOdds seconded this viewpoint (but you may not have read all the comments to see it). -jtsampson

Edit: 03/12/2010 I will post my own answer soon, though John Skeet's answer was right on the money with the MD5 suggestion (note -g:none to turn debugging off). Though it only covered the technical aspects. -jtsampson

03/15/2010 I added my own answer below. In response to what does "safe" mean, I meant that the functionality of the java code would not be affected. A simple study of tyhe java compiler shows this to be the case (with a few caveats). Thos caveats were "white space only" and were pointed out by several posters. However this is not something you want to try to explain to BizOps. My aim was to elicit "how to justify doing this" type of answers and I got several great responses.

Several people mentioned source control and the "fun" that goes along with it. I specifically did not mention that as that situation is already well understood (within my context). Beware the "gas station" effect. See my answer below.

+32  A: 

If you don't know how to convince them, how do you know it is safe?

Parsers, reformatters, pretty printers are just software, they may have bugs. Without unit tests everywhere, with a large codebase, you may not be able to verify 100% that the end result is identical.

As a developer, I'm paranoid, and the idea makes me uneasy.

Ponder this: Management is now aware that you are mucking around. A previously undiscovered bug gets reported after your reformat. You are now chief suspect. Whether it is "safe" has multiple meanings. It might not be safe for you and your job.

EDIT: In response to jt's edits. Your question wasn't about how to do it, it was "How to convince management that it is safe". Perhaps you should have asked, instead, "Is it safe? If so, how to do it, safely." My statement was pointing out the irony of your question, in that you assumed it was safe, without knowing how. I appreciate the technical side of reformatting, but I am pointing out that there is risk involved in anything non-trivial and unless you put the right person on it, it might get mucked up. Will this task detract from that programmers other tasks, sidetracking them for a couple of days? Will it conflict with some other coder's uncommitted revisions? Is the source under revision at all? Is there any embedded script that is whitespace sensitive, such as Python or Groovy? etc. etc. The task is obviously non-trivial because you desire to do it with a tool rather than have a programmer do it, but on the other hand, a programmer, hand-formatting code can screw it up too. Hence my distaste for mass-reformatting, by hand or automated.

mrjoltcola
+1: If you have enough testing to convince **you** that it's safe, you can probably convince others.
S.Lott
Well, my answer is a little generic. Since the OP says he only has .java files, then the MD5/SHA sum of each .class would be sufficient for us. Then our job is to explain to management how hashing works.
mrjoltcola
except adding whitespace affects the MD5/SHA sum.
Chris Kaminski
+1 for the last statement.
Developer Art
Adding whitespace to the .java file should not affect the checksum of the .class file.
mrjoltcola
+1 for the "Ponder this:" paragraph. The correct thing to do would have been to refactor a page at a time behind their backs. Management doesn't know or care what the code looks like. If it truly is safe they would never know the difference. Better still, they would never be aware that you were spending time (and therefore money) making changes that do not directly improve the bottom line
rotard
@mrjoltcola it may change the debug information in the class files (line numbers), which will change the checksum.
josefx
@josefx: In a class compiled for release mode?
mrjoltcola
@mrjoltcola the default is to compile with debug information and the question doesn't state that he uses -g:none. I don't know the management, but if they fear just formatting code alone what will they say about removing parts of the class files? There are after all languages where changing from debug to release mode can "cause" issues (concurrent c/c++ programs sometimes have this problem)
josefx
mrjoltcola, thanks for your response. However your initial answer really doesn't answer my question - it is more of a statement. But following the comments, it seems you agree with Jon Skeets recommendation below for the technical approach (not sure who posted the md5 idea first).
jtsampson
My statement is referring to the inherent risk of assuming just because something is possible, and even technically practical doesn't mean it is safe. Put the wrong person on the job, and it might not get done in a safe manner. The very act of undertaking a sensitive operation implies risk that could be avoided completely; if the operation isn't critical or profitable, don't do it. Reformatting code is not a critical need for me. But yes, I agree that checksumming can work, in a controlled situation.
mrjoltcola
+8  A: 

I would use four words.

Source control. Unit Tests.

Paddyslacker
+1: If it passes the same tests before and after, that sounds safe to me.
S.Lott
PaddySLacker, I (as a developer would agree). But as a QA business person there is little faith that unit tests cover all conditions. However since we have such suites already that will also be apart of the technical (empirical) approach for proof.
jtsampson
@jtsampson: As a developer, I have no faith that unit tests cover all conditions. I have complete faith in the directly contradictory proposition. Unit tests never cover every possibility.
David Thornley
@David Thornley I agree that for non trivial code, unit tests (or any tests) never cover all cases, even with 100% code coverage. However, if you have enough unit tests, you can be reasonably sure that the behaviour of the code hasn't changed before and after the cleaning of the code. Unless it's planned for this codebase to never be modified in any way in the future, then this kind of change is really not a big deal, and way less risky than adding new features will be in the future.
Paddyslacker
@Paddyslacker: If you make reasonable changes, and the unit tests still pass, you can be pretty confident that you didn't break anything. I'm less happy about using them as confirmation for large automatic changes.
David Thornley
+1  A: 

If you are using Eclipse as your development platform, you can load all the code into the workspace locally. Demonstrate to management there are no problems by showing them the Problems tab.

Then, right click and Format each of the projects one by one - again demonstrating no problems are introduced.

You can do this on your local workstation without any harm at all to your repository.

Honestly if your management is so non technical as to be afraid of formatting source code, then demonstrating that no problems appear on the problems tab after a format should be sufficient to show that the code is still fine.

Not to mention you will presumably have the old version tagged in source control right?

Brian Teeter
The "problems" tab only shows code that fails to compile - it won't show any subtle behaviour changes that may be introduced.
Nicholas White
A: 

show them ur regression test results

Paul Creasey
ths s nt cll phn
Adriano Varoli Piazza
@Adriano: Actually, there are users who reply from cell phones and I have to confess to having checked stackoverflow from my 'Droid.
Larry Lustig
Although, when I do it, I spell out the words anyway.
Larry Lustig
The fact that Adriano's comment got 5 up votes is sad, and i was in fact on my phone.
Paul Creasey
The source is irrelevant; it's the destination that matters. The reason why @Adriano's comment is sad is not that it possibly insults you -- it's that it is just as bad as your use of *ur*.
Austin Salonen
Actually the source, and it's crappy keyboard is entirely relevant, it is the reason such abbreviations exist. While I agree that such language is better avoided, it in no way warrants such a response from what i can only describe as a troll. By the way, using @ in the middle of the sentence like that is innapropriate since the comment is directed at me not Adriano, see i can be pedantic too, look at me.
Paul Creasey
I apologise then, I had no idea that SO could be accessed from a cellphone. Still, to me it's a matter of respect. I don't care where you write from, take your time and spell correctly.
Adriano Varoli Piazza
A: 

A school of thought could be to do it without asking and then be able to go "See!"

Of course if you wreck it all up then you'll get fired. You makes your choices...

Alternatively, source control (or simple backups) then you can always roll it back.

Smalltown2000
+35  A: 

If it's just reformatting, then that shouldn't changed the compiler output. Take a hash (MD5 should be good enough) of the build before and after the reformatting - if it's the same for every file, that clearly means it can't have altered behaviour. There's no need to run tests etc - if the output is byte for byte the same, it's hard to see how the tests would start failing. (Of course it might help to run the tests just for the show of it, but they're not going to prove anything that the identical binaries won't.)

EDIT: As pointed out in comments, the binaries contain line numbers. Make sure you compile with -g:none to omit debug information. That should then be okay with line numbering changes - but if you're changing names that's a more serious change, and one which could indeed be a breaking change.

I'm assuming you can reformat and rebuild without anyone caring - only checking the reformatted code back into source control should give any case for concern. I don't think Java class files have anything in them which gives a build date etc. However, if your "formatting" changes the order of fields etc, that can have a significant effect.

Jon Skeet
+1 for comparing binaries.
mrjoltcola
On the contrary, the JVM relies heavily on symbolic references. Hence, any class, interface, method or field used in your code will have its name (String) inserted in the Class file. In addition, if you're compiling with the debugging mode, any variable name or even a change in the source line numbers will result in a different class file.
HH
@HH: Doh, of course you're right about the line information. Formatting shouldn't change names though.
Jon Skeet
@Jon: You're right, I thought I read refactoring :)
HH
Actually, a more sophisticated comparison would involve only the code part in the Class file (with all indices replaced by the referenced name). That sounds like a nice project :)
HH
@Jon, Although you didn't cover the business argument, I found you technical answer to be the most helpful and thus am selecting your answer. MD5 is easier to explain (same hash = same file) them how the java compiler works (and provides a low level reproducible test to boot).
jtsampson
@HH: A "more sophisticated comparison" is more likely to have bugs.
Brian
+1  A: 

It is safe in the sense that pure formatting changes will make no difference to what's compiled, and thus no difference to the behaviour of the code at runtime.

It is worth remembering that bulk reformatting of code can lead to "fun" when dealing with source control later - if multiple colleagues have the code checked out, and one team member comes along and reformats it, then all those copies are out of date. Worse, when they update their working copies, all manner of conflicts are going to appear, because those formatting changes will affect huge portions of the code, and resolving that can be a nightmare.

Rob
A: 

I know the answers above are all fine, but here is another possible one: Do a CRC on the compiled version before and after a reformat. Since compiling would ignore the spaces, tabs, linefeeds, etc, then the compiled version should be identical to the original, and that would prove to those semi-technical mgrs that all is well.

MJB
+12  A: 

How about the pragmatic approach?

  1. Build the app.
  2. Save the app.
  3. Reformat the code.
  4. Build the app.
  5. Diff the binaries.
lavinio
Some code doesn't result in binaries until runtime (JSP, ASP, and every other active page technology). If your app has none of that, then I think this is a good idea.
mrjoltcola
The OP said **.java** files; they are all compiled.
lavinio
Lavino, Thanks for your response, this seems similar to Jon Skeet's above, and several below. I think this is a good approach, at least technically
jtsampson
+5  A: 

Well, it's not at all safe and you are unlikely ever to convince them. Speaking as someone who has managed a lot of development I would never consider it in any commercial codebase on which any revenue depended. I'm not saying there aren't advantages to code formatted how you like, but the chances that your formatting will not involve some code changes is nil. That means there's a huge risk for very little gain. If you have to do it, do it piecemeal as you bug fix the code, don't do it in a big hit. It may be a good decision for you as programmers but it would be a terrible decision for them as management.

Simon
I disagree. With automated unit tests and source control to roll things back, this should be a no-brainer.
Paddyslacker
presuming you have 100% code coverage and the reformatting does not change any code functionally - which is NEVER the case. I have never met a developer doing this sort of thing who can resist the temptation to include things like single exit point from functions in their reformatting activity - or whatever their pet peeve is about the current format. Far from being a no-brainer, it is suicide on any commercial code base - where is the beneft to the customer? It simply isn't worth the risk. Do it piecemeal as you fix bugs.
Simon
I diagree also, however on different grounds. Intuitively, if simply altering the format of a java source file could adversely affect the functionality of an application (and let's stick to white space changes with no reordering of static member variables) then java as a language would have failed long ago.
jtsampson
@jtsampson: I'm not as sure about Java, but I can find a case where removing a space will change a C++ program. It's something of a pathological case, but nobody talks about mass reformatting of well-formatted code. I don't think C++ failed long ago.
David Thornley
@jtsampson you are presuming that the reformatting is simply moving white space around. My whole point is that the people who are going to be doing the reformatting can never resist the temptation to alter code beyond the whitespace. Think about using braces in if/elseif/else blocks, which I bet will be affected by this change. If the person doing the reformatting doesn't understand all the subtlety of the current implementation - which they almost certainly don't since they didn't write the code - then the scope for making a logical error while reformatting is huge. It is just not worth it.
Simon
@Simon, I wrote the formatter settings so I I am relatively certain that there are only white speace changes. I guess I wasn't clear that the formating will occur via a tool, so there will not be that "temptation" you describe. However I will enter this as an argument in favr of reformatting the code in mass. In response to your third statement; I have (at the recommendations of several of theses posts here performed a md5 of all the .class files both before and after the formatting chages (which includes inserting braces if statments with one line"thens") and all the files match.
jtsampson
@DT I am not familiar with C++. This post is in regards to JAVA only.
jtsampson
+2  A: 

Do your unit tests pass after reformatting? If so, then you've sold the idea to management!

If you're mucking around with untested code, then you'll have a much harder case to make.

Gabriel Isenberg
+1  A: 

Reformatting code is the same as reformatting a document in Word; it changes the layout and thus the readability, but not the contents.

If all files are formatted the same the code becomes more readable, which makes maintenance a bit easier and thus cheaper. Also code reviews can be faster and more effective.

Further, given a good formatting style, bugs can be found more easily as they cannot hide; think of if statements without curly braces and 2 statements within those imaginary braces.

Do be smart and check the code in and tag it before reformatting, so you have a state to go back to (and tell people how easy that would be), reformat and check in and tag again, without any other changes.

extraneon
+1  A: 

Answer these questions for management, and you will have gone a long way of convincing them it's a safe change?

  1. Why does good formatting matters?
  2. What changes will be made? (if you can't answer this, you don't know enough about the re-formatting to know it will be safe)
  3. Will our unit test suites prove the changes had no ill effects? (hint the answer needs to be yes)
  4. Will the existing code be tagged in the source repository so we have a quick roll back option? (hint the answer better be yes)

That about covers it.

Jay
+1  A: 

Actually, I'd probably be on their side. Reformat units as you open them for fixes or enhancement when they will be thoroughly tested before going back into production. They should have been formatted correctly the first time but if they're in production it seems needless and reckless to reformat them only for style's sake.

Consistency is good, but "a foolish consistency is the hobgoblin of small minds".

Larry Lustig
A: 

Technically, during the first phase of compilation, lexer strips all comments and the whitespace from the source. This is long before any semantics of code is being recognized by the compiler. Therefore any whitespace or comments cannot change anything in the program logic. On the contrary, of what use would the language be and who would like to use it if adding a couple of spaces or newlines would change it semantics?

On the business side, you are probably going to use some specialized tools for that. I am sure they advertise on their websites that they work great.

Final note: if you have to convince your management of that, maybe you should look to find a way to work with smarter people?

pajton
"of what use would the language be and who would like to use it if adding a couple of spaces or newliness would change it semantics" Python relies on white-space. Ruby too I think...
Chad
@Chad: Fortran at least used to rely on such things (I haven't kept up after Fortran 77), so this could break a Fortran program. In cases where the presence or absence of a newline can matter (including C++ `//` comments, for example), there's a lot of difference between ending the line with a backslash as opposed to backslash-space. There's probably other odd cases I could come up with.
David Thornley
pajon, thanks for your comments regarding the compiler, this is definitely something I would add to my explanation why (white space) reformatting is not an issue.
jtsampson
@Chad Yes, yes I know. We are talking about Java and I was trying to be concise. @David actually when I was writing the "on contrary" part I had Fortran in mind:-). Whitespace dependency in Fortran is(was) such a pain.
pajton
@pajton: Certainly whitespace rules in Fortran are a pain (actually, I don't know how many still exist). However, that doesn't mean Fortran's not useful or not used.
David Thornley
+1  A: 

I'm donning my manager hat...

To do it as one grand project, I wouldn't let you do it no matter the argument. I would, however, be open to longer estimates on changes because you are modifying existing files to include these formatting changes. I would require you make the formatting changes its own check-in though.

Austin Salonen
A: 

If your code has near enough 100% code coverage then I think the risk can be lowered a little bit.

However even if the management agreed that the code base is safe, I think they'd have their eyes on having to justify paying an employee to spend hours reformatting code just to adhere to a standard that (I presume) was introduced long into the development lifecycle.

djhworld
+1  A: 

You want the "code in compliance with the company's coding standards" [sic] and want to convince management?

Trivial: install CheckStyle, make it part of your process, feed it your coding guidelines, and show them that the whole codebase miserably FAILS on CheckStyle.

Webinator
WizardOfOdds, thanks for your comments. I already have this hooked up and keyed to the company's coding standards including a suppression file for cases deemed not important in test code. This was a part of a personal project leading up to posting this question. Before a reformat we have 800,000 violations, after format 100,000 violations - this is definitely going in on the technical side of the argument.
jtsampson
A: 

We use Jalopy here at my current job. It is a pretty solid product and it produces really neat output. The most senior developer here reformatted all the code base when he migrated it from CVS to SVN, and he had to perform some tests to make sure it would work all the way from start to end, and now we have hooks to ensure that checked-in code is properly formatted.

That being said, I don't think you can convince anyone that any tool is fool (or fault) proof, because there is no such tool. If you think the benefit is worth the time and the (very small) risk, try to convince your management the biggest advantage you see in doing this. For me, the largest advantage will come if:

  • All the developers have the same formatting settings;
  • The formatting of the source code is checked at check-in by a hook in your SCM.

Because if you do the above, if your code is already formatted, when you compare revisions in your SCM you will see actual changes in the logic of the program, and not just formatting changes.

Ravi Wallau
+4  A: 

What management are we talking about here? Are they tech-savvy enough to understand the what code formatting is and how Java treats whitespace? Because if they are not, I don't think they are qualified to make such a technical decision (i.e., such questions should be delegated to someone who is responsible for the code).

But if they are or you are trying to convince your "architect" or someone similar, well, then it's about trusting a third party tool. Suggest a formatter that has a good reputation, other than that it's not much you can do, since you didn't code the formatter.

As a side track, let me share an anecdote. Our architect decided at a time to reformat all files. Out of thousands of Java files, not a single error has yet been found (and this was over half a year ago). This makes me trust Eclipse's formatter for Java source code. The benefits of this formatting were:

  • Some badly formatted classes are now easier to read.
  • Same formatting everywhere.

But it also had some negative sides:

  • A code formatter is not perfect. Sometimes manually formatted code reads better. The formatter in particular struggles with really bad code (too long lines, too many nested ifs, etc).
  • Do you have other branches of code, like an old version that occasionally needs to be patched? Because you can forget about merging between branches with different code styles (at least when using SVN).
  • You are touching all files (and sometimes almost every line) and ruining the history of all files at once. It hurts traceability.
  • There is actually a small benefit in that each developer has his own code formatting, because you start learning that formatting, and you can immediately identify the author of a piece of code

I personally think the negative outweighs the positive. It sounds like a great idea, but in reality you don't gain as much as you think. When you come across some terribly formatted code, reformat just that class or just that method and see it as a small step toward the big goal.

waxwing
By Management, I mean everybody who is not me - that includes the tech savvy and non-tech savy. Thanks for the anecdote, that is stedily confirming what I think to be the case. Your point about source control is duly noted and understood. It was deliberately left out of this question to narrow my possible responses. We have an event nearing that requires moving our SVN tree to a new location and no merging from old branches will be permitted going forward (which would minimize this concern) and is why I posed the question.
jtsampson
A: 

If you have good coverage of unit test, test results before and after will be enough.

javydreamercsw
A: 

Just one specific heads up: if your company policy includes alphabetic member sorting, be aware that the order of static fields does matter. So if you include an on-save or cleanup rule which does this, you might break your code.

Adriaan Koster
Adriaan, We were aware of this (but your caution is still very valid). I have updated the original question WRT to this as well as reordering of static code.
jtsampson
+2  A: 

This is a good example of the technical-business mismatch.

The technical people want to do it because it can make the code hard to read but, unless it's exceptionally bad, the real reason is that it offends the typically delicate sensibilities and aesthetics of the average programmer.

The business people want to manage risk. Risk can be undertaken if there is some benefit and there is no business benefit here unless you argue it'll be cheaper, faster and/or less risky to do future development with reformatted source code, which in all honesty is a tough sell.

Almost by definition any change has risk attached. The risk here is remote but isn't nonexistent either (from management's perspective) with almost no upside.

There is another issue to consider too: this kind of change can play havoc with source control. It becomes harder to track who changed what because the most recent change to any line will be the reformatting so you'll need to go comparing revisions, which is somewhat more tedious than a simple "blame" or "annotate" command.

Also, if you have several active branches a reformat of your code will cause havoc with your merges.

cletus
+ 1. I didn't think about the problems you are talking about when I first saw this question.
Ravi Wallau
Thnaks for your comments, You make some good points for the non-technical side which was largely ignored by most of the posted answers.
jtsampson
@jtsampson - Out of interest, does your company sell software, or do you produce software for internal use by a company that sells something else? The answer to this will affect the balance cletus mentions in his second paragraph....
Nicholas White
+1  A: 

Thanks for all your responses.

My final argument to convince management; Bits of all your responses included. Thanks for the assistance.

Technical:

  • Reformat consists of white space changes (no import reordering, no member/method)
  • Reformat will use [specify tool and process]
  • Reformat will occur on [specify time within coding cycle to minimize merge impact]

Both before and after a reformat:

  • All Unit tests will pass
  • All Integration tests will pass
  • All Functional tests will pass
  • All SOAP-UI tests will pass
  • The byte code is the same (An MD5 of the .class files following javac (-g:none))

Business:

Purpose: to comply with company standards which prescribes that our source files accurately represent the logical structure of our code.

  • Reformat change vs. Code change (Word document example as above)
  • Reformat will use [general process]
  • Reformat will occur on [specify time within business cycle to minimize impact]

Pilot Test:

  • Confirmed "Format Batch" resulted in less merge conflicts then "Format as you Code". .
  • Confirmed that the executable code (4k+ .class files) remains the same. (MD5 test)
  • Confirmed functionality will not be affected (automated tests/smoke tests)
  • Confirmed formatter settings contain only white space changes.

Note: In my case a pilot test was run over 6 months by a subset of the developers using an automated tool to "Format as you code" (as prescribed by some of the answers above). While some perceived that the reformatting caused more merge conflicts, this was actually not the case.

This perception was base on the temporal coincidence of the reformat. For instance, consider the person who know nothing about cars. One day their brakes fail. To what do they attribute the cause? The gas of course. It was the last thing they put into the car (the "gas station" effect?). Clearly however, brakes and a fuel system are disparate system as are formatting and code changes. We found that improper check-ins within the context of our build process were at fault.

Last I was hoping that someone would have provided a good link to a study showing productivity gains related to common code as it is difficult to show ROI to the business. Although in my case, since it was a company standard I had "compliance" on my side. I only had to show that it was more time consuming to "Format as you Code" vs. "Batch Format"

jtsampson
A: 

I would ask Management what is their current basis for believing the code works - then demonstrate that the same tool (tests, documentation, little voices...) works exactly as well for the reformatted code. I would want their answer to be "tests"...

Carl Manaster