tags:

views:

250

answers:

9

I know that refactoring is "changing the structure of a program so that the functionality is not changed". I was talking with some of the guys I'm working with on my final year project at university and I was surprised that they have a much more expansive (for want of a better word) view of refactoring. I consider refactoring to be things like extracting methods and renaming classes. They also suggested things like changing data structures (like a Java LinkedList to an ArrayList), changing algorithms (using merge sort instead of bubble sort), and even rewriting large chunks of code as refactoring. I was quite sure that they were wrong, but I wasn't able to give a good reason why because what they were suggesting did change the program (and presumably make it better) without changing its behaviour. Am I right, and more importantly, why?

+6  A: 

I think you are right, but arguing over the meaning of a word is not particularly interesting or productive.

cbp
I'd normally agree with this, but the discussion came up when we were reviewing documents we had written, and I think it's good when we are talking about the same thing when we use the same words.
David Johnstone
+9  A: 

Martin Fowler's "Refactoring: Improving the Design of Existing Code" is perhaps THE reference:

Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring - which allows you to gradually refactor a system over an extended period of time.

Refactoring goes hand-in-hand with unit testing. Write tests before you refactor and then you have a confidence level in the refactoring (proportional to the coverage of the tests).

A good reference is: Information about Refactoring

Mitch Wheat
The Fowler quote is of course relevant, and yes, it goes hand-in-hand with unit tests... But, does this really answer the questions asked: Are the examples mentioned refactoring or just modifying code? Who's right, the OP or his colleagues, and why?
Jonik
Would the downvoter please leave a comment. Thanks.
Mitch Wheat
+3  A: 

If the interface to a piece of code changes, then I consider that more than refactoring.

The typical case for refactoring is

  • "Oh, all my unit tests run, but I think my code could be made cleaner"
  • Change code to be more readable / clearer / efficient
  • Re-run unit tests (without changing the tests) and check they still work

This means that the term refactoring is relative to the interface you are discussing. i.e you could be refactoring the code behind one interface, while more extensively changing the code of another at a lower level (maybe that distinction is what's providing the confusion between you and your colleagues here?)

DanSingerman
+1  A: 

I disagree:

In software engineering, "refactoring" source code means improving it without changing its overall results [...]

You already know the more precise terms used for subsets of refactoring, and yes, it's a very general term.

l0b0
A: 

I think that nobody can benefit of a too strong definition of the term 'refactoring'. The border between how you do perceive it and your colleagues is blurry and can be closer to their or your view depending on many facts. Since it's dynamic let's try to define it. First of all define the boundaries of the system or subsystem that you are trying to refactor.

If it is a method keep the name, input arguments, type of the returned value and possibly throwing statements fixed. Apply all the changes inside the method without changing how it is viewed outside.

If you refactor a class fix its public API and using rename variables, extract methods and all the other available techniques change the class to be more readable and/or more performant.

If the part of the code that you are refactoring is a package or a module do the refactoring inside of it possibly renaming classes, deleting, introducing interfaces, pushing/pulling code into super/subclasses.

Boris Pavlović
+1  A: 

http://en.wikipedia.org/wiki/Code_refactoring

Code refactoring is the process of changing a computer program's internal structure without modifying its external functional behavior or existing functionality, in order to improve internal non-functional properties of the software, for example to improve code readability, to simplify code structure, to change code to adhere to a given programming paradigm, to improve maintainability, to improve performance, or to improve extensibility.

I agree that refactoring code does include breaking existing code. Just make sure that you have unit tests so that you do not introduce any bugs, and the rest of the code compiles. Using refactoring tools like Resharper for C# makes this so easy!

  • Making code more understandable
  • Cleaning up code and making it tidier
  • Removing code! Redundant, unused code and comments should be deleted
  • Improving performance
  • Making something more generic. Start with the simplest thing possible, then refactor it out to make it easier to test / isolate or generic so it can work in different ways through polymorphism
  • Keeping code DRY - Don't repeat yourself, so a refactoring session might involve taking some repeated code and refactoring it into a single component / class / module.
Jake Scott
+2  A: 

With Martin Fowler's definition in mind,

Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.

... I think you are clearly right.

They also suggested things like changing data structures (like a Java LinkedList to an ArrayList), changing algorithms (using merge sort instead of bubble sort), and even rewriting large chunks of code as refactoring.

Changing an algorithm to something much faster is obviously not refactoring, because external behaviour is changed! (Then again, if the effect is never noticeable, perhaps you could call it refactoring after all - and also premature optimisation. :-)

This is a pet peeve of mine; it's annoying when people use the term sloppily - I've even come across some who might casually use refactoring for basically any kind of change or fix. Yeah, it's a hip and cool buzzword and all, but there's nothing wrong with plain old terms like change, rewrite or performance improvement. We should use those when appropriate, and reserve refactoring for cases when you are truly just improving the internal structure of your software. Within a development team, especially, having a common language for accurately discussing your work does matter.

Jonik
I'm not sure that making code faster qualifies as a change in external behavior...
GalacticCowboy
Yeah, I see your point, I guess it depends on which angle you look at it from. :) In any case, IMO, when programming you should pay attention to which "hat" you're wearing at each moment, i.e. what exactly you are trying to achieve. In other words, you should consciously separate when you are adding/fixing a feature, when refactoring, and when optimising (improving performance). IIRC, Fowler also talks about this in his definitive book on refactoring.
Jonik
True. I do agree that the term is used rather lazily most of the time...
GalacticCowboy
+1  A: 

Fowler draws a clean line between changes to code that do, and those that do not, affect its behavior. He calls those that do not, "refactoring". This is an important distinction, because if we divide our work into refactoring and non-refactoring code modification activities (Fowler calls it "wearing different hats"), we can apply different, goal-appropriate techniques.

If we are making a refactoring, or behavior-preserving code modification:

  • all our unit tests should pass before and after the modification
  • we should not need to modify any tests, or write any new ones
  • we expect cleaner code when we are done
  • we do not expect new behavior

If we are making a behavior-changing code modification:

  • we expect new behavior
  • we should write new tests
  • we may get dirtier code when we are done (and should then refactor it)

If we lose sight of this distinction, then our expectations for any given code modification task are muddled and complex, or at any rate more muddled and more complex than if we are mindful of it. That is why the word and its meaning are important.

Carl Manaster
+1, exactly. Especially the rationale you give; the muddled expectations. When writing my own answer, I had this in mind, even if I failed to write it down as neatly :)
Jonik
+2  A: 

To give my view:

Small, incremental changes that leave the code in a better state than it was found

Definitely Yes: "Cosmetic" changes that are not directly related to features (i.e. it's not billable as a change request).

Definitely No: Rewriting large chunks clearly violates the "small, incremental" part. Refactoring is often used as the opposite of a rewrite: instead of doing it again, improve the existing.

Definitely Maybe: Replacing data structures and algorithms is somewhat of a border case. The deciding difference here IMO is the small steps: be ready to to deliver, be ready to work on another case.


Example: Imagine you have a Report Randomizer Module that is slowed down by it's use of a vector. You've profiled that vector insertions are the bottleneck, but unfortunately the module relies on contigous memory in many places so that when using a list, things would break silently.

Rewriting would mean throwing the Module away an building a better and faster one from scratch, just picking some pieces from the old one. Or writing a new core, then fitting it into the existing dialog.

Refactoring would mean to take small steps to remove the pointer arithmetics, so that the switch. Maybe you even create a utility function wrapping the pointer arithmetics, replacing direct pointer manipulation with calls to that function, then switch to an iterator so that the compiler complains about places where pointer arithmetics is still used, then switch to a list, and then remove the ultility function.


The idea behind is that code gets worse on its own. When fixing bugs and adding features, quality decays in small steps - the meaning of a variable subtly changes, a functions gets an additional parameter that breaks isolation, a loop gets a bit to complex etc. None of these is a real bug, you can't tell a line count that makes the loop to complex, but you hurt readability and maintenance.

Similarly, changing a variable name or extracting a function, aren't tangible improvements of their own. But alltogether, they fight the slow erosion.

Like a wall of pebbles where everyday one falls to the ground. And everyday, one passerby picks it up and puts it back.

peterchen