views:

142

answers:

7

I am a hobbyist programmer and am trying to improve my skills. To that end, I have been reading 'The Pragmatic Programmer' and 'Code Complete' as recommended by some in this forum and I have to say that it has improved my approach and skills. That said, one aspect of programming I am struggling with is the 'Don't Repeat Yourself' (DRY) mantra espoused in 'The Pragmatic Programmer' and echoed in Code Complete. Let me explain.

I recently wrote a small (~2500 lines of code) software program for a friend. Armed with my new knowledge from the books I have been reading I resisted the temptation to write code straight away but instead took some time (~3 days) to design my software in an abstract way e.g. what does the software need to achieve, what does the end user require etc.

Having done my design I then wrote down the core functions / procedures that I would need to deliver the software e.g. IsFileOpen(), getEmployeeName(), WriteAddressToDatabase() etc. I felt pleased with this and happily set on my way coding.

Halfway through my coding I realised that there were functions / procedures that I required that weren't in my initial draft. On the fly I would add these functions / procedures. However, after a day or so I'd realise that the new functions / procedures that I had added had a lot of code repetition. So I'd try to make a generic function / procedure with additional input parameters to handle the nuances of different situations.

What struck me is that this process is almost an infinite loop i.e. you can just go on and on condensing code to remove repetitions. Also, it seems to me very difficult to plan so thoroughly in advance that you know all the functions / procedures that you will need and remove all the repetitions beforehand. My questions are:

  1. Am I really just not doing enough initial planning of my software design and this is why I keep having to add new functions / procedures?
  2. Or is this a common experience that you just get better at over time?
  3. There seems to me a contradiction between 'The Pragmatic Programmer' and 'Code Complete'. The former suggests the 'DRY' principle whereas the latter seems to suggest building functions / procedures that only do one thing i.e. don't build functions that return two or three different things. So, as an example, suppose I have a data table with three variables ('Employee Name', 'Age', 'Salary') and I want to retrieve each one at different times. Then 'Code Complete' says build three functions [ getEmpName(), getAge(), getSalary() ] to keep each function specific whereas 'The Pragmatic Programmer' says there is repetition in that code and one should build a single function getEmpDetails(Name, Age, Salary) which would return the variable you want.

For clarity, I am only halfway through 'Code Complete' so maybe there is an answer to resolve the contradiction later in the book. Can anybody give me some guidance please?

Alex

+4  A: 

Just from reading your post I can tell you are well on your way to becoming a good programmer. Keep up the good work!

The process of refining your code is called refactoring and there are a lot of books on it out there. The classic is Martin Fowler's book Refactoring.

I try to do my best at removing redundancies in my code in both the design and implementation phases, but I don't go nuts about it. You are right in that you can endlessly refactor your code which is a common criticism of software engineering methods which encourage it (see Extreme Programming Refactored by Stephens and Rosenberg for a critique). Finishing projects usually trumps 'polishing' though. As long as you reduce the major sources of redundancy you'll be fine.

And I personally would use three functions instead of one uber-function. It's a lot cleaner.

Dana Robinson
A: 

Any good writer will tell you the importance of the "Crappy First Draft". A writer named Anne Lamott wrote an essay explaining this concept. Basically the idea is that you write something just to get the ideas out of your head and somewhere you can see them in a concrete fashion so that you can then chip away the bad and refine the good.

Coding is a lot like writing in this way, and that is why I would suggest you do a bit less designing up front and get your hands dirty on the code itself. A quick-and-dirty prototype of your final product will show you some of the places you may have missed in a design session. As you become more experienced you will begin to be able to foresee these areas and there will be fewer surprises.

I don't want to detract from the concepts you learned in "Code Complete" and "The Pragmatic Programmer". These are great concepts and must be used when designing systems. However until you become more comfortable with this approach I would suggest you take the 3 days you would normally use to design the system and split it in half. Use half of that time to lay down a good design to the best of your abilities. Then take the other half of the time to bang out a prototype to see if there is anything that your design does not account for.

Andrew Hare
Thanks for the input and taking time to help me. My friend has another small project for me so I'll give your approach a try and see how this compares to my earlier approach.
Remnant
A: 

Alex,

I think "is this a common experience that you just get better at over time" is probably the easiest answer - but not one that will help you!

DRY is a good principle and one I try to adhere to quite strictly. BUT as with most principles there are times when the extra work involved in adhering to it just isn't worth it. Knowing when that is (unfortunately) is the bit that comes with experience.

For your example in point 3 you can actually have 3 functions that all call a base (probably private) function that does most of the work and the 3 functions just return the info required.
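A minimal sketch of that idea in Python, assuming a made-up in-memory employee table. The names `_EMPLOYEES`, `_get_field` and the three getters are only illustrative and are not taken from the original program:

```python
# Hypothetical in-memory table standing in for the real data store.
_EMPLOYEES = {
    1: {"name": "Alice", "age": 34, "salary": 52000},
    2: {"name": "Bob", "age": 45, "salary": 61000},
}


def _get_field(employee_id, field):
    """Private helper: the lookup and error handling live in one place."""
    try:
        return _EMPLOYEES[employee_id][field]
    except KeyError:
        raise LookupError(f"no {field!r} for employee {employee_id}") from None


# Each public function still does exactly one thing.
def get_emp_name(employee_id):
    return _get_field(employee_id, "name")


def get_age(employee_id):
    return _get_field(employee_id, "age")


def get_salary(employee_id):
    return _get_field(employee_id, "salary")
```

Callers get three narrow functions, while the repeated lookup code exists only once, so both books' advice is satisfied at the same time.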

I also highly recommend looking into Test-First / Test Driven Design (TDD) as this will help you come up with a good design that does what you want (and only what you want) and as an added bonus your code is all tested and therefore easier to change without breaking anything.
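To give a rough feel for test-first, here is a small sketch using Python's standard `unittest` module. The function `parse_salary` and its expected behaviour are assumptions invented for this example; the point is only that the tests are written first and the function is then written to satisfy them and nothing more:

```python
import unittest


def parse_salary(text):
    """Hypothetical function, written only after the tests below existed."""
    cleaned = text.strip().replace(",", "").lstrip("$")
    return float(cleaned)  # raises ValueError for non-numeric input


class ParseSalaryTest(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_salary("52000"), 52000.0)

    def test_currency_symbol_and_commas(self):
        self.assertEqual(parse_salary("$52,000.50"), 52000.5)

    def test_rejects_garbage(self):
        with self.assertRaises(ValueError):
            parse_salary("lots")
```

With the tests in place, later changes to `parse_salary` can be checked by rerunning them, for example with `python -m unittest`.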

Good luck!

Mark
A: 

That initial planning of your application is why they call design "a wicked problem". This is spoken about in detail in Code Complete.

The problem is you don't know exactly how the thing will work until you actually build something. But if you build something too early, it may still have some serious flaws and so be a waste of effort.

So, you seem to have a good approach and have just come up against the issue that all developers face. Work on the design until you feel you have enough of an idea to build something of basic value, build it, and then revise/refactor the design as required now that you know more about the problem. Repeat until finished.

You didn't say which edition of Code Complete you are reading. The second edition is the one to go with. The first is still a good book, but it has only limited Object Oriented focus and is starting to show its age a bit.

Your confusion about DRY may be related to this as it is a central goal in Object Oriented design.

Ash
A: 

For a beginner, you're doing pretty well. I think the experience you describe is natural, especially when you talk so much about functions. I have two pieces of advice:

  1. Put more emphasis on data. Fred Brooks said

    Representation is the essence of programming ... Much more often, strategic breakthrough will come from redoing the representation of the data or the tables. This is where the heart of a program lies. Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.

  2. At 2500 lines, you should be thinking about breaking your program up into pieces that are the size of a module or a file, not just a function. Try to keep in mind David Parnas's slogan that Every Module Hides a Secret. For your application, it sounds like some examples of secrets to be hidden in a module are

    • Knowing how to talk to a database.
    • Knowing the format of a particular file.
    • Knowing what information is available about an employee.

    When each important secret is in one place, you'll find it easier to apply Don't Repeat Yourself, because you'll have a smaller scope for your activities.
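As a rough illustration of a module hiding one secret, the sketch below keeps knowledge of a file format inside a single class. The class name `EmployeeFile` and the CSV layout (three columns: name, age, salary) are assumptions made up for this example, not details of the original program:

```python
import csv
import io


class EmployeeFile:
    """Hides one secret: the on-disk format of the employee file.

    The CSV layout used here is an assumption made for this sketch.
    """

    _COLUMNS = ("name", "age", "salary")

    def load(self, stream):
        employees = []
        for row in csv.reader(stream):
            if not row:  # skip blank lines
                continue
            name, age, salary = row
            employees.append({"name": name, "age": int(age), "salary": float(salary)})
        return employees

    def save(self, employees, stream):
        writer = csv.writer(stream)
        for emp in employees:
            writer.writerow([emp[col] for col in self._COLUMNS])


# Callers never see delimiters or column order:
buffer = io.StringIO()
EmployeeFile().save([{"name": "Alice", "age": 34, "salary": 52000.0}], buffer)
buffer.seek(0)
employees = EmployeeFile().load(buffer)
```

If the format later changes, only `EmployeeFile` has to change, which is what keeps the scope for Don't Repeat Yourself small.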

Norman Ramsey
+1  A: 

Don't worry too much about planning. One of the best ways to get something done is to figure out one thing somebody should be able to do with your system, program that, and take notes along the way about objects, functions, and generalities.

Adding new functions and procedures happens all the time. In fact, it's encouraged. The central problem of object-oriented programming is figuring out which objects should have responsibility for implementing those functions and procedures.

And in related news, generalizing those functions and procedures is an alluring trap to fall into. You should only generalize things that should always work in the same ways, otherwise you may make a change to a function that fixes some problems and causes others.
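A small sketch of the trap, with function names invented purely for illustration. The first function tries to serve every caller through flags; the alternative shares only the part that should always behave the same way:

```python
# Over-generalized: one function with a flag for every caller's nuance.
# Changing a default to suit one caller can silently break another.
def format_line(name, amount, currency=True, uppercase=False, pad=0):
    text = name.upper() if uppercase else name
    value = f"${amount:,.2f}" if currency else str(amount)
    return f"{text.ljust(pad)} {value}"


# Alternative: share only what should always work the same way...
def format_money(amount):
    return f"${amount:,.2f}"


# ...and keep the caller-specific layouts separate.
def payslip_line(name, salary):
    return f"{name.upper():<12} {format_money(salary)}"


def report_line(name, salary):
    return f"{name}: {format_money(salary)}"
```

Here a change to how payslips are laid out cannot affect reports, while a change to money formatting still happens in one place.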

This is, in fact, the big knotty problem at the core of development: cohesion vs. coupling. Cohesion is how easy code is to understand at first glance; 2500 lines in a single file score worse on this than 5 linked files of 50 lines. Coupling is how much "foreign" code is impacted by any given change; a single file suffers less from this than 5 linked files. Maintainable code has both high cohesion and low coupling, but there's no quick and easy benchmark to determine how much of a tradeoff is acceptable.

Glazius
A: 

DRY is a good goal, but it is often difficult. Since you read The Pragmatic Programmer, you must know that sometimes in order to achieve DRY the authors need to work around limitations within a programming language. For example, code generation is one huge technique they use to get around the limitations of some languages.

They advocate things like creating an abstract structure to represent your data types and then using a code generator to generate both the SQL table creation/updating scripts and the data access code in their procedural language (it has been a while since I read the book, so this may be one of their other books).
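A very rough sketch of that kind of generator in Python. The schema layout and the names `EMPLOYEE_SCHEMA`, `generate_create_table` and `generate_accessors` are assumptions invented here, not the book's actual example:

```python
# Single source of truth: a hypothetical description of one table.
EMPLOYEE_SCHEMA = {
    "table": "employee",
    "columns": [("name", "TEXT"), ("age", "INTEGER"), ("salary", "REAL")],
}


def generate_create_table(schema):
    """Emit the SQL table creation script from the schema."""
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in schema["columns"])
    return f"CREATE TABLE {schema['table']} (\n{cols}\n);"


def generate_accessors(schema):
    """Emit Python source containing one getter per column."""
    lines = []
    for name, _sql_type in schema["columns"]:
        lines.append(f"def get_{name}(row):")
        lines.append(f"    return row[{name!r}]")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_create_table(EMPLOYEE_SCHEMA))
    print(generate_accessors(EMPLOYEE_SCHEMA))
```

Adding a column then means editing `EMPLOYEE_SCHEMA` once and regenerating, at the price of everyone on the project having to understand the generator.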

Anyway, some places have coding standards which explicitly forbid this stuff, so it may even be near impossible to achieve. Also look at things like C, where typically you put a function definition inside a .c file and a prototype inside a .h file. The first line of the definition and the prototype are just about the same. The majority of C programmers I have seen just copy/paste the prototype rather than make some type of generator to create the .h files.

In reality DRY is a tradeoff: the cost of maintaining the duplicate code against the cost of eliminating the duplication. The cost is in time (often cut/paste is much faster than eliminating the repetition, but sometimes in maintenance the costs go way up from constantly finding/changing all the copies), accuracy (what happens when that data is not automatically kept in sync?), clarity (sometimes the methods used to eliminate duplication make the code overcomplicated), etc. Anyway, you need to think about the tradeoffs. For example, in C it is not too much trouble to change your function definition and then go change the .h file. If you make some custom language and a code generator to keep the two in sync, then a standard C programmer joining your project will have to learn the generator. If you made it in LISP or some less popular language, that might greatly reduce the pool of people you can hire to work on it (important in a corporate setting). A lot of programmers in language X only know language X and do not want to know any other ones at all. And some, even though they may learn a few languages, may not be interested in whatever language the code generator was created in.

Everything is a tradeoff and each time you strive for DRY you need to start asking yourself the cost. I notice you said you spent days designing the software. There is a good chance that you might fall into the trap of spending months to design a major system and in the end whoever you are designing it for just gives up. As has been mentioned, system design is a wicked problem: there are some things you just won't know until you try to implement it. Also, the sooner you get prototypes to your users the better, because often they will change their mind, or you will find you were wrong. If you spend months up front designing the system and then implementing it, you will be much further behind than if you spend a few weeks knocking out version 1, getting feedback, and knocking out version 2. It really depends, though: some systems benefit from being designed up front, when you know the requirements won't change (think embedded stuff, space satellites, etc.) or when a mistake can cost a lot of money. Most systems that interact with users are not in this category.

Also, as has been mentioned, refactoring can be done. The book by Martin Fowler is worth a read. Basically you can put the code out with duplication, then improve it later. You may even find future features that add to the duplication and enable you to create a new abstraction to eliminate it that you wouldn't have seen had you designed up front based on the initial requirements. But every refactoring also has a cost. You need to ask yourself: What is the cost of this duplication? What is the cost of eliminating it? And you will want to look at maintenance costs now, maintenance costs if your plan works, how much it takes to implement your plan, and what other features there are to be done that might be more important.

Anyway, thousands of lines of cut/paste code make me sick to my stomach looking through them. But one or two lines here and there (especially if they aren't things that frequently change) are fine. I have also seen some terrible code generators which are nothing but special cases, because the person was obsessed with eliminating duplication in the main project but didn't pay attention to it in the code generator.

Cervo