I am trying to design a script to automate a very tedious configuration process we have been doing by hand. The configuration process mostly consists of copying files from various ClearCase views, editing their contents in a predictable way, and putting them into a new local tree of directories.

There's also some Protege and MS SQL Server business involved, but that is not much trouble to do by hand. I am more concerned with the endless copying and pasting that is prone to human error. The configuration process is mostly done in Windows XP, but there's also a fair number of scripts run on Red Hat Linux.

Right now I only know C++ and Java, but I was thinking I'd learn some Python for all this file manipulation; it seems like pretty high-level work so a scripting language would be appropriate.

Am I on the right track? Are there any particularly common mistakes I should watch out for?

[edit: I don't have any particular technical questions at this early stage (though I'll no doubt post again when I get down to the nitty-gritty), well, except: would Python be a good choice of language for this task, considering that I've never used it before (though I am quite familiar with C++ and Java)? Or should I just work in one of those?]

+10  A: 

Noob mistake number one: Under-estimating the time it will take to complete.

Consider doubling or tripling the amount of time that you think it will take. If you really are a newcomer, your boss understands this. If it takes less time than your estimate, you're a hero.

Robert Harvey
Thank you. I hadn't been thinking of it that way, but it makes perfect sense.
tardcaster
With twenty-seven years of programming experience, I still find that I have to triple my initial estimate to get near the right answer.
Head Geek
The usual tongue-in-cheek algorithm for correcting one's initial estimate is to double it, then go to the next higher unit of time. Thus, a 2-hour initial estimate will map to 4 days.
mpez0
+1  A: 

Don't overestimate your abilities, especially on the schedule.

If you say you'll finish in 2 weeks and it turns into 2 months, people will be upset. Make sure to set realistic deadlines and goals for yourself.

samoz
+27  A: 
  1. Imagine yourself creating the script, how long it would take.
  2. Sextuple it. (Edit: someone finally caught my "quintuple" bug)

Why?

  • The first 100% increase is because there's no such thing as 100% productivity.
  • The second 100%, because you're learning a new language.
  • The third 100%, because you aren't considering the edge cases that automation will unveil.
  • The fourth 100%, because you need to test and document your code.
  • The fifth 100% for support, maintenance, and bug fixes during the first year of operation.

So, if you think it's going to take you a day and a half of straight coding, one week would be a more reasonable estimate.

When making your case, be sure to do the cost-benefit analysis. If it saves someone at your pay grade 20 hours a week, the pay-off takes two weeks.
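
As a rough worked example of the arithmetic above (a sketch only; the function names and the sample figures are made up for illustration):

```python
def padded_estimate(base_hours, increments=5):
    """Add `increments` extra 100% chunks to a raw coding estimate.

    Five increments on top of the original 100% give a 6x multiplier.
    """
    return base_hours * (1 + increments)


def payback_weeks(cost_hours, hours_saved_per_week):
    """Weeks of use needed before the saved time covers the time spent."""
    return cost_hours / hours_saved_per_week


print(padded_estimate(8))      # 8 hours of "straight coding" -> 48
print(payback_weeks(40, 20))   # 40 hours spent, 20 saved weekly -> 2.0
```

The second call mirrors the cost-benefit point: if the script costs about 40 hours and saves 20 hours a week, it pays for itself after two weeks.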

richardtallent
This is exactly what I was looking for, thank you.
tardcaster
+1, excellent advice.
Janie
I would only multiply my estimation by 2-4 in this case. Fifth is post-release, so it doesn't count.
Thomas Owens
Then apply Hofstadter's law.
Null303
@Thomas: He already took it into account: He added 500%, but only multiplied by 5 ;)
Tordek
@Thomas: When explaining cost to management, they aren't interested in the software life cycle labels, they're more interested in the TCO across some time period vs. the cost of the status quo. So if the audience is management (and it appears it is), I think maintenance and support shouldn't be swept under the rug. And including it is a good CYA to prevent manager drama about "hidden costs" *when* the system needs a little adjustment and tweaking post-deployment.
richardtallent
@Tordek: glad someone finally noticed that. You win the easter egg. ;)
richardtallent
A: 

You are on the right track! You will do just fine!

Good luck!

CL23
+2  A: 

Noob mistake number 2: Assuming that you'll need to "knock this assignment out of the park" to get more work.

Janie
Well, I figured if I don't do such a good job on this, the software part of my duties will diminish and systems part will take over. I'd eventually like to be 100% on the software side, so I figured this was my chance to make a good impression as a programmer (as opposed to an integrator/tester).
tardcaster
I was being a bit tongue-in-cheek of course, but I remember feeling the same way when I started off as a young lad like yourself. The truth of the matter, however, is that in the wild world of corporate development, incompetence is startlingly common. See: http://thedailywtf.com/
Janie
+6  A: 

I think everybody should know the rule of UPOD -- under-promise and over-deliver. So in general you should make time estimates for projects and tasks, track your time to complete them, and then you will know how far off your estimates are on average.

Estimating the time to create software can be a difficult task. My rule of thumb is to take your best honest estimate and multiply it by 3. If you track your time on the project, you will find general numbers for a particular environment (job site/personnel).
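
The tracking idea can be sketched in a few lines of Python. This is only an illustration: the `history` data is invented, and a plain average of ratios is just one simple way to measure how far off the estimates tend to be.

```python
def estimate_factor(history):
    """Average ratio of actual to estimated time over past tasks.

    `history` is a list of (estimated_hours, actual_hours) pairs.
    """
    ratios = [actual / estimated for estimated, actual in history]
    return sum(ratios) / len(ratios)


# Hypothetical tracked tasks: (estimated hours, actual hours)
history = [(4, 10), (8, 24), (2, 7)]
factor = estimate_factor(history)
print(factor)                  # 3.0

# Scale the next honest estimate by the observed factor
next_estimate = 6
print(next_estimate * factor)  # 18.0
```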

And lastly don't forget to ask other people at your job to step up to the plate and estimate what they think it will take to complete the same project.

good luck

Chad
aka, the "Scotty Factor" ;)
richardtallent
+4  A: 
  • The first 90% of the work will take 90% of your time. The last 10% of the work will take 90% of your time.
  • Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

etc. and so forth.

Mike Robinson
I need 180% of my time... oh man I'm doomed
SwDevMan81
+1  A: 

You should write down your general ideas for how to complete the task, then talk to your manager and ask for him/her to give you some feedback on your thoughts so far. Let him/her know that you really want to do a good job and would appreciate his/her guidance, which is part of a manager's job. Hopefully your workplace is well-run enough that there is understanding and support for your situation as a fresh programmer.

You shouldn't have to seek the advice of strangers on the web for something like this, however nice and supportive they are. Your employer should give you the support you need to succeed in completing your task. Don't be afraid to ask.

"Your employer should give you the support you need to succeed in completing your task. Don't be afraid to ask." Very true. I'm doing that too.
tardcaster
+2  A: 

I think you are asking for help in the wrong place.
Go to your team leader / boss, explain the situation and your concerns, and ask for help with the estimate and the planning.

Estimation and delivery of a project is a fine art - and no one can expect you to already master it.
However, you are accountable for knowing your limits, and asking colleagues for help.

[Just a note: this, like many other answers in this topic, is a reply to the original question, which was quite different from the current one]

Roberto Liffredo
+6  A: 

Rexx or Perl come to mind for file manipulation (both very easy to learn). I don't know Python either, so I can't compare that to these two.

SwDevMan81
I second Perl. It's a great language for text manipulation.
Goose Bumper
Having used both Rexx and Python, I prefer Python. Agree that Rexx is easy to learn, though.
Anon
+1  A: 

As several posters have said, underestimating is a risk for you. By the same token, since management knows your level of (in)experience, they should figure that your estimates will have a significant margin of error. Put your best work into your estimates, extend them some to take into account the variables like those pointed out by richardtallent, and when you present your estimates, point out the specific sections that you are least confident about how long a given sub-task might take.

One thing that can be helpful in estimation is to break the project up into parts that are small enough and similar enough to what you have done before so that you can have a decent level of confidence of your estimate of that part.
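
A minimal sketch of that breakdown, assuming each part gets an optimistic and a pessimistic figure (the sub-task names and hours below are hypothetical). The widest range marks the part worth flagging as least certain:

```python
# Hypothetical sub-tasks: (name, optimistic_hours, pessimistic_hours)
subtasks = [
    ("copy files out of the views", 2, 4),
    ("apply the predictable edits", 4, 16),
    ("build the new directory tree", 1, 2),
]


def summarize(tasks):
    """Return total low/high hours and the task with the widest range."""
    low = sum(t[1] for t in tasks)
    high = sum(t[2] for t in tasks)
    least_certain = max(tasks, key=lambda t: t[2] - t[1])
    return low, high, least_certain[0]


low, high, risky = summarize(subtasks)
print(f"{low}-{high} hours; least confident about: {risky}")
# 7-22 hours; least confident about: apply the predictable edits
```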

Since you don't have much experience, maybe you want to estimate the project using a language you already know and also estimate it with Python. While Python sounds like it might be more appropriate for the task, if you find a big difference in your estimates, you can give management the choice of which implementation they want.

Another idea: is there anyone in your organization with whom you can "sanity check" your estimates?

Now I'll play the other side for a moment - overestimating by too much can lead a manager to think:

  • You're lazy or sandbagging
  • You don't know how to estimate
  • You want to cover your backside more than you want to give him/her realistic figures

And there can be danger in harboring a "hero" reputation as someone who comes in under estimates - someday your estimates will be more accurate and how realistic your estimates are will be more clear to your co-workers and management. At that time, you won't be able to get away with as much "padding", so to come in under estimate would require cutting quality or working hours off the clock.

JeffH
+1  A: 

I would use Perl for this task. YMMV, but Perl still has some of the best tools for exactly this sort of operation: taking a set of files, doing some text-level operation to them, and putting them back (or editing them in place).

Alex Feinman
A: 

Definitely PHP. Definitely.

markwatson
+3  A: 

I'm a Ruby fan myself. I've also dabbled in Perl and Python. Any of these languages will work out for you, it's personal preference.

Here are some Ruby resources:

edit: _why has disappeared! Here is a mirror of the poignant guide, or you can download the PDF.

Peter Di Cecco
why's poignant guide to Ruby is impressive. Thanks
Sathya
A: 

You may consider AWK as the language to use. If you're used to C languages, but want scripting and heavy duty text manipulation, AWK might be a contender.

MKaras
+2  A: 

Perl is IMO the daddy for text manipulation. I put my hands up to being a front-end dev who is more comfortable with CSS and JavaScript, but I have done a lot of the kind of work that you mention on a small scale, and I found the basics of Perl most useful to get me off the ground. You will need a good grounding in regex, so I suggest you learn the basics of Perl file handling, and then get a good overview of regex for Perl (keep a cheat sheet bookmark or three handy). Most of all I can say I really enjoyed learning how to use Perl, and whenever I get a task that requires automating text manipulation I relish doing it all over again. Good luck pal, and enjoy!

Crayonz
+8  A: 

I'm surprised so many posters concern themselves with time estimates and manager communication. I thought the question was about programming languages?! Have I missed something, or is it due to post editing (a tricky beast here at SO)?!

Anyway, here are my 2 cents:

  • Definitely go for a scripting language. It will be a very valuable addition to your personal skill set, and will pay off hundreds of times in the future. Since you are interested in programming anyway, think strategic, this is not for this job only. And you will find that you are more productive with a scripting language.
  • Don't go back to the dinosaurs like AWK. This is just for guys who know them already, along with several alternatives, but will take you nowhere in the long run. No proper investment.
  • Perl has been mentioned a couple of times, and has a justified reputation for this kind of job. But after 7 years of Perl, let me recommend Python to you. It is so much easier to learn, develop and maintain code in Python. For your own sanity, use Python.
  • The only valid alternative I see for you is choosing a JVM-based language, like Groovy or Clojure. This would allow you to re-use all the platform knowledge you already have, use of libraries and such.
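
To give a flavour of the Python option for the task in the question (copy files, edit them predictably, put them in a new tree), here is a minimal sketch. Everything specific in it is an assumption for illustration: the `copy_with_edits` name, the `*.cfg` pattern, and the placeholder strings. Real edits may need regular expressions rather than plain string replacement.

```python
import tempfile
from pathlib import Path


def copy_with_edits(src_root, dst_root, replacements, pattern="*.cfg"):
    """Copy matching files from src_root into dst_root, applying simple
    text substitutions and preserving the relative directory layout."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    written = []
    for src in sorted(src_root.rglob(pattern)):
        if not src.is_file():
            continue
        text = src.read_text()
        for old, new in replacements.items():
            text = text.replace(old, new)
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(text)
        written.append(dst)
    return written


if __name__ == "__main__":
    # Throwaway demo tree; real view paths would go here instead.
    with tempfile.TemporaryDirectory() as tmp:
        view = Path(tmp) / "view"
        (view / "app").mkdir(parents=True)
        (view / "app" / "server.cfg").write_text("host=BUILD_HOST\n")
        edits = {"BUILD_HOST": "localhost"}
        for path in copy_with_edits(view, Path(tmp) / "local", edits):
            print(path.relative_to(tmp), "->", path.read_text().strip())
```

Because `pathlib` handles path separators itself, the same script should run on both Windows and Linux.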
ThomasH
+1 for answering the question
Ralph
Actually, the original question was way different.
Roberto Liffredo
@Roberto Thanks, I thought as much. That's a problem here at SO. Questions get edited, and early answers suddenly look way out of scope. Don't know how to solve this one...
ThomasH
+1 for "think strategic, this is not for this job only"
Dan
A: 

Depending on what you need to have done, some languages will be simpler/easier/faster to do it with than others.

The more mature the language the more information exists either in the form of books/examples or solutions.

Your best bet in deciding "which one to learn" is to try a few and see which one syntactically "agrees" with you :)

Then, get used to at least two, because you'll end up dealing with or using at least four!

dimitri.p
+2  A: 

I recommend that you install a Linux distribution on your computer, and then learn bash and the basic Unix tools.

Some examples of these tools are:

  1. grep, a utility to retrieve the lines of a file matching a particular regular expression (a pattern describing a set of strings)
  2. awk, to work with csv files as if you were using a spreadsheet
  3. sed, to make word and characters substitutions
  4. wc, sort, join, paste, uniq: tools which do more or less what their names indicate (e.g. sort: sort a file given a key)
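
For readers coming from the Python side of this thread, here is a rough sketch of what a few of these tools do, written with the standard library. The sample lines are invented, and these are loose analogues rather than exact reimplementations of the tools.

```python
import re
from collections import Counter

lines = [
    "error: disk full",
    "info: started",
    "error: disk full",
    "warn: slow response",
]

# grep-like: keep the lines matching a regular expression
errors = [line for line in lines if re.search(r"^error", line)]

# sed-like: substitute text on every line
renamed = [re.sub(r"\bdisk\b", "volume", line) for line in lines]

# sort | uniq -c: count distinct lines, listed in sorted order
counts = sorted(Counter(lines).items())

# wc -l: count the lines
total = len(lines)

print(errors)      # ['error: disk full', 'error: disk full']
print(renamed[0])  # error: volume full
print(counts)
print(total)       # 4
```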

To complement these tools and have a way to organize them in a pipeline, I recommend that you study how to write a Makefile.

In the future, you can complement your bash skills with Perl or Python, but these tools are always worth knowing, as they are always there and they are free.

dalloliogm
+1  A: 

Linux "bash" scripting is the easiest way to get productive quickly. The bash shell is all about file and text manipulation. On Windows, bash requires installation of either Cygwin, MinGW/MSYS, or "Windows Services for UNIX" (I prefer the lightweight MSYS environment).

If you want to learn a fully featured scripting language, Python is pretty much taking over, Ruby is not far behind. Forget Perl, it's becoming less relevant and it's quite scary to learn.

Otherwise, look into Windows PowerShell or WScript.

ropata
A: 

The question seems to concern itself with file manipulation and working with multiple ClearCase views. I recommend a shell language, since ClearCase has a comprehensive and usable command-line interface. Either Plan 9's "rc" or the Korn shell "ksh" is a consistent programming environment. The other major advantage of the shell is that you can tie together the pieces which do subtasks, and write those pieces in the best languages for the subtasks.

The actual configuration task is described as: take an input file, perform some transformation on it, and put it somewhere else. The POSIX standard (and Cygwin) tool "make" handles situations like this efficiently, since a makefile rule of this form will only execute when the input file changes (this reduces processing time through build avoidance).

/path/to/output/file: /path/to/input/file
    transformation command

See http://www.opengroup.org/onlinepubs/009695399/utilities/make.html, or any of the various books on make, for a description, tutorial, and reference. Make tends to scale to arbitrarily complex file manipulation tasks, since it takes on the job of figuring out the minimum set of work that needs to be done.
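
To make the build-avoidance idea concrete, here is a small Python sketch of the timestamp check that such a rule relies on. It is an illustration of the principle, not a substitute for make, and the `needs_rebuild` and `build` names are made up here.

```python
from pathlib import Path


def needs_rebuild(target, source):
    """True if target is missing or older than source."""
    target, source = Path(target), Path(source)
    if not target.exists():
        return True
    return source.stat().st_mtime > target.stat().st_mtime


def build(target, source, transform):
    """Regenerate target from source only when it is out of date.

    `transform` is any function mapping the input text to the output text.
    Returns True if the target was (re)written, False if it was skipped.
    """
    if not needs_rebuild(target, source):
        return False
    Path(target).parent.mkdir(parents=True, exist_ok=True)
    Path(target).write_text(transform(Path(source).read_text()))
    return True
```

Calling `build` twice in a row on an unchanged input does the work once and skips it the second time, which is the saving make gives you across a whole tree of files.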

As a general approach, I'd look for a match between the thousands of tools and languages at your disposal, and the problem you face. The less code you write to solve this problem (even if it takes several tools and languages), the better off you, the code base, its maintainers, and your organization will be. An advantage of POSIX (standardized Unix) tools in particular is that they are portable, generally straightforward to pick up and get working, well-documented around the net, and work in the problem domain you face (files, and calling special-purpose sub-programs).

Jason Catena
A: 

Python is a good choice. It is a good general purpose programming language.

orokusaki