What language should I use for file and string manipulation?

This might seem subjective, but I don't think it really is. There's a lot to say about this. For example, I can clearly see that for most uses Perl would be a more obvious candidate than Java. I need to do this quite often, and at the moment I use C# for it, but I would like a more script-like language for the job.

I can imagine Perl would be a candidate, but I would like to do it in PowerShell, since PowerShell can easily access the .NET library. Or is Python a better candidate? If I have to learn a new language, Python is certainly on my list, rather than Perl.

What I want to do, for example, is read a file, make some changes and save it again. E.g.: open it, number all lines (say with 3 digits) and close it. Any example, in any language, would be welcome, but the shorter the better. It is utility scripting I'm after here, not OO, TDDeveloped, unit-tested stuff of course.

What I would very much like to see is something like this (pseudocode here):

open foobar.as f

foreach  line in f.lines 
 line.addBefore(currentIteratorCounter.format('ddd') + '. ')

close f

So:

bar.txt 

Frank Zappa
Cowboy Henk
Tom Waits

numberLines bar.txt

bar.txt 

001. Frank Zappa
002. Cowboy Henk
003. Tom Waits

UPDATE:

The Perl and Python examples here are great, and definitely in the line of what I was hoping and expecting. But aren't there any PowerShell guys out there?

+1  A: 

Python

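# Write a numbered copy of bar.txt; the original file is left untouched.
with open( "bar.txt", "r" ) as source, open( "bar_with_numbers.txt", "w" ) as target:
    for count, line in enumerate( source ):
        # line already ends with a newline, so none is added; %03d zero-pads to 3 digits
        target.write( "%03d. %s" % ( count+1, line ) )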

First, it's a bad policy to "update" files in place. In the long run, this becomes a regrettable decision because debugging is made harder by the loss of history.

If you use OS redirection features, this program can be simplified.

import sys
for count, line in enumerate( sys.stdin ):
    # line keeps its trailing newline; %03d zero-pads the number to 3 digits
    sys.stdout.write( "%03d. %s" % ( count+1, line ) )

Then you can run this enumerate.py as follows

python enumerate.py <bar.txt >bar_with_numbers.txt

More importantly, you can also do this.

python enumerate.py <bar.txt | the_next_step
S.Lott
+1 Wow, fantastic. I just added my pseudo-language, and your Python nearly does that, thanks. As for your remark: of course, it's for the sake of the example, plus I never lose history thanks to Subversion.
Peter
Keeping intermediate results in Subversion isn't a substitute for a good pipeline design. Updating files in place is simply bad design. Programs crash, corrupting files and making fall-back more complex than a trivial rerun.
S.Lott
+1  A: 

Definitely Perl. It supports in-place replacement via the -i switch (on Windows you have to start the script with perl -i.bak, because Windows cannot edit the file in place without a backup, so Perl creates a .bak copy of the original).

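# Open read-only: '+>' would truncate the file before anything is read.
# The numbered lines go to standard output.
open(my $in, '<', $yourfile) || die "Cannot open file $yourfile: $!";

my $line_no = 1;

while (<$in>) {
   print "$line_no. $_";
   $line_no++;
}
close $in;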

Code just typed from memory without testing, but that should work. You probably want to add some logic for formatting $line_no (e.g. first count the lines and then add as many leading zeros as you need).

SchlaWiener
+2  A: 
perl -i -ne 'printf("00%d. %s",$.,$_)' your-filename-here

You may want %03d instead.
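For comparison, here is a rough Python counterpart of perl -i.bak, sketched with the standard library's fileinput module. The function name number_lines_in_place and the backup_suffix parameter are made up for this illustration; treat it as a sketch rather than a drop-in tool.

```python
import fileinput
import sys


def number_lines_in_place(path, backup_suffix=".bak"):
    """Prefix every line of `path` with a zero-padded line number.

    The original content is kept in path + backup_suffix.
    (Hypothetical helper, named here only for illustration.)
    """
    # With inplace=True, fileinput moves the original to the backup file
    # and redirects standard output into a new file at `path`.
    with fileinput.input(files=[path], inplace=True, backup=backup_suffix) as lines:
        for line in lines:
            # `line` keeps its trailing newline, so none is added here.
            sys.stdout.write("%03d. %s" % (fileinput.filelineno(), line))
```

Calling number_lines_in_place("bar.txt") should rewrite bar.txt with numbered lines and leave the untouched original in bar.txt.bak.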

+1 Seeing this, I might have to reconsider Perl... does this work on Windows too? See SchlaWiener's remark.
Peter
It has been many many years since I've used perl on windows, so I'll defer to SchlaWiener's suggestion of -i.bak. Certainly it can't hurt and having a backup copy is always good.
+1, but you don't even need '$_ =' part.
Igor Krivokon
Oops, that was from a previous version. Fixed, thanks.
@Peter: it works on Windows if you append .bak to -i (e.g. perl.exe -i.bak). And it should work on wildcards, too. Just replace your-filename-here with *.log and it will do it on all .log files (at least on *nix; Windows needs some "special treatment" for this, too).
SchlaWiener
+1  A: 

On a Debian system (and probably other linux distros) you could do this:

$ nl -w 3 -n rz -s ". " [filename] > [newfilename]
ylebre
Don't do the equivalent of "cat file > file". You'll likely lose all of your data depending on what order the shell and invoked program do things.
thanks for the suggestion, changed my example.
ylebre
+10  A: 

This is actually pretty easy in PowerShell:

function Number-Lines($name) {
    $i = 1
    Get-Content $name | ForEach-Object { "{0:000}. {1}" -f $i++,$_ }
}

What I'm doing here is getting the contents of the file, which returns a String[]. I iterate over that with ForEach-Object and apply a format string using the -f operator. The result just drops out of the pipeline as another String[], which can be redirected to a file if needed.

You can shorten it a little by using aliases:

$i = 1; gc .\someFile.txt | %{ "{0:000}. {1}" -f $i++,$_ }

but I won't recommend that for a function definition.

You may want to consider using two passes, though, and constructing the format string on the fly to accommodate larger numbers of lines. If there are 1500 lines, {0:000} won't be sufficient anymore to get neatly aligned output.
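The two-pass idea can be sketched like this in Python (not PowerShell, just to show the logic): count the lines first, derive the number of digits from the total, then format. The function name number_lines and the min_width parameter are assumptions made for this example.

```python
def number_lines(lines, min_width=3):
    """Return `lines` prefixed with zero-padded numbers of uniform width.

    (Hypothetical helper; the name and min_width are illustrative only.)
    """
    # First pass: materialize the input so the total line count is known.
    lines = list(lines)
    # Use as many digits as the largest line number needs, at least min_width.
    width = max(min_width, len(str(len(lines))))
    # Second pass: "{:0{w}d}" zero-pads the number to `width` digits.
    return ["{:0{w}d}. {}".format(i, line, w=width)
            for i, line in enumerate(lines, start=1)]
```

With three lines this keeps the 001. style prefix; with 1500 lines the width grows to four digits, so every prefix stays aligned.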

As for which language is best for such tasks, you might look at factors such as

  • conciseness of code (Perl will be hard to beat there, especially that one-liner in another answer)
  • readability and maintainability of code
  • availability of the tools (Perl, Python or PowerShell aren't installed on Windows by default, so deployment might be hindered.)

In the light of the last point you might even be better off using cmd for this task. The code is similarly pretty simple:

@echo off
setlocal
set line=1
for /f "delims=" %%l in (%1) do call :process %%l
endlocal
goto :eof

:process
call :lz %line%
echo %lz%. %*
set /a line+=1
goto :eof

:lz
if %1 LSS 10 set lz=00%1&goto :eof
if %1 LSS 100 set lz=0%1&goto :eof
set lz=%1&goto :eof
goto :eof

That assumes, of course, that it has to run somewhere else than your own machine. If not, then use whatever fits your needs :-)

Joey
1. I've seen lots of Windows environments where Perl was installed, and not by the user of the PC; where it comes from I don't know. 2. Can you give a cmd example too?
Peter
This is an outstanding answer (which I'll be bookmarking), but I just wanted to point out that I think you've actually demonstrated that PowerShell is usually the answer. It makes this kind of thing so easy that creating a function to wrap it is usually a waste of time. You could just put the initialization of $i inline to make it a one-liner. It's way more readable than Perl. And finally, PowerShell is an OS "update", which makes getting permission to install it a bit easier. Soon it will be installed on Windows by default, starting with Windows 7 I believe.
Mike
@Peter Perl would be installed by some administrator on the machine. As Mike stated, PowerShell is installed and "on" by default in Win 7 and Server 2008 R2 and is available through Windows Update or WSUS as an optional update on Vista, XP, and Server 2003. It is a feature on Server 2008.
Steven Murawski
+2  A: 

It isn't what you wanted, but please remember findstr.exe (and find.exe) at times...

findstr /n ".*" filename
find "" /v /n filename

+1 for taking some time to add some insight on an old and already accepted question.
Peter