I have a 200 MB text file and, for every line, I need to swap the 3rd and 4th characters with the 6th and 7th characters, so that

1234567890

would become

1267534890

I am using Windows XP with PowerShell installed. Cygwin and UnxUtils are also installed, so I have access to versions of cut, sed, awk, grep, etc. There is no delimiter in the file, BTW.

Any suggestions would be appreciated.

Thanks!

A: 

The naïve way:

    Get-Content .\test.txt |
        ForEach-Object { [string]::Concat(
                             $_.Substring(0,2),
                             $_.Substring(5,2),
                             $_.Substring(4,1),
                             $_.Substring(2,2),
                             $_.Substring(7)) } |
        Out-File new.txt

Not very nice, though, and probably pretty slow for 200 MiB.

Since you never change any lengths and only shift bytes around, you can very likely do this in place as well. I'll look into whether that works well.

Joey
+3  A: 

For each line use sed to do a find/replace:


    sed -e 's/^\(..\)\(..\)\(.\)\(..\)\(.*\)$/\1\4\3\2\5/g'
tttppp
Fantastic... worked a treat... thank you! I love simple one-liners like this using sed, but didn't know enough about back-references to have a stab at it like this. For other Windows users: replace the apostrophes with quotation marks, and tack "input file" > "output file" onto the end of the line. Otherwise, it works as-is. Cheers!
Aidan Whitehall
Nice. You don't need the `\(.*\)$` and `\5` parts, for further simplification.
Alok
All good points :)
tttppp
If you haven't found it yourself, have a look at grepWin; it's a great utility for grepping on Windows systems. Sure, you can use sed and awk within Cygwin, and I've been doing that for years, but I've been using grepWin more and more since I stumbled across it. No, I am not the author or the author's partner; I just think it's a great utility.
High Performance Mark
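A minimal sketch of the shortened expression Alok describes, assuming a POSIX shell such as the Cygwin one mentioned in the question. The function name `swap_chars` is only an illustrative wrapper and not something from the answers above:

```shell
# swap_chars is a hypothetical helper name; it reads stdin and writes stdout.
swap_chars() {
  # \1 = chars 1-2, \2 = chars 3-4, \3 = char 5, \4 = chars 6-7.
  # Nothing after the 7th character is matched, so the tail is left as it is.
  sed -e 's/^\(..\)\(..\)\(.\)\(..\)/\1\4\3\2/'
}

printf '1234567890\n' | swap_chars   # prints 1267534890
```

Lines shorter than 7 characters do not match the pattern and pass through unchanged. For a real file, something like `swap_chars < input.txt > output.txt` should do, where both file names are placeholders.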
+1  A: 

Since you have Cygwin and awk:

    {
        tf = substr($0, 3, 2)    # 3rd and 4th characters
        ss = substr($0, 6, 2)    # 6th and 7th characters
        print substr($0, 1, 2) ss substr($0, 5, 1) tf substr($0, 8)
    }

Save the above as myscript.awk and run it from the Windows command line:

    c:\test> awk -f myscript.awk file

If you prefer native Windows tools, you can use VBScript as an alternative:

    Set objFS = CreateObject("Scripting.FileSystemObject")
    Set objArgs = WScript.Arguments
    strFile = objArgs(0)
    Set objFile = objFS.OpenTextFile(strFile)
    ' Read until the end of the file (AtEndOfStream), not the end of a line.
    Do Until objFile.AtEndOfStream
        strLine = objFile.ReadLine
        tf = Mid(strLine, 3, 2)   ' 3rd and 4th characters
        ss = Mid(strLine, 6, 2)   ' 6th and 7th characters
        WScript.Echo Mid(strLine, 1, 2) & ss & Mid(strLine, 5, 1) & tf & Mid(strLine, 8)
    Loop
    objFile.Close

Save the above as myscript.vbs and run it from the command line:

    c:\test> cscript //nologo myscript.vbs file
ghostdog74
Haven't tried the VBScript, but the awk script works a treat -- thank you :)
Aidan Whitehall
A: 

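For very long input files I would prefer this, because there is no regexp match and it should be faster. It relies on gawk treating an empty field separator as "every character is its own field" (other awks may behave differently), and it swaps the fields rather than listing the first ten, so lines longer than 10 characters keep their tail:

    awk -F '' -v OFS='' '{ t = $3; $3 = $6; $6 = t; t = $4; $4 = $7; $7 = t; print }' file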
dtmilano