Hi,

I have a set of 10000 files. In all of them, the second line looks like:

AAA 3.429 3.84

So there is exactly one space (a requirement) between AAA and the other two columns. The rest of the lines in each file are completely different and consist of 10 columns of numbers.

Randomly, in about 20% of the files and due to some error, the line looks like this instead:

BBB  3.429 3.84

So now there are two spaces between the first and second columns.

This is a serious error, so I need to fix it by changing the 2 spaces to 1 in the files where it occurs.

The first approach I thought of was a bash script that, for each file, reads the 3 values on the second line and prints them back with just one space between them.
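A minimal Python sketch of that idea, assuming the second line of a valid file always holds exactly three whitespace-separated values (the helper names `normalize_second_line` and `find_affected` are made up for illustration). Running `find_affected` first lets you check how many files are really broken before rewriting anything:

```python
def normalize_second_line(line):
    """Rejoin the values on line 2 with exactly one space between them."""
    fields = line.split()  # split() treats any run of whitespace as one separator
    if len(fields) != 3:
        # Assumption: a valid second line holds exactly three values;
        # anything else is reported rather than silently rewritten.
        raise ValueError("expected 3 fields, got %d: %r" % (len(fields), line))
    # Keep the trailing newline if there was one (Unix line endings assumed).
    ending = "\n" if line.endswith("\n") else ""
    return " ".join(fields) + ending


def find_affected(paths):
    """Return the paths whose second line is not already single-spaced."""
    affected = []
    for path in paths:
        with open(path) as f:
            f.readline()  # skip line 1
            line2 = f.readline()
        if normalize_second_line(line2) != line2:
            affected.append(path)
    return affected
```

For example, `find_affected(glob.glob("*"))` run from the directory holding the files would list the ones with the extra space.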

I wonder what you think about this approach, and whether you could suggest something better: bash, Python, or some other approach.

Thanks

+6  A: 

Use sed.

for file in *
do
  sed -i '' '2s/  / /' "$file"
done

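The -i '' flag means to edit in place without keeping a backup. This is the BSD/macOS sed syntax; GNU sed expects any backup suffix attached to the flag (plain -i, or -i.bak to keep a backup) and would treat the separate '' as the script.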

Or use ed!

for file in *
do
  printf "2s/  / /\nwq\n" |ed -s "$file"
done
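Both the sed -i and ed loops overwrite each file in place. If you would rather do the same from Python, here is a rough sketch of a safe in-place rewrite: write the new contents to a temporary file in the same directory, then swap it over the original with os.replace. It assumes each file fits comfortably in memory; `rewrite_in_place` and its `backup_suffix` parameter are hypothetical names, with `backup_suffix` playing a role similar to the `.bak` in `sed -i.bak`.

```python
import os
import shutil
import tempfile


def rewrite_in_place(path, transform, backup_suffix=None):
    """Apply transform(text) -> text to the file at path, in place.

    Returns True if the file was rewritten, False if nothing changed.
    Assumes the file fits comfortably in memory.
    """
    with open(path, "r") as f:
        original = f.read()
    new_text = transform(original)
    if new_text == original:
        return False  # leave unchanged files (and their mtimes) alone
    if backup_suffix:
        # Similar in spirit to sed -i.bak: keep a copy of the original.
        shutil.copy2(path, path + backup_suffix)
    # Write to a temp file in the same directory, so os.replace() is a
    # same-filesystem rename and readers never see a half-written file.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
        shutil.copymode(path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave the temp file lying around
        raise
    return True
```

You would call it once per file name with a `transform` that fixes only the second line; files that are already correct are left untouched.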
jleedev
+1 Wow, pretty cool.
systempuntoout
not true, depends on OS.
ghostdog74
For 10K files, `for file in *` might blow up (not will but might)
Chen Levy
+1 for using `ed`
Isaac
@Manos You should always be afraid of not using version control.
jleedev
+9  A: 

Performing line-based changes to text files is often simplest to do in sed.

sed -e '2s/  */ /g' infile.txt

will replace every run of one or more spaces on the second line with a single space. This may change more than you want, though.

sed -e '2s/^\([^ ]*\)  /\1 /' infile.txt

should replace only the two spaces that immediately follow the first block of space-free text with a single space (though I have not tested this).

(edit: inserted 2 before s in each instance to tie the edit to the second line, specifically.)
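To see how the two sed expressions differ, here is a small Python sketch that applies equivalent regular expressions with `re.sub` to a single line (the function names and the sample line with an extra gap further along are made up for illustration):

```python
import re


def squeeze_all(line):
    # Equivalent of sed 's/  */ /g': every run of one or more spaces -> one space.
    return re.sub(r" +", " ", line)


def fix_first_gap(line):
    # Equivalent of sed 's/^\([^ ]*\)  /\1 /': only the two spaces right after
    # the first space-free token become one; the rest of the line is kept.
    return re.sub(r"^([^ ]*)  ", r"\1 ", line, count=1)


sample = "BBB  3.429   3.84"  # hypothetical line with an extra gap later on
print(squeeze_all(sample))    # BBB 3.429 3.84
print(fix_first_gap(sample))  # BBB 3.429   3.84
```

The second form is the more conservative one: a line that is already correct, or has extra spaces elsewhere, is left alone apart from that first gap.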

Isaac
+1  A: 

I don't quite understand the question, but yes, sed is an option. I don't think POSIX requires sed to have an in-place option (-i), so a fully POSIX-compliant solution would be:

sed -e 's/^BBB  /BBB /' <file> > <newfile>
Anders
FreeBSD and GNU have the in-place option; OpenBSD does not. You learn something every day.
jleedev
You don't need the `cat`; `sed` can take an input file as a parameter: `sed -e 's/^BBB  /BBB /' <file> > <newfile>`
Mike DeSimone
Per http://www.opengroup.org/onlinepubs/009695399/utilities/sed.html, I believe POSIX sed supports input file(s) as last arguments, but is not required to implement in-place editing (I was not sure to which of these issues you were referring).
Isaac
Updated it, thanks for the information Mike.
Anders
+1  A: 

This answer assumes you don't want to touch any line except the second.

#!/usr/bin/env python
import sys, os

for fname in sys.argv[1:]:
    with open(fname, "r") as fin:
        line1 = fin.readline()
        line2 = fin.readline()
        # Collapse any run of whitespace on line 2 into a single space.
        fixedLine2 = " ".join(line2.split()) + '\n'
        if fixedLine2 == line2:
            continue  # line 2 is already correct; skip this file
        with open(fname + ".fixed", "w") as fout:
            fout.write(line1)
            fout.write(fixedLine2)  # write the corrected second line
            # Copy the remaining lines unchanged.
            for line in fin:
                fout.write(line)
    # Enable these lines if you want the old files replaced with the new ones.
    #os.remove(fname)
    #os.rename(fname + ".fixed", fname)
Mike DeSimone
Recommended practice is to use the `with` statement for this kind of thing to be sure files are properly closed.
S.Lott
Neat. How long has `with` been available? I never got in the habit because I remember reading (somewhere on SO as well) that `with` was a great way to hide bugs via name punning or something. Kind of like how `from ___ import *` is discouraged.
Mike DeSimone
@Mike "With" has been available since 2.5
prestomation
@mike, `with` is available Python 2.5 onwards. for <2.5, use the normal `open()` and `close()`. to make your code workable in older versions, just use the standard open,close
ghostdog74
Answering my own comment: `with` is available by default in Python 2.6 and later, and in Python 2.5 if a `with_statement` feature is enabled. Since I have to write code that runs on 2.4 (thanks, RHEL), I never used it.
Mike DeSimone
@Mike DeSimone: I didn't make this change, because it's not as important, but you may want to avoid `os.remove` and instead use `os.rename( fname, fname+'.bak' )`. That gives a handy rollback strategy in the unlikely event of a problem.
S.Lott
`os.remove` only needed to be there for Windows, which will toss `OSError` if you try to rename to something that exists. Also, we'll have to differ in strategy: I prefer to not mess with the originals at all (hence the disabled lines) until I'm sure things work (in this case, by looking at the ".fixed" files, then deleting them or `mv`'ing them with `bash`).
Mike DeSimone
For the record, our RHEL (4) actually uses Python 2.3.4 (*sobs). I've given up developing for it and demand Fedora these days... but there I'm limited to FC 9 because they broke NIS support in the later releases. >_<
Mike DeSimone
+1  A: 

Use sed:

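sed -e 's/[[:space:]][[:space:]]/ /g' yourfile.txt > newfile.txt

This replaces each pair of adjacent whitespace characters with a single space, on every line of the file. Using [[:space:]] just makes the pattern a little clearer than two literal spaces, though note that it also matches tabs.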
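One caveat worth checking: the `g` flag replaces non-overlapping pairs, so a run of three spaces becomes two rather than one. A quick Python sketch of the same substitution, assuming the C locale, where [[:space:]] covers space, tab, newline, vertical tab, form feed and carriage return (`replace_pairs` is a made-up name):

```python
import re

# POSIX [[:space:]] in the C locale: space, \t, \n, \v, \f, \r.
POSIX_SPACE = r"[ \t\n\v\f\r]"


def replace_pairs(line):
    """Mimic sed 's/[[:space:]][[:space:]]/ /g' on a single line."""
    return re.sub(POSIX_SPACE * 2, " ", line)


print(repr(replace_pairs("BBB  3.429 3.84")))   # 'BBB 3.429 3.84'
print(repr(replace_pairs("BBB   3.429 3.84")))  # 'BBB  3.429 3.84' (3 -> 2 spaces)
print(repr(replace_pairs("BBB\t 3.429 3.84")))  # 'BBB 3.429 3.84' (tab+space -> space)
```

For the exact two-space error in the question this is fine; for longer runs, a pattern such as `[[:space:]][[:space:]]*` would collapse the whole run.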

dirk
+4  A: 

If the error can only occur on the 2nd line:

for file in file*
do
    awk 'NR==2{$1=$1}1' "$file" > temp
    mv temp "$file"
done

Or with sed:

sed -i.bak '2s/  */ /' file* # do 2nd line

Or in pure bash:

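i=1
while read -r line
do
  if [ "$i" -eq 2 ];then
    echo $line    # unquoted on purpose: word splitting collapses the runs of spaces
  else
    echo "$line"  # quoted: other lines are printed as read
  fi
  ((i++))
done <"file"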
ghostdog74
What does the trailing `1` in the `awk` script do?
Mike DeSimone
It's a shortcut for `{print}`.
ghostdog74
+2  A: 

I am going to be different and go with AWK:

awk '{print $1,$2,$3}' file.txt > file1.txt

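This handles any number of spaces between fields and replaces them with a single space. Note, though, that it prints only the first three fields of every line, so the 10-column lines would be truncated.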

To handle a specific line you can add line addresses:

awk 'NR==2{print $1,$2,$3} NR!=2{print $0}' file.txt > file1.txt

That is, rewrite line 2 but leave the other lines unchanged.

A line address can be a regular expression as well:

awk '/regexp/{print $1,$2,$3} !/regexp/{print}' file.txt > file1.txt
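awk tests each `pattern{action}` pair against every input line, and the pattern can be a line-number test such as `NR==2` or a regular expression. A rough Python equivalent of the two addressed commands above, written as a hypothetical helper `rewrite_matching`. It assumes the lines have already had their newlines stripped (as awk's `$0` does) and that the addressed lines have at least three fields:

```python
import re


def first_three(line):
    """Like awk's print $1,$2,$3: split on whitespace, keep three fields."""
    return " ".join(line.split()[:3])


def rewrite_matching(lines, address):
    """Yield lines, rewriting only those the address selects.

    address is either an int (like awk's NR==n, 1-based) or a regex string
    (like awk's /regexp/). Unselected lines pass through unchanged.
    """
    for nr, line in enumerate(lines, start=1):
        if isinstance(address, int):
            selected = nr == address
        else:
            selected = re.search(address, line) is not None
        yield first_three(line) if selected else line
```

Because only the selected line is rebuilt, the 10-column lines pass through untouched, which is the point of adding the address.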
Dan Andreatta
Can you make it only change line two?
Mike DeSimone
put `NR==2`. see my answer
ghostdog74
Edited my answer.
Dan Andreatta
+2  A: 

Since it seems every column is supposed to be separated by a single space, another approach not yet mentioned is to use tr to squeeze every run of spaces into a single space (note that this applies to every line, not just the second):

tr -s " " < infile > outfile
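tr -s replaces each sequence of a repeated character from its set with a single occurrence, across the whole input. A Python sketch of the same squeeze using `itertools.groupby` (the name `squeeze` is made up for illustration):

```python
from itertools import groupby


def squeeze(text, chars=" "):
    """Collapse runs of the same character from `chars` to one, like tr -s."""
    out = []
    for ch, run in groupby(text):  # groups consecutive identical characters
        if ch in chars:
            out.append(ch)   # keep a single copy of the repeated character
        else:
            out.extend(run)  # copy every other character unchanged
    return "".join(out)


print(squeeze("x\nBBB  3.429 3.84\n1  2   3\n"))
```

As with tr, only runs of the *same* character are squeezed: with `chars=" \t"`, a space followed by a tab is left as is.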

frankc
A: 
sed -i -e '2s/  / /g' input.txt

-i: edit files in place