ansaurus

Question

Bash or python for changing spacing in files

Answer 1

+6 A:

Use sed.

for file in *
do
  sed -i '' '2s/  / /' "$file"
done

The -i '' flag means to edit in-place without a backup (this is the BSD sed spelling; GNU sed uses a bare -i with no separate argument).

Or use ed!

for file in *
do
  printf "2s/  / /\nwq\n" |ed -s "$file"
done
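For readers whose sed lacks an in-place option, the same edit can be sketched in Python. This is a rough sketch rather than part of the original answer: the name `fix_second_line` is hypothetical, and it assumes each file is small enough to read into memory.

```python
def fix_second_line(path):
    # Hypothetical helper mirroring the sed command 2s/  / / :
    # replace the first pair of spaces on line 2 with a single space.
    # Returns True if the file was changed.
    with open(path, newline="") as f:  # newline="" keeps line endings as-is
        lines = f.readlines()
    if len(lines) < 2:
        return False  # no second line to edit
    fixed = lines[1].replace("  ", " ", 1)  # count=1: first occurrence only
    if fixed == lines[1]:
        return False
    lines[1] = fixed
    with open(path, "w", newline="") as f:
        f.writelines(lines)
    return True
```

Calling it once per directory entry would play the role of the `for file in *` loop. Like `-i` without a suffix, it keeps no backup.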

jleedev 2010-03-23 13:52:30

+1 Wow, pretty cool.

systempuntoout 2010-03-23 13:55:59

not true, depends on OS.

ghostdog74 2010-03-23 14:07:43

For 10K files, `for file in *` might blow up (not will but might)

Chen Levy 2010-03-23 14:25:16

+1 for using `ed`

Isaac 2010-03-23 14:25:45

@Manos You should always be afraid of not using version control.

jleedev 2010-03-23 14:27:51

Answer 2

+9 A:

Performing line-based changes to text files is often simplest to do in sed.

sed -e '2s/  */ /g' infile.txt

will replace every run of spaces on line 2 with a single space. This may be changing more than you want, though.

sed -e '2s/^\([^ ]*\)  /\1 /' infile.txt

should just replace instances of two spaces after the first block of space-free text with a single space (though I have not tested this).

(edit: inserted 2 before s in each instance to tie the edit to the second line, specifically.)
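To see how the two patterns differ, here is a small Python sketch that applies the equivalent regular expressions to one sample line. The function names and the sample string are made up for illustration; the line-2 addressing is left out so that only the patterns are compared.

```python
import re


def collapse_all(line):
    # Like sed 's/  */ /g': every run of one or more spaces becomes one space.
    return re.sub(r" +", " ", line)


def collapse_first_gap(line):
    # Like sed 's/^\([^ ]*\)  /\1 /': only a pair of spaces directly
    # after the first block of space-free text is reduced to one space.
    return re.sub(r"^([^ ]*)  ", r"\1 ", line, count=1)


sample = "BBB  c  d"
print(collapse_all(sample))        # BBB c d
print(collapse_first_gap(sample))  # BBB c  d
```

The second form leaves later gaps alone, which matters if other columns are padded on purpose.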

Isaac 2010-03-23 13:55:44

Answer 3

+1 A:

I don't quite understand, but yes, sed is an option. I don't think POSIX specifies an in-place option (-i) for sed, so a fully POSIX-compliant solution would be:

sed -e 's/^BBB  /BBB /' <file> > <newfile>
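The "write a new file, then swap it in" pattern can also be sketched in Python. The helper name `rewrite_file` is hypothetical, and the sketch assumes the temporary file can be created in the same directory as the original. It does not preserve the original file's permissions.

```python
import os
import re
import tempfile


def rewrite_file(path, pattern, replacement):
    # Hypothetical helper: write an edited copy next to the original,
    # then rename it over the original once the copy is complete.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as out, open(path, newline="") as src:
            for line in src:
                # count=1 mirrors sed's s/// without the g flag
                out.write(re.sub(pattern, replacement, line, count=1))
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)  # do not leave a half-written copy behind
        raise
```

For example, `rewrite_file("data.txt", r"^BBB  ", "BBB ")` would correspond to the sed command above followed by a `mv` of the new file over the old one (`data.txt` is a placeholder name).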

Anders 2010-03-23 13:56:44

FreeBSD and GNU have the in-place option; OpenBSD does not. You learn something every day.

jleedev 2010-03-23 14:00:09

You don't need the `cat`; `sed` can take an input file as a parameter: `sed -e 's/^BBB  /BBB /' <file> > <newfile>`

Mike DeSimone 2010-03-23 14:02:50

Per http://www.opengroup.org/onlinepubs/009695399/utilities/sed.html, I believe POSIX sed supports input file(s) as last arguments, but is not required to implement in-place editing (I was not sure to which of these issues you were referring).

Isaac 2010-03-23 14:03:40

Updated it, thanks for the information Mike.

Anders 2010-03-23 14:07:43

Answer 4

+1 A:

This answer assumes you don't want to mess with any except the second line.

#!/usr/bin/env python
import sys, os
for fname in sys.argv[1:]:
    with open(fname, "r") as fin:
        line1 = fin.readline()
        line2 = fin.readline()
        # Collapse every run of whitespace on line 2 into a single space.
        fixedLine2 = " ".join(line2.split()) + '\n'
        if fixedLine2 == line2:
            continue  # nothing to fix in this file
        with open(fname + ".fixed", "w") as fout:
            fout.write(line1)
            fout.write(fixedLine2)  # the corrected second line
            # Copy the rest of the file unchanged.
            for line in fin:
                fout.write(line)
    # Enable these lines if you want the old files replaced with the new ones.
    #os.remove(fname)
    #os.rename(fname + ".fixed", fname)

Mike DeSimone 2010-03-23 13:59:10

Recommended practice is to use the `with` statement for this kind of thing to be sure files are properly closed.

S.Lott 2010-03-23 14:02:34

Neat. How long has `with` been available? I never got in the habit because I remember reading (somewhere on SO as well) that `with` was a great way to hide bugs via name punning or something. Kind of like how `from ___ import *` is discouraged.

Mike DeSimone 2010-03-23 14:10:17

@Mike "With" has been available since 2.5

prestomation 2010-03-23 14:18:11

@Mike, `with` is available from Python 2.5 onwards. To keep your code working on older versions, use plain `open()` and `close()`.

ghostdog74 2010-03-23 14:19:05

Answering my own comment: `with` is available by default in Python 2.6 and later, and in Python 2.5 via `from __future__ import with_statement`. Since I have to write code that runs on 2.4 (thanks, RHEL), I never used it.

Mike DeSimone 2010-03-23 14:20:46

@Mike DeSimone: I didn't make this change, because it's not as important, but you may want to avoid `os.remove` and instead use `os.rename( fname, fname+'.bak' )`. That gives a handy rollback strategy in the unlikely event of a problem.

S.Lott 2010-03-23 14:37:04

`os.remove` only needed to be there for Windows, which will toss `OSError` if you try to rename to something that exists. Also, we'll have to differ in strategy: I prefer to not mess with the originals at all (hence the disabled lines) until I'm sure things work (in this case, by looking at the ".fixed" files, then deleting them or `mv`'ing them with `bash`).

Mike DeSimone 2010-03-23 15:49:55

For the record, our RHEL (4) actually uses Python 2.3.4 (*sobs). I've given up developing for it and demand Fedora these days... but there I'm limited to FC 9 because they broke NIS support in the later releases. >_<

Mike DeSimone 2010-03-24 04:04:36

Answer 5

+1 A:

Use sed:

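sed -e 's/[[:space:]][[:space:]]/ /g' yourfile.txt > newfile.txt

This will replace each pair of adjacent whitespace characters with one space, so a run of four spaces ends up as two. The use of [[:space:]] makes the pattern a little clearer, but note that it also matches tabs and other whitespace, not only the space character.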
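The pairwise behaviour is easy to check with a short Python sketch. Here `\s` stands in for `[[:space:]]` (a close match for ASCII input, not an exact equivalent), and the function names and sample line are made up for illustration.

```python
import re


def squeeze_pairs(line):
    # Like s/[[:space:]][[:space:]]/ /g: each PAIR of whitespace
    # characters becomes one space, so longer runs are only halved.
    return re.sub(r"\s\s", " ", line)


def squeeze_runs(line):
    # Like s/[[:space:]][[:space:]]*/ /g: each whole RUN of whitespace
    # becomes one space, single tabs included.
    return re.sub(r"\s+", " ", line)


line = "BBB    c\td"  # four spaces, then a single tab
print(repr(squeeze_pairs(line)))  # 'BBB  c\td'
print(repr(squeeze_runs(line)))   # 'BBB c d'
```

Which of the two is wanted depends on whether single tabs in the file should be kept.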

dirk 2010-03-23 14:02:13

Answer 6

+4 A:

If the error always occurs on the 2nd line,

for file in file*
do
    awk 'NR==2{$1=$1}1' "$file" >temp
    mv temp "$file"
done

or sed

sed -i.bak '2s/  */ /' file* # do 2nd line

Or just pure bash scripting

i=1
while read -r line
do
  if [ "$i" -eq 2 ];then
    echo $line
  else
    echo "$line"
  fi
  ((i++))
done <"file"

ghostdog74 2010-03-23 14:04:39

What does the trailing `1` in the `awk` script do?

Mike DeSimone 2010-03-23 14:13:59

It's a shortcut for `{print}`.

ghostdog74 2010-03-23 14:14:43

Answer 7

+2 A:

I am going to be different and go with AWK:

awk '{print $1,$2,$3}' file.txt > file1.txt

This will handle any number of spaces between fields and replace them with one space. Note that it prints only the first three fields.

To handle a specific line you can add line addresses:

awk 'NR==2{print $1,$2,$3} NR!=2{print $0}' file.txt > file1.txt

i.e. rewrite line 2 but leave the other lines unchanged.

A line address can be a regular expression as well:

awk '/regexp/{print $1,$2,$3} !/regexp/{print}' file.txt > file1.txt
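The regex-addressed variant can be sketched in Python as well. `normalize_matching` and its `nfields` parameter are hypothetical names, and `str.split()` is only an approximation of awk's default field splitting.

```python
import re


def normalize_matching(lines, pattern, nfields=3):
    # Lines matching `pattern` are rebuilt from their first `nfields`
    # whitespace-separated fields joined by single spaces (missing
    # fields print as empty strings, as in awk's print $1,$2,$3).
    # All other lines pass through unchanged.
    out = []
    for line in lines:
        if re.search(pattern, line):
            fields = (line.split() + [""] * nfields)[:nfields]
            out.append(" ".join(fields) + "\n")
        else:
            out.append(line)
    return out


print(normalize_matching(["BBB   c    d e\n", "AAA   x\n"], r"^BBB"))
# ['BBB c d\n', 'AAA   x\n']
```

The sample shows the fourth field being dropped from the matching line, so the field count needs to match the data.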

Dan Andreatta 2010-03-23 14:07:10

Can you make it only change line two?

Mike DeSimone 2010-03-23 14:13:12

Put `NR==2` in front. See my answer.

ghostdog74 2010-03-23 14:17:23

Edited my answer.

Dan Andreatta 2010-03-23 19:44:41

Answer 8

+2 A:

Since it seems every column is separated by one space, another approach not yet mentioned is to use tr to squeeze all runs of spaces into single spaces:

tr -s " " < infile > outfile

frankc 2010-03-23 16:08:35

Answer 9

A:

sed -i -e '2s/  / /g' input.txt

-i: edit files in place

2010-03-24 09:40:08
