ansaurus

Question

Shell script - search and replace text in multiple files using a list of strings

Answer 1

+5 A:

Here are the basic steps I would take:

1. Copy the changesDictionary.txt file.
2. In the copy, turn each "a"="b" entry into the equivalent sed line (use $1 for the file name), e.g.:

sed -e 's/a/b/g' "$1"

(You could write a script to do this, or just do it by hand if you only need to do it once and the file isn't too big.)

3. If the files are all in one directory, you can do something like:

ls *.txt | xargs -n 1 ./scriptFromStep2.sh

4. If they are in subdirectories, use find to call that script on each of the files, something like:

find . -name '*.txt' -exec ./scriptFromStep2.sh {} \;

These aren't exact; do some experiments to make sure you get it right. It's just the approach I would use.
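
As a rough illustration of step 2, the script might look like the sketch below. The function name apply_changes and the second substitution are placeholders, not anything taken from the dictionary in the question. Because sed only writes to standard output, the sketch sends the result to a temporary file and then copies it back over the original.

```shell
#!/bin/sh
# Hypothetical scriptFromStep2.sh: apply the substitutions to one file.

apply_changes() {
    file=$1
    # The template form of mktemp works with both GNU and BSD versions.
    tmp=$(mktemp "${TMPDIR:-/tmp}/changes.XXXXXX") || return 1
    # One -e per dictionary entry; both expressions here are placeholders.
    # Copy the contents back (rather than mv) so the file keeps its permissions.
    sed -e 's/a/b/g' -e 's/x/y/g' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# When run as a script, process the file named by the first argument.
if [ "$#" -gt 0 ]; then
    apply_changes "$1"
fi
```

With this in place, the xargs and find commands above would call the script once per file and each file would be rewritten where it sits.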

(but, if you can, just use perl, it would be a lot simpler)

Lou Franco 2009-03-16 00:31:20

Thank you for the help, Lou. I should point out that bash is not at all essential. The target platform is Mac OS X, so anything I can run from the command line that will do the job, is fine. I'm a neophyte when it comes to bash, so perl (or anything else) is actually preferable.

SirRatty 2009-03-16 00:34:14

Oh, I should add that the files for processing will definitely be in nested subdirectories.

SirRatty 2009-03-16 00:36:26

Yet Another Thing: the format of the dictionary file is unimportant. I can change it to whatever is needed.

SirRatty 2009-03-16 00:39:06

Running `echo a | sed '%s/a/b/g'` gives "-e expression #1, char 1: unknown command: `%'". Typo or some extension of sed?

ashawley 2009-03-16 20:23:27

get rid of the % (that was a mistake -- from how ed is used in vim)

Lou Franco 2009-03-17 20:28:47

I edited the post to remove it :)

Matt J 2009-08-06 14:00:06

Upvoted for teaching me about xargs. Saved me a lot of work.

Randaltor 2009-11-20 17:25:00

OK, I'm dense. Sed is a filter, this won't change the files in place. There must be a handy unix construct for this.

Charles Merriam 2010-07-23 23:09:48

Answer 2

+4 A:

I'd convert your changesDictionary.txt file to a sed script, with... sed:

$ sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' \
      changesDictionary.txt  > changesDictionary.sed

Note that any characters in your dictionary that are special to regular expressions or to sed's s command will be misinterpreted by sed, so either your dictionary can contain only the most primitive search-and-replacements, or you'll need to maintain the sed file by hand with valid expressions. Unfortunately, there's no easy way in sed to either turn off regular expressions and use plain string matching, or to quote your searches and replacements as "literals".
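
One way to limit the problem is to escape the special characters while generating the sed script. The sketch below assumes the "search" = "replace" format from the question, basic (not extended) regular expressions, and / as the s command delimiter. The function names are made up for illustration, and multi-line entries are not handled.

```shell
# Escape a literal string for use as a sed basic-regex pattern.
escape_pattern() {
    printf '%s\n' "$1" | sed 's|[][\\/.*^$]|\\&|g'
}

# Escape a literal string for use as the replacement part of s/.../.../.
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[\\&/]|\\&|g'
}

# Read lines of the form "search" = "replace" on stdin, print sed commands.
make_sed_script() {
    while IFS= read -r line; do
        # Skip anything that does not look like a dictionary entry.
        case $line in
            \"*\"\ =\ \"*\") ;;
            *) continue ;;
        esac
        search=${line#\"}            # drop the leading quote
        search=${search%%\" = \"*}   # keep the text before the separator
        replace=${line#*\" = \"}     # keep the text after the separator
        replace=${replace%\"}        # drop the trailing quote
        printf 's/%s/%s/g\n' \
            "$(escape_pattern "$search")" \
            "$(escape_replacement "$replace")"
    done
}

# Usage sketch:
#   make_sed_script < changesDictionary.txt > changesDictionary.sed
```

With this, an entry such as "a.b" = "x&y" should match only the literal text a.b and insert a literal ampersand.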

With the resulting sed script, use find and xargs -- rather than find -exec -- to convert your files with the sed script as quickly as possible, by processing them more than one at a time.

$ find somedir -type f -print0 \
   | xargs -0 sed -i -f changesDictionary.sed

Note, the -i option of sed edits files "in-place", so be sure to make backups for safety, or use -i~ to create tilde-backups.
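
Since the target is Mac OS X: as far as I know, the stock sed there is the BSD version, where -i expects a backup suffix as its argument, so a bare -i followed by -f is likely to be misread. Attaching the suffix directly to -i (for example -i.bak or -i~) is accepted by both GNU and BSD sed. A small demo in a scratch directory, where every path and file name is made up:

```shell
# Work in a scratch directory so nothing real is modified.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/sed-demo.XXXXXX")
mkdir -p "$workdir/sub/dir"
printf 'a cat sat\n' > "$workdir/sub/dir/target.txt"
printf 's/cat/dog/g\n' > "$workdir/changes.sed"

# -i.bak edits in place and keeps the original as target.txt.bak.
find "$workdir/sub" -type f -name '*.txt' -print0 \
   | xargs -0 sed -i.bak -f "$workdir/changes.sed"

cat "$workdir/sub/dir/target.txt"       # the edited file
cat "$workdir/sub/dir/target.txt.bak"   # the untouched original
```

Once the results look right, the .bak files can be removed with another find.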

Final note, using search and replaces can have unintended consequences. Will you have searches that are substrings of other searches? Here's an example.

$ echo '"fix" = "broken"' > changesDictionary.txt
$ echo '"fixThat" = "Fixed"' >> changesDictionary.txt
$ sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' changesDictionary.txt  \
   | tee changesDictionary.sed
s/fix/broken/g
s/fixThat/Fixed/g
$ mkdir subdir
$ echo fixThat > subdir/target.txt
$ find subdir -type f -name '*.txt' -print0 \
   | xargs -0 sed -i -f changesDictionary.sed
$ cat subdir/target.txt
brokenThat

Should "fixThat" have become "Fixed" or "brokenThat"? Order matters in changesDictionary.sed. Similarly, the result of one search-and-replace can itself be replaced again: text changed from "a" to "b" may then be changed by another search-and-replace from "b" to "c".
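
For the substring case, one possible workaround is to put the longest searches first before generating the sed script. The sketch below assumes the "search" = "replace" format and uses a made-up function name. It does nothing about the second problem, where one replacement feeds into another.

```shell
# Print dictionary lines ("search" = "replace") with the longest search first.
longest_search_first() {
    # Prefix each line with the length of its search text (minus the leading
    # quote), sort numerically in descending order, then strip the prefix.
    awk -F'" = "' '{ printf "%d\t%s\n", length($1) - 1, $0 }' \
       | sort -rn \
       | cut -f2-
}

# Usage sketch:
#   longest_search_first < changesDictionary.txt \
#      | sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' > changesDictionary.sed
```

With the two-entry dictionary from the example, the fixThat rule would then run before the fix rule.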

Perhaps you've already considered both of these, but I mention them because I've tried what you're doing before and didn't think of them. I don't know of anything that simply does the right thing for multiple search-and-replacements at once, so you need to program it to do the right thing yourself. (Perhaps another Stack Overflow post exists?)

Last and final note: I don't use Mac OS X, but the shell commands work on my GNU/Linux machine. I'm not sure what your experience with find, xargs or sed is, but I recall Mac OS X having older or different builds of these tools. So you should be able to accomplish this, but it may need small modifications.

ashawley 2009-03-16 19:07:59

Thank you for this help! I'm still learning shell scripting so will go through it step by step and report back.

SirRatty 2009-03-16 22:59:56

Answer 3

+1 A:

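#!/bin/bash
# Dictionary file: one search=replacement pair per line.
f="changesDictionary.txt"
find /path -type f -name "*.txt" | while IFS= read -r FILE
do
    awk 'BEGIN{ FS="=" }
    # First file (the dictionary): store each pair, then move to the next line.
    FNR==NR{ s[$1]=$2;  next }
    {
       # Remaining file: apply every substitution to the current line.
       for(i in s){
        if( $0 ~ i ){ gsub(i,s[i]) }
       }
       print $0
    }' "$f" "$FILE"  > temp && mv temp "$FILE"
done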

ghostdog74 2009-08-06 14:22:15
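
Note that gsub treats each key as a regular expression, and & in the replacement stands for the matched text. If the keys are meant to be matched literally, a variant built on index and substr is one option. This is only a sketch: the function name is hypothetical, it assumes search=replacement lines with no = inside the search text, and it prints to standard output rather than rewriting the file.

```shell
# $1: dictionary file (search=replacement per line); $2: file to process.
literal_replace() {
    awk 'BEGIN{ FS="=" }
    FNR==NR{ s[$1]=$2; next }
    {
       line = $0
       for (k in s) {
          out = ""
          # Walk the line, copying text and swapping each literal occurrence of k.
          while (k != "" && (p = index(line, k)) > 0) {
             out = out substr(line, 1, p - 1) s[k]
             line = substr(line, p + length(k))
          }
          line = out line
       }
       print line
    }' "$1" "$2"
}
```

It can be dropped into the loop above in place of the awk call, keeping the redirect to temp and the mv.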

Answer 4

A:

A question regarding ghostdog's excellent solution:

In the snippet:

   for(i in s){      
    if( $0 ~ i ){ gsub(i,s[i]) }
   }

I want to pick up the line number of i in the key-value file. This doesn't work:

   for(i in s){
    line=0;     
    if( $0 ~ i ){ gsub(i,s[i]) }
    line=line+1;
   }

The line number is incorrect. It seems that the for iterator does not start at the beginning of the key file.

Suggestions?

Bill Ross 2010-03-25 22:23:18
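
In awk, for (i in s) visits the keys in an unspecified order, so a counter inside that loop cannot track dictionary line numbers (and line=0 inside the loop resets it on every pass). One option is to record FNR for each key while the dictionary is being read. A sketch, where lineno and the function name are made-up names and the dictionary is assumed to hold search=replacement lines:

```shell
# $1: dictionary file (search=replacement per line); $2: file to scan.
# Prints "dictionary-line: key" for every key that matches a line of $2.
report_dictionary_lines() {
    awk 'BEGIN{ FS="=" }
    # First file: remember the replacement and the line each key came from.
    FNR==NR{ s[$1]=$2; lineno[$1]=FNR; next }
    {
       for(i in s){
        if( $0 ~ i ){ print lineno[i] ": " i }
       }
    }' "$1" "$2"
}
```

The same lineno[i] lookup can be used next to the gsub call in the original loop.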
