
I have a file "changesDictionary.txt" containing (a variable number of) pairs of key-value strings.

e.g.

"textToSearchFor" = "theReplacementText"

(The format of the dictionary is unimportant and can be changed as required.)

I need to iterate through the contents of a given directory, including sub-directories. For each file encountered with the extension ".txt", we search for each of the keys in changesDictionary.txt, replacing each found instance with the replacement string value.

i.e. a search and replace over multiple files, but using a list of search/replace terms rather than a single search/replace term.

How could I do this? (I have studied single search/replace examples, but do not understand how to do multiple searches within a file.)

The implementation (bash, perl, whatever) is not important as long as I can run it from the command line in Mac OS X. Thanks for any help.

+5  A: 

Here are the basic steps I would do

  1. Copy the changesDictionary.txt file
  2. In the copy, turn each "a" = "b" pair into the equivalent sed line (use $1 for the file name), e.g.

    sed -e 's/a/b/g' "$1"

    (you could write a script to do this or just do it by hand, if you just need to do this once and it's not too big).

  3. If the files are all in one directory, then you can do something like:

    ls *.txt | xargs -n 1 scriptFromStep2.sh

  4. If they are in subdirectories, use find to call that script on each of the files, something like

    find . -name '*.txt' -exec scriptFromStep2.sh {} \;

These aren't exact, so do some experiments to make sure you get it right -- it's just the approach I would use.

(but, if you can, just use perl, it would be a lot simpler)
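The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the dictionary keeps the `"key" = "value"` format from the question, no key or value contains characters that are special to sed, and the directory and file names (`demo`, `rules.sed`, and so on) are made up for the demo. Because plain sed only writes to standard output, the sketch writes to a temporary file and moves it back over the original.

```shell
#!/bin/sh
# Sketch only: the directory and file names below are made up for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/files/nested"
printf '%s\n' '"cat" = "dog"' '"red" = "blue"' > "$demo/changesDictionary.txt"
printf 'a red cat\n' > "$demo/files/one.txt"
printf 'cat and cat\n' > "$demo/files/nested/two.txt"

# Steps 1-2: turn each "key" = "value" line into a sed substitution.
# Assumes keys and values contain no /, &, \ or regex metacharacters.
sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' \
    "$demo/changesDictionary.txt" > "$demo/rules.sed"

# Steps 3-4: visit every .txt file under the directory, including subdirectories.
# sed writes to standard output, so write to a temporary file and move it back.
find "$demo/files" -type f -name '*.txt' | while IFS= read -r file
do
    sed -f "$demo/rules.sed" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done

cat "$demo/files/one.txt" "$demo/files/nested/two.txt"
```

Try it on a copy of the real directory first, since the files are overwritten.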

Lou Franco
Thank you for the help, Lou. I should point out that bash is not at all essential. The target platform is Mac OS X, so anything I can run from the command line that will do the job, is fine. I'm a neophyte when it comes to bash, so perl (or anything else) is actually preferable.
SirRatty
Oh, I should add that the files for processing will definitely be in nested subdirectories.
SirRatty
Yet Another Thing: the format of the dictionary file is unimportant. I can change it to whatever is needed.
SirRatty
Running `echo a | sed '%s/a/b/g'` gives "-e expression #1, char 1: unknown command: `%'". Typo or some extension of sed?
ashawley
get rid of the % (that was a mistake -- from how ed is used in vim)
Lou Franco
I edited the post to remove it :)
Matt J
Upvoted for teaching me about xargs. Saved me a lot of work.
Randaltor
OK, I'm dense. Sed is a filter, this won't change the files in place. There must be a handy unix construct for this.
Charles Merriam
+4  A: 

I'd convert your changesDictionary.txt file to a sed script, with... sed:

$ sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' \
      changesDictionary.txt  > changesDictionary.sed

Note that any characters that are special in regular expressions or in sed expressions will be misinterpreted by sed, so your dictionary can either contain only the most primitive search-and-replacements, or you'll need to maintain the sed file by hand with valid expressions. Unfortunately, there's no easy way in sed to either switch off regular expressions and use plain string matching, or to quote your searches and replacements as "literals".
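One possible workaround is to escape the special characters yourself before building each sed rule. The sketch below is an assumption-laden illustration rather than a complete solution: the helper names `escape_pattern` and `escape_replacement` are made up, it targets basic regular expressions with `/` as the delimiter, and it does not handle keys or values that contain newlines.

```shell
#!/bin/sh
# Escape a literal string for use as a sed basic-regex pattern with / as delimiter.
escape_pattern() {
    printf '%s\n' "$1" | sed 's|[][\\.*^$/]|\\&|g'
}

# Escape a literal string for use as the replacement part of s/.../.../.
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[\\&/]|\\&|g'
}

# Hypothetical pair: unescaped, the key "a.c" would also match "abc".
key='a.c'
value='x/y&z'
rule="s/$(escape_pattern "$key")/$(escape_replacement "$value")/g"

# Only the literal "a.c" is replaced; "abc" is left alone.
printf '%s\n' 'a.c abc' | sed -e "$rule"
```

The same two helpers could be applied to each dictionary line while generating changesDictionary.sed.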

With the resulting sed script, use find and xargs -- rather than find -exec -- to convert your files with the sed script as quickly as possible, by processing them more than one at a time.

$ find somedir -type f -name '*.txt' -print0 \
   | xargs -0 sed -i -f changesDictionary.sed

Note, the -i option of sed edits files "in-place", so be sure to make backups for safety, or use -i~ to create tilde-backups.
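As far as I know, the suffix form is also the more portable one: GNU sed accepts `-i` with or without a suffix, while the BSD sed shipped with Mac OS X expects a backup extension after `-i`. Attaching the suffix directly to the option should work with both. A minimal sketch, with a made-up directory, file names and rule:

```shell
#!/bin/sh
# Sketch: the directory, file names and the rule are placeholders for the demo.
backup_demo=$(mktemp -d)
printf 'fix me\n' > "$backup_demo/target.txt"
printf 's/fix/broken/g\n' > "$backup_demo/changes.sed"

# -i.bak edits target.txt in place and keeps the original as target.txt.bak.
sed -i.bak -f "$backup_demo/changes.sed" "$backup_demo/target.txt"

cat "$backup_demo/target.txt"       # edited copy
cat "$backup_demo/target.txt.bak"   # untouched original
```

Once the results look right, the backups can be removed with a second find.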

Final note, using search and replaces can have unintended consequences. Will you have searches that are substrings of other searches? Here's an example.

$ echo '"fix" = "broken"' > changesDictionary.txt
$ echo '"fixThat" = "Fixed"' >> changesDictionary.txt
$ sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' changesDictionary.txt  \
   | tee changesDictionary.sed
s/fix/broken/g
s/fixThat/Fixed/g
$ mkdir subdir
$ echo fixThat > subdir/target.txt
$ find subdir -type f -name '*.txt' -print0 \
   | xargs -0 sed -i -f changesDictionary.sed
$ cat subdir/target.txt
brokenThat

Should "fixThat" have become "Fixed" or "brokenThat"? Order matters in changesDictionary.sed. Similarly, text produced by one replacement can be replaced again by a later one: after changing "a" to "b", another search-and-replace from "b" to "c" would change it once more.

Perhaps you've already considered both of these, but I mention them because I've tried what you're doing before and didn't think of them. I don't know of anything that simply does the right thing when doing multiple search-and-replacements at once, so you need to program it to do the right thing yourself. (Perhaps another Stack Overflow post exists?)
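One way to "program it yourself" is a single pass over each line that always prefers the longest key matching at the current position and never rescans text it has already replaced. The sketch below is one such approach, not a standard tool: the function name `single_pass_replace` is made up, keys are matched as literal strings, and the dictionary is assumed to be tab-separated (key, tab, value), a format chosen here only because the question allows changing it. It also assumes the dictionary is not empty.

```shell
#!/bin/sh
# Sketch: single-pass literal replacement, longest key first.
# Assumes a tab-separated dictionary (key<TAB>value), a format chosen for this demo.
single_pass_replace() {
    # $1 = dictionary file, $2 = file to transform; result goes to standard output.
    awk -F '\t' '
    FNR == NR { if ($1 != "") { val[$1] = $2 }; next }   # first file: load the dictionary
    {
        out = ""
        pos = 1
        n = length($0)
        while (pos <= n) {
            best = ""
            # Find the longest key that matches literally at this position.
            for (k in val) {
                if (length(k) > length(best) && substr($0, pos, length(k)) == k) {
                    best = k
                }
            }
            if (best != "") {
                out = out val[best]        # emit the replacement and skip past the key,
                pos += length(best)        # so replaced text is never scanned again
            } else {
                out = out substr($0, pos, 1)
                pos++
            }
        }
        print out
    }' "$1" "$2"
}

# Demo with made-up files: "fixThat" wins over "fix", and "broken" is not rewritten again.
sp_demo=$(mktemp -d)
printf 'fix\tbroken\nfixThat\tFixed\nbroken\tshattered\n' > "$sp_demo/dict.tsv"
printf 'fixThat and fix\n' > "$sp_demo/target.txt"
single_pass_replace "$sp_demo/dict.tsv" "$sp_demo/target.txt"
```

This checks every key at every character position, so it is slow for large dictionaries, but it makes the outcome independent of the order of the dictionary lines.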

One final note: I don't use Mac OS X, but the shell commands work on my GNU/Linux machine. I'm not sure what your experience with find, xargs or sed is, but I recall Mac OS X having older or different builds of these tools. So you should be able to accomplish this, but it may need small modifications.

ashawley
Thank you for this help! I'm still learning shell scripting so will go through it step by step and report back.
SirRatty
+1  A: 
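#!/bin/bash
# The dictionary is read with "=" as the field separator: key=value, one pair per line.
f="changesDictionary.txt"
find /path -type f -name "*.txt" | while IFS= read -r FILE
do
    awk 'BEGIN{ FS="=" }
    # First file (the dictionary): remember each key and its replacement.
    FNR==NR{ s[$1]=$2;  next }
    {
       # Second file: apply every replacement to the current line.
       # Keys are used as regular expressions by gsub.
       for(i in s){
        if( $0 ~ i ){ gsub(i,s[i]) }
       }
       print $0
    }' "$f" "$FILE"  > temp
    mv temp "$FILE"
done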
ghostdog74
A: 

A question regarding ghostdog's excellent solution:

In the snippet:

   for(i in s){      
    if( $0 ~ i ){ gsub(i,s[i]) }
   }

I want to pick up the line number of i in the key-value file. This doesn't work:

   for(i in s){
    line=0;     
    if( $0 ~ i ){ gsub(i,s[i]) }
    line=line+1;
   }

The line number is incorrect. It seems that the for iterator does not start at the beginning of the key file.

Suggestions?

Bill Ross
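A likely explanation: in awk, `for (i in s)` visits the keys in an unspecified order, not in the order they were read, and in the snippet above `line` is also reset to 0 on every iteration. One possible fix is to record the dictionary line number (FNR) while loading and then loop by number. The sketch below assumes the same `key=value` dictionary format as ghostdog's script; the function name `ordered_replace` and the arrays `key`, `val` and `count` are made up for illustration.

```shell
#!/bin/sh
# Sketch: apply replacements in dictionary order and report which dictionary line matched.
ordered_replace() {
    # $1 = dictionary file (key=value per line), $2 = file to transform.
    awk 'BEGIN { FS = "=" }
    # First file: store keys and values under their dictionary line number.
    FNR == NR { key[FNR] = $1; val[FNR] = $2; count = FNR; next }
    {
        # Walk the pairs by dictionary line number rather than with "for (i in s)".
        for (n = 1; n <= count; n++) {
            if (key[n] != "" && $0 ~ key[n]) {
                printf "dictionary line %d matched on input line %d\n", n, FNR > "/dev/stderr"
                gsub(key[n], val[n])
            }
        }
        print $0
    }' "$1" "$2"
}

# Demo with made-up files; the match report goes to standard error.
ord_demo=$(mktemp -d)
printf 'fixThat=Fixed\nfix=broken\n' > "$ord_demo/dict.txt"
printf 'fixThat and fix\n' > "$ord_demo/target.txt"
ordered_replace "$ord_demo/dict.txt" "$ord_demo/target.txt"
```

Looping by number also makes the replacement order follow the dictionary file, so longer keys can simply be listed first.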