+3  A: 

You could use find:

find . -name '*.{cs,aspx,ascx}' | xargs perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi"

This will list all the filenames recursively, then xargs will read its stdin and run the rest of the command line with those filenames appended. One nice thing about xargs is that it will run the command more than once if the argument list it builds gets too long for a single invocation.

Note that I'm not sure whether find completely understands all the shell methods of selecting files, so if the above doesn't work then perhaps try:

find . | grep -E '\.(cs|aspx|ascx)$' | xargs ...

When using pipelines like this, I like to build up the command line and run each part individually before proceeding, to make sure each program is getting the input it wants. So you could run the part without xargs first to check it.

It just occurred to me that although you didn't say so, you're probably on Windows due to the file suffixes you're looking for. In that case, the above pipeline could be run using Cygwin. It's possible to write a Perl script to do the same thing, as you started to do, but you'll have to do the in-place editing yourself because you can't take advantage of the -i switch in that situation.

Greg Hewgill
Tried find . -name '*.{cs,aspx,ascx}' with no luck, but the grep version listed the files. Nice! But when I run all the commands together I get this: xargs: perl: Argument list too long
Seiti
xargs can also limit the number of arguments passed on each command line, if it can't determine the maximum length of the command line. Use the -L or -n option to xargs depending on which version it is (see the man page).
Greg Hewgill
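The batching behaviour is easy to see with echo standing in for the real perl command (the -n 2 batch size here is arbitrary):

```shell
# With -n 2, xargs starts a fresh command after every two arguments,
# so four file names yield two separate invocations.
printf '%s\n' a.cs b.cs c.cs d.cs | xargs -n 2 echo run:
# run: a.cs b.cs
# run: c.cs d.cs
```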
+4  A: 

Change

foreach my $f (@files){
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
           #inplace file editing, or something like that
    }
}

To

foreach my $f (@files){
    open my $in,  '<', $f       or die "can't read $f: $!";
    open my $out, '>', "$f.out" or die "can't write $f.out: $!";
    while (my $line = <$in>){
        chomp $line;
        $line =~ s/thisgoesout/thisgoesin/gi;
        print $out "$line\n";
    }
    close $out;
    close $in;
}

This assumes that the pattern doesn't span multiple lines. If the pattern might span lines, you'll need to slurp in the file contents. ("slurp" is a pretty common Perl term).

The chomp isn't actually necessary, I've just been bitten by lines that weren't chomped one too many times (if you drop the chomp, change print $out "$line\n"; to print $out $line;).

Likewise, you can change open my $out, '>', "$f.out"; to open my $out, '>', undef; to open a temporary file and then copy that file back over the original when the substitution's done. In fact, and especially if you slurp in the whole file, you can simply make the substitution in memory and then write over the original file. But I've made enough mistakes doing that that I always write to a new file, and verify the contents.


Note, I originally had an if statement in that code. That was most likely wrong. That would have only copied over lines that matched the regular expression "thisgoesout" (replacing it with "thisgoesin" of course) while silently gobbling up the rest.

Max Lybbert
+7  A: 

You may be interested in File::Transaction::Atomic or File::Transaction

The SYNOPSIS for F::T::A looks very similar to what you're trying to do:

  # In this example, we wish to replace 
  # the word 'foo' with the word 'bar' in several files, 
  # with no risk of ending up with the replacement done 
  # in some files but not in others.

  use File::Transaction::Atomic;

  my $ft = File::Transaction::Atomic->new;

  eval {
      foreach my $file (@list_of_file_names) {
          $ft->linewise_rewrite($file, sub {
               s#\bfoo\b#bar#g;
          });
      }
  };

  if ($@) {
      $ft->revert;
      die "update aborted: $@";
  }
  else {
      $ft->commit;
  }

Couple that with the File::Find you've already written, and you should be good to go.

Robert Krimen
+5  A: 

You can use Tie::File to scalably access large files and change them in place. See the manpage (man 3perl Tie::File).

Svante
Why point them to man(3perl) instead of Perldoc?
ephemient
Yes, Tie::File was created for just this sort of thing.
Schwern
http://perldoc.perl.org/Tie/File.html
Brad Gilbert
+12  A: 

If you assign @ARGV before using *ARGV (aka the diamond <>), $^I/-i will work on those files instead of what was specified on the command line.

use File::Find::Rule;
use strict;

@ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak';  # or set `-i` in the #! line or on the command-line

while (<>) {
    s/thisgoesout/thisgoesin/gi;
    print;
}

This should do exactly what you want.

If your pattern can span multiple lines, add an undef $/; before the <> so that Perl operates on a whole file at a time instead of line-by-line.

ephemient
Exactly what I needed!
Seiti
+1  A: 

Thanks to ephemient on this question and on this answer, I got this:

use File::Find::Rule;
use strict;

sub ReplaceText {
    my $regex = shift;
    my $replace = shift;

    @ARGV = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
    $^I = '.bak';
    while (<>) {
        s/$regex/$replace->()/gie;
        print;
    }
}

ReplaceText qr/some(crazy)regexp/, sub { "some $1 text" };

Now I can even loop through a hash containing regexp => sub entries!

Seiti
You should probably `local`ize `@ARGV` and `$^I` within this routine, as these variables have rather global effects.
ephemient