ansaurus

Question

How can I remove text at beginning of a file using a regex?

Answer 1

A:

Here you go! This runs a substitution on the first line of the file:


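```
use strict;
use warnings;
use Tie::File;

# Each element of @array is one line of the file; changes are written back.
tie my @array, 'Tie::File', 'path_to_file' or die "can't tie the file: $!";
$array[0] =~ s/text_i_want_to_replace/replacement_text/gi;
untie @array;
```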

You operate on the array, and every change is written back to the file: deleting an element from the array erases that line from the file, and a substitution on an element rewrites that line.

If you want to delete the first two lines and keep part of the third, you can do something like this:


```
# tie the @array before this
shift @array;    # drop line 1
shift @array;    # drop line 2; the old line 3 is now $array[0]
$array[0] =~ s/foo bar\.\.\.//gi;
# untie the @array
```

And that does exactly what you need!
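Here is a self-contained sketch of the same idea that can be run as-is. The temporary file and its contents (two header lines, then a line starting with "foo bar...") are hypothetical stand-ins for the real file; File::Temp is used only to create them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical sample file: two header lines, then a line whose
# "foo bar..." prefix should go while the rest of the line stays.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "header line 1\nheader line 2\nfoo bar...keep this\nbody\n";
close $tmp_fh or die "close $path: $!";

tie my @lines, 'Tie::File', $path or die "can't tie $path: $!";
shift @lines;                     # removes line 1 from the file
shift @lines;                     # removes line 2 from the file
$lines[0] =~ s/foo bar\.\.\.//i;  # edits the new first line in place
untie @lines;                     # release the file

# Read the file back to see what Tie::File wrote.
open my $in, '<', $path or die "open $path: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;    # keep this / body
```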

Geo 2009-03-23 20:29:20

Answer 2

+6 A:

By default, ARGV (aka `<>`, which `-p` uses behind the scenes) reads only a single line at a time.

Workarounds:

Unset $/, which tells Perl to read a whole file at a time.
```
perl -pi -e "BEGIN{undef$/}s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
```
BEGIN is necessary to have that code run before the first read is done.
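Outside a one-liner, the same slurp trick is usually written with `local $/` in a small block, so the change does not leak into the rest of the program. A minimal sketch, assuming a hypothetical input whose header ends with "Foo bar"; an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical input: a two-line header that ends with "Foo bar".
my $text = "junk line 1\njunk Foo bar\nreal content\n";

open my $in, '<', \$text or die "can't open in-memory file: $!";
my $whole = do {
    local $/;    # $/ is undef only inside this block: slurp mode
    <$in>;       # a single read returns the whole "file"
};
close $in;

# Because $whole holds every line, .*? with /s can cross line breaks.
(my $stripped = $whole) =~ s/\A.*?Foo.bar\n?//s;
print $stripped;    # real content
```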

Use -0, which sets $/ = "\0".

```
perl -pi -0 -e "s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
```
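The catch with `-0` is that a NUL byte anywhere in the input ends a record early, while `-0777` sets `$/` to undef and always reads the whole file. This sketch sets `$/` the way each switch would and counts the records read from a hypothetical input containing one stray NUL.

```perl
use strict;
use warnings;

# Hypothetical "text" file that happens to contain a stray NUL byte.
my $data = "header\nmore header\0tail\n";

# Count how many records <$in> returns for a given $/ value.
sub count_records {
    my ($sep) = @_;
    open my $in, '<', \$data or die "can't open in-memory file: $!";
    local $/ = $sep;    # "\0" is what -0 sets; undef is what -0777 sets
    my @records = <$in>;
    close $in;
    return scalar @records;
}

my $with_nul_sep = count_records("\0");   # like -0
my $slurp        = count_records(undef);  # like -0777
print "-0: $with_nul_sep record(s), -0777: $slurp record(s)\n";
```

Under `-0` the substitution would only ever see the text before the NUL, which is why `-0777` is the safer choice for slurping.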

Take advantage of the flip-flop operator.
```
perl -ni -e "print unless 1 ... /^Foo.bar/" 00ws110.txt
```
This skips printing every line from line 1 up to and including the first line that matches /^Foo.bar/.
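The reason `1` works as the left operand is that a constant operand of a scalar flip-flop is compared with `$.`, the current line number. A minimal sketch of the same filter inside a script, using a hypothetical in-memory file whose header ends with a "Foo bar" line:

```perl
use strict;
use warnings;

# Hypothetical file contents: the header ends with the "Foo bar" line.
my $text = "header 1\nheader 2\nFoo bar end of header\nbody 1\nbody 2\n";
open my $in, '<', \$text or die "can't open in-memory file: $!";

my @kept;
while (my $line = <$in>) {
    # "1 ... /re/" turns on when $. == 1 and stays on through the first
    # line matching /re/; those lines are skipped, the rest are kept.
    push @kept, $line unless 1 ... $line =~ /^Foo.bar/;
}
close $in;
print @kept;    # body 1 / body 2
```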

ephemient 2009-03-23 20:42:27

-0 isn't as safe as -0777, which is guaranteed to put perl into slurp mode.

Chas. Owens 2009-03-23 20:50:44

It's only unsafe on binary data. One would hope that *.txt are actually text files.

ephemient 2009-03-23 20:56:37

Tried all three. Last one: perl -ni -e "print unless 1 ... /^Project.Gutenberg/" 00ws110.txt - still doesn't work tho. It prints nothing.

GeoffreyF67 2009-03-23 21:37:02

Yes, but you never know when a stray null may wind up in a supposed text file. Why take the chance when you can hit 7 three times and be safe?

Chas. Owens 2009-03-23 21:43:01

Last one works for me, on three different Perl installations. Are you sure the `//` matches on the last line of the header?

ephemient 2009-03-23 22:02:35

Whew. Finally got it. Thanks!

GeoffreyF67 2009-03-23 22:20:37

Answer 3

+3 A:

If your header stretches across more than one line, you must tell perl how much to read. If the files are small compared to available memory, you may want to just slurp each whole file into memory:

```
perl -0777pi.orig -e 's/your regex/your replace/s' file1 file2 file3
```

The -0777 option puts perl into slurp mode, so $_ holds an entire file on each pass through the loop. Also, always set a backup extension (the .orig in -pi.orig). If you don't, you may wipe out your data accidentally with no way to get it back. See perldoc perlrun for more information.
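The same in-place edit with a backup can be done from inside a script by setting `$^I` (the variable behind `-i`) and `@ARGV` before reading with `<>`. This is a sketch, not a drop-in replacement: the sample file and the "END HEADER" marker are hypothetical, and the regex is only a placeholder for your own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical file with a two-line header ending in "END HEADER".
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "junk 1\njunk 2 END HEADER\nkeep me\n";
close $tmp_fh or die "close $path: $!";

{
    # Roughly what perl -0777 -pi.orig -e 's/.../.../s' FILE sets up:
    local $^I   = '.orig';    # in-place edit, keep FILE.orig as a backup
    local @ARGV = ($path);    # files for <> to process
    local $/;                 # -0777: read each file in one go
    while (<>) {
        s/\A.*?END HEADER\n//s;   # placeholder substitution
        print;                    # written to the new copy of FILE
    }
}

open my $in, '<', $path or die "open $path: $!";
my $edited = do { local $/; <$in> };
close $in;

my $backup_exists = -e "$path.orig";
unlink "$path.orig";    # tidy up the sample backup
print $edited;          # keep me
```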

Given information from the comments, it looks like you are trying to strip all of the annoying stuff from the front of a Project Gutenberg ebook. If you understand all of the copyright issues involved, you should be able to get rid of the front matter like this:

```
perl -ni.orig -e 'print unless 1 .. /^\*END/' 00ws110.txt
```

The Project Gutenberg header ends with:

```
*END*THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.29.93*END*
```

A safer regex would take into account the *END* at the end of the line as well, but I am lazy.
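One possible stricter pattern requires `*END*` at both the start and the end of the line. The exact regex below is an assumption, not something from the answer; the sketch compares it with the simpler `/^\*END/` on the quoted header line and on a hypothetical body line that merely starts with "*END".

```perl
use strict;
use warnings;

# The header terminator quoted above, plus a hypothetical body line
# that merely starts with "*END".
my @lines = (
    "*END*THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.29.93*END*\n",
    "*END of a sentence that merely starts with the marker\n",
);

# Assumed stricter pattern: *END* at both ends of the line;
# \s* also tolerates "\r\n" line endings.
my $header_end = qr/^\*END\*.*\*END\*\s*$/;

my @results;
for my $line (@lines) {
    my $loose  = $line =~ /^\*END/    ? 'yes' : 'no';
    my $strict = $line =~ $header_end ? 'yes' : 'no';
    push @results, "loose=$loose strict=$strict";
}
print "$_\n" for @results;
```

Only the real terminator matches the stricter pattern, so swapping it into the one-liner's right-hand operand would stop at the right place even if a body line happened to start with "*END".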

Chas. Owens 2009-03-23 20:49:20

Answer 4

+2 A:

I might be misinterpreting what you're asking for, but it looks that simple to me:

```
perl -ni -e 'print unless 1..($. > 1 && /^Foo bar/)'
```

depesz 2009-03-23 21:19:35

Or just use `1.../^Foo bar/` (notice: triple dot, not double) instead of testing `$.`.

ephemient 2009-03-23 21:30:11
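The `$.` guard and the triple dot matter only when line 1 itself matches the end pattern. With `..`, the right operand is tested on the same line that turned the range on, so the range can end immediately; `...` waits until the next line. This sketch runs all three filters over a hypothetical input whose first line already starts with "Foo bar".

```perl
use strict;
use warnings;

# Hypothetical input: line 1 starts with "Foo bar", and the header is
# meant to run until the NEXT "Foo bar" line.
my $text = "Foo bar (start)\nheader\nFoo bar (end)\nbody\n";

# Return the lines kept by one of the filters discussed above.
# Each variant is a separate flip-flop operator with its own state.
sub kept_lines {
    my ($variant) = @_;
    open my $in, '<', \$text or die "can't open in-memory file: $!";
    my @kept;
    while (<$in>) {
        my $skip =
            $variant eq 'two-dot' ? (1 .. /^Foo bar/)
          : $variant eq 'guarded' ? (1 .. ($. > 1 && /^Foo bar/))
          :                         (1 ... /^Foo bar/);
        push @kept, $_ unless $skip;
    }
    close $in;
    return join '', @kept;
}

for my $variant (qw(two-dot guarded three-dot)) {
    (my $out = kept_lines($variant)) =~ s/\n/|/g;
    print "$variant: $out\n";
}
```

The plain two-dot form stops skipping after line 1, while the guarded and triple-dot forms both skip through "Foo bar (end)" and keep only the body.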

Answer 5

A:

I'm looking for the opposite. I want to pipe the output of a Linux command into a perl command line that will only display information from the beginning of the output until the first occurrence of a regular expression. Any ideas?

Pete 2009-12-22 13:21:32
