I have a string which starts with //# and runs up to the newline character. I have figured out a regex for it, which is ..#([^\n]*).

My question is: how do you remove such a line from a file when it matches this pattern?

A: 

Read the file line by line and only write those lines to a new file that don't match the regex. You cannot just remove a line.
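
As a rough illustration of that copy-and-filter idea, a plain shell loop could look something like this. It is only a sketch: the file names input.txt and output.txt and the sample lines are placeholders made up for the example.

```shell
# Sample input; file names and contents are placeholders for illustration.
printf '%s\n' 'keep me' '//# drop me' 'int x; //# trailing, kept' > input.txt

# Copy every line that does not start with //# to a new file.
# The "|| [ -n "$line" ]" part keeps a final line that lacks a newline.
while IFS= read -r line || [ -n "$line" ]; do
    case $line in
        '//#'*) ;;                      # starts with //#: skip it
        *) printf '%s\n' "$line" ;;     # anything else: keep it
    esac
done < input.txt > output.txt

cat output.txt
```

The original file is left untouched; once output.txt looks right, it can be renamed over the original.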

EricSchaefer
A: 

Does it start at the beginning of a line or can it appear anywhere? If the former, s/old/new/ is what you want. If the latter, I'll have to figure that out. I suspect that back references could be used somehow.

docgnome
A: 

I don't think your regex is correct.

First you need to start with ^ or else it will match this pattern anywhere on the line.

Second, the .. should be \/\/ or else it will match any two characters.

^\/\/#[^\n]* is probably what you want.

Then do what EricSchaefer says and read the file line by line only writing lines that don't match.
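
To see the difference between the two patterns, here is a small check with grep. It is a sketch with made-up sample lines, and sample.txt is a placeholder name; grep's regex dialect is not Perl's, but these particular constructs behave the same way in both.

```shell
# Three sample lines: only the second one starts with //#.
printf '%s\n' 'ab#not a marker' '//# marker line' 'code(); //# trailing' > sample.txt

# Original pattern: any two characters followed by #, anywhere on the line.
loose=$(grep -c '..#' sample.txt)

# Anchored pattern with literal slashes.
strict=$(grep -c '^//#' sample.txt)

echo "loose=$loose strict=$strict"   # prints: loose=3 strict=1
```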

bmb
A: 

Try the following:

perl -ne 'print unless m{^//#}' input.txt > output.txt

If you are using Windows, you need double quotes instead of single quotes.

You can do the same with grep:

grep -v -e '^//#' input.txt > output.txt
Pat
A: 

Iterate over each line in the file, and skip the line if it matches the pattern:

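use strict;
use warnings;
use FileHandle;

# 'filename' is a placeholder for the file to read.
my $fh = FileHandle->new('filename', 'r')
    or die "Failed to open file - $!";

# defined() keeps a final line of "0" from ending the loop early.
while (defined(my $line = $fh->getline)) {
    next if $line =~ m{^//#};    # skip lines starting with //#
    print $line;
}
$fh->close;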

This will print all lines from the file, except those that start with '//#'.

David Precious
+1  A: 

You really don't need perl for this.

sed '/^\/\/#/d' inputfile > outputfile

I <3 sed.

Aeon
+2  A: 

To filter out all the lines in a file that match a certain regex:

perl -n -i.orig -e 'print unless /^#/' file1 file2 file3

The '.orig' after the -i switch creates a backup of the file with the given extension (.orig). You can skip it if you don't need a backup (just use -i).

The -n switch causes perl to execute your instructions (-e ' ... ') for each line in the file. The line is stored in $_ (which is also the default argument for many instructions, in this case: print and regex matching).

Finally, the argument to the -e switch says "print the line unless it matches a # character at the start of the line".

PS. There is also a -p switch which behaves like -n, except the lines are always printed (good for searching and replacing)

kixx
+14  A: 

Your regex is badly chosen on several points:

  1. Instead of matching two slashes specifically, you use .. to match two characters that can be anything at all, presumably because you don’t know how to match slashes when you’re also using them as delimiters. (Actually, dots match almost anything, as we’ll see in #3.)

    Within a slash-delimited regex literal, //, you can match slashes simply by protecting them with backslashes, e.g. /\/\//. The nicer variant, however, is to use the longer form of regex literal, m//, where you can choose the delimiter, e.g. m!!. Since you use something other than slashes for delimitation, you can then write them without escaping them: m!//!. See perldoc perlop.

  2. It’s not anchored to the start of the string so it will match anywhere. Use the ^ start-of-string assertion in front.

  3. You wrote [^\n] to match “any character except newline” when there is a much simpler way to write that, which is just the . wildcard. It does exactly that – match any character except newline.

  4. You are using parentheses to group a part of the match, but the group is neither quantified (you are not specifying that it can match any other number of times than exactly once) nor are you interested in keeping it. So the parentheses are superfluous.

Altogether, that makes it m!^//#.*!. But putting an uncaptured .* (or anything with a * quantifier) at the end of a regex is meaningless, since it never changes whether a string will match or not: the * is happy to match nothing at all.

So that leaves you with m!^//#!.
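
The claim that the trailing .* changes nothing is easy to check. Here is a sketch using grep rather than Perl, since these constructs mean the same in both; the sample lines and file names are invented for the example.

```shell
# Sample lines: a bare marker, a marker with text, and a non-match.
printf '%s\n' '//#' '//# with text' 'x //# not at start' > lines.txt

# Select lines with and without the trailing .* on the pattern.
grep '^//#.*' lines.txt > with_tail.txt
grep '^//#'   lines.txt > without_tail.txt

# cmp exits 0 when both files are byte-for-byte identical.
cmp with_tail.txt without_tail.txt && echo "same lines selected"
```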

As for removing the line from the file, as everyone else explained, read it in line by line and print all the lines you want to keep back to another file. If you are not doing this within a larger program, use perl’s command line switches to do it easily:

perl -ni.bak -e'print unless m!^//#!' somefile.txt

Here, the -n switch makes perl put a loop around the code you provide which will read all the files you pass on the command line in sequence. The -i switch (for “in-place”) says to collect the output from your script and overwrite the original contents of each file with it. The .bak parameter to the -i option tells perl to keep a backup of the original file in a file named after the original file name with .bak appended. For all of these bits, see perldoc perlrun.

If you want to do this within the context of a larger program, the easiest way to do it safely is to open the file twice, once for reading, and separately, with IO::AtomicFile, another time for writing. IO::AtomicFile will replace the original file only if it’s successfully closed.

Aristotle Pagaltzis
Thank you very much for the very informative post!
Azlam
Excellent detailed answer.
David Precious
+2  A: 

As others have pointed out, if the end goal is only to remove lines starting with //#, for performance reasons you are probably better off using grep or sed:

grep -v '^//#' filename.txt > filename.stripped.txt

sed '/^\/\/#/d' filename.txt > filename.stripped.txt

or

sed -i '/^\/\/#/d' filename.txt

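if you prefer in-place editing. (That is the GNU sed form; BSD and macOS sed expect a backup suffix after -i, which may be empty: sed -i '' ...)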

Note that in perl your regex would be

m{^//#}

which matches two slashes followed by a # at the start of the string.

Note that you avoid "backslashitis" by using the match operator m{pattern} instead of the more familiar /pattern/. Train yourself on this syntax early since it's a simple way to avoid excessive escaping. You could write m{^//#} just as effectively as m%^//#% or m#^//\##, depending on what you want to match. Strive for clarity - regular expressions are hard enough to decipher without a prickly forest of avoidable backslashes killing readability. Seriously, m/^\/\/#/ looks like an alligator with a chipped tooth and a filling or a tiny ASCII painting of the Alps.

One problem that might come up in your script is if the entire file is slurped up into a string, newlines and all. To defend against that case, use the /m (multiline) modifier on the regex:

m{^//#}m

This allows ^ to match at the beginning of the string and after each newline. You would think there was a way to strip the lines matching m{^//#.*$} using the regex modifiers /g, /m, and /s when you've slurped the file into a string and don't want to make a copy of it (which raises the question of why it was slurped into a string in the first place). It should be possible, but it's late and I'm not seeing the answer. However, one 'simple' way of doing it is:

my $cooked = join qq{\n}, (grep { ! m{^//#} } (split m{\n}, $raw));

even though that creates a copy instead of an in-place edit on the original string $raw.

arclight