How can you remove one line which contains the word SID in many files?

I am torn between sed and tr. However, neither of them seems to work for this purpose. I would also like some flexibility, for the following reason.

The problem is actually slightly more challenging. In some files I also need to remove one line after the match, and in other files one line before it. That extra line contains the character &, and its position determines whether to remove the line below, the line above, or only the match. The easiest way is probably to make lists of the different types of files and then process each list with different code.

Examples of the data

Here &

    . "question_sent"
    . "&"                        // I do not want this line
    .  htmlspecialchars(SID)     // NOT wanted
    . "&"
    . "email="

No & here

    . "successful_registration&"
    . "SID="                    // Not wanted
    .  htmlspecialchars($SID)   // Not wanted
    . "&"                       // not wanted
    . "email="

The character & is now in HTML encoding, that is, &amp;

  if(isset($_GET['ask_question'])) {
      echo  ("<li id='ask_question_active'><a href='?ask_question&amp;"
          .  htmlspecialchars(SID)   // not wanted
          . "&amp;"                 // not wanted
          . "email=
+2  A: 

This cannot be done with tr. Sed can probably be used, but I do not know it well enough to give an example. I would use perl and introduce some state variables; see this answer for an example of what I mean. I would probably use the states single_ampersand_found (do not print the line yet; if the next line contains SID, forget it, otherwise print it) and SID_found (if the next line contains &, forget that line).
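
For the simplest case only (deleting just the matching line, without touching the neighbouring & lines), a minimal sed sketch might look like the following. The directory and file names (demo_dir, one.php, two.php) are made up for the demonstration and are not from the question.

```shell
#!/bin/sh
# Sketch: delete every line that contains SID, in several files at once.
# demo_dir, one.php and two.php are hypothetical names for illustration.
demo_dir=$(mktemp -d)
printf '%s\n' '. "question_sent"' '. htmlspecialchars(SID)' '. "email="' > "$demo_dir/one.php"
printf '%s\n' '. "SID="' '. "email="' > "$demo_dir/two.php"

# -i.bak edits each file in place and keeps the original as FILE.bak.
# Attaching the suffix directly to -i is accepted by both GNU and BSD sed.
sed -i.bak '/SID/d' "$demo_dir"/*.php

# Show the result; the .bak copies still contain the removed lines.
cat "$demo_dir/one.php" "$demo_dir/two.php"
```

Note that /SID/ matches any line containing those three letters, including longer words, so it is worth checking the .bak copies against the result before deleting them.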


Update: the following code suppresses all the lines now marked with "not wanted", plus the fourth line in your first example (i.e., a bug), but I figure it should be good enough for you to correct and adapt to your needs.

#!/usr/bin/perl -w
use strict;
use warnings;

my $state = 0;
my $state_ampersand_found = 1;
my $state_SID_found  = 2;

my $previous_line = "";

while (my $line = <>) {
        chomp($line);

        if ($line =~ /"&/) {
                if ($state == $state_ampersand_found) {
                        print $previous_line;
                }
                if ($state == $state_SID_found) {
                        $previous_line = "";
                        $state = 0;
                        next;
                }
                $state = $state_ampersand_found;
                # remember current line, but do not print it (yet)
                $previous_line = $line . "\n";
                next;
        }
        if ($line =~ /SID/) {
                $previous_line = "";
                $state = $state_SID_found;
                next;
        }
        $state = 0;
        print $previous_line;
        print $line, "\n";
}
hlovdal
+1: for the clear names of the variables
Masi
+1  A: 

Updated again: I think this fixes the bug in the previous script I posted.

#!/usr/bin/perl

use strict;
use warnings;

my $re_amp = qr/"&(?:amp;)?"/;
my $re_sid = qr/SID/;

while ( my $this = <DATA> ) {
    next unless $this =~ /\S/;

    if ( $this =~ $re_amp ) {
        $this = skip_while(\*DATA, $re_sid);
    }
    elsif ( $this =~ $re_sid ) {
        $this = skip_while(\*DATA, $re_sid, $re_amp);
    }

    print $this if defined $this;
}

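# Read lines from $fh, discarding every line that matches $re1 or
# (if given) $re2. Return the first line that matches neither, or
# undef at end of input.
sub skip_while {
    my ($fh, $re1, $re2) = @_;
    my $line;
    while ( $line = <$fh> ) {
        next if (defined $re1 and $line =~ $re1)
             or (defined $re2 and $line =~ $re2);
        last;
    }
    return $line;
}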

__DATA__
handlers/handle_new_question.php-        . "question_sent"
handlers/handle_new_question.php-        . "&"                        // I do not want this line
handlers/handle_new_question.php:        .  htmlspecialchars(SID)   // NOT wanted
handlers/handle_new_question.php-        . "&"
handlers/handle_new_question.php-        . "email="

handlers/handle_registration.php-            . "successful_registration&"
handlers/handle_registration.php:            . "SID="                   // Not wanted
handlers/handle_registration.php:            .  htmlspecialchars($SID)   // Not wanted
handlers/handle_registration.php-            . "&"                  // not wanted
handlers/handle_registration.php-//            . "email="

views/ask_question_link.php-        if(isset($_GET['ask_question'])) {
views/ask_question_link.php-            echo  ("<li id='ask_question_active'><a href='?ask_question&amp;"
views/ask_question_link.php:                .  htmlspecialchars(SID)   // not wanted
views/ask_question_link.php-                . "&amp;"           // not wanted
views/ask_question_link.php-//                . "email=

Output:

C:\Temp> w
handlers/handle_new_question.php-        . "question_sent"
handlers/handle_new_question.php-        . "&"
handlers/handle_new_question.php-        . "email="
handlers/handle_registration.php-            . "successful_registration&"
handlers/handle_registration.php-//            . "email="
views/ask_question_link.php-        if(isset($_GET['ask_question'])) {
views/ask_question_link.php-            echo  ("<li id='ask_question_active'><a href='?ask_question&amp;"
views/ask_question_link.php-//                . "email=
Sinan Ünür
+3  A: 

I wouldn't feel game to run a global search-and-replace when the code is so inconsistent. I would be using grep/vim to check each line, unless you seriously do have 10,000 changes to make. To use grep/vim, the steps would be something like this:

1) Add the following to your .vimrc:

" <f1> looks for SID in the current file
map <f1> /\<SID\><CR>
" <f2> goes to the next file
map <f2> :next<CR><f1>

" <f5> deletes only the current line
map <f5> dd
" <f6> deletes the current line and the one above
map <f6> k2dd
" <f7> deletes the current line and the one below
map <f7> 2dd
" <f8> deletes the current line, and the one above AND the one below
map <f8> k3dd

2) This grep command will find all the files you need to change:

grep -rl '\bSID\b' * > fix-these-files.txt

You may need to tweak it slightly to make sure that it is finding all the files you need to change. Make sure it is correct before you go to the next step.
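
One way to check the list before editing is to preview every match with a line of context on each side. The sketch below is only an illustration: demo_dir, a.php and b.php are made-up names, and it uses grep's -w option in place of the \b anchors from step 2, on the assumption that your grep supports -w (GNU and BSD grep both do).

```shell
#!/bin/sh
# Sketch: preview matches before building the file list.
# demo_dir, a.php and b.php are hypothetical names for illustration.
demo_dir=$(mktemp -d)
printf '%s\n' '. "question_sent"' '. htmlspecialchars(SID)' '. "email="' > "$demo_dir/a.php"
printf '%s\n' '$x = "INSIDE";' > "$demo_dir/b.php"

# -w matches SID only as a whole word, so INSIDE in b.php is ignored.
# -n prints line numbers; -B1 and -A1 add one line of context on each side.
grep -rn -w -B1 -A1 'SID' "$demo_dir"

# Same pattern, but list only the names of matching files, as in step 2.
matched_files=$(grep -rl -w 'SID' "$demo_dir")
echo "$matched_files"
```

The context lines make it easier to see, per file, whether the & line sits above or below the match before deciding which key to press in step 4.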

3) Use vim to open all the files that need fixing, like this:

vim '+set confirm' '+/\<SID\>' $(cat fix-these-files.txt)

4) You should now have vim open, and looking at the first SID in the first file you need to change. Use the following steps to fix each occurrence of SID:

  • If you only need to delete the current line, press <F5>.
  • If you need to delete the line above at the same time, press <F6> instead of <F5>.
  • If you need to delete the line below at the same time, press <F7> instead of <F5>.
  • If you need to delete the lines above and below at the same time, press <F8> instead of <F5>.
  • Press <F1> to find another occurrence of SID to fix.
  • When SID cannot be found in the current file any more, press <F2> to go to the next file.

Exit vim when there are no more SIDs to be fixed.

5) Check to make sure you got everything, by running the grep command from step (2) again. There should be no search matches.

6) Delete the extra mappings you added to your .vimrc in step (1).

Warning: I haven't tested the steps above; if you use them, be careful that you make exactly the changes you need and no others!

too much php
This is a great answer! - It suggests to me that efficient use of Vim can help a lot. - *Do you know how to open all grep matches in Vim windows such that you can go directly to the match?* - In pseudo-code: `grep -l test *.php | vim`
Masi
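'grep -l test *.php | vim -' (note the dash at the end). This loads the list of matching file names into a buffer; put the cursor on a name and press gf to open that file.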
too much php