I want to delete all the text between a pair of "};" delimiters, but only when that text contains a particular keyword. For example:

Input:

}; text text KEYWORD text text };

Output:

};   };

Can anyone suggest a simple regular expression? I know sed is the tool to use.

A: 
\};[^}]*KEYWORD[^}]*\};

will work as long as there is no } between the two delimiters.

So:

sed 's/\};[^}]*KEYWORD[^}]*\};/}; };/g' file.in > file.out
Tim Pietzcker
But this would match the start and end markers as well, right?
Gopi
Yes, and they are replaced right back. sed doesn't have lookaround (GNU BRE engine).
Tim Pietzcker
Just match the whole thing and then replace it with the literal "}; };".
colithium
I'm getting an error: "RE error: parentheses not balanced"
sole007
Hm. Does `sed 's/\};[^\}]*KEYWORD[^\}]*\};/}; };/g' file.in > file.out` work?
Tim Pietzcker
Nope, same error. Also, there can be multiple lines between a pair of `};`. I tried `sed "s/[}][;][^}]*KEYWORD[^}]*[}][;]/}; };/g" file.in > file.out`, but that doesn't work either.
sole007
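For blocks that span several lines, one possible sketch (assuming GNU sed, since the `-z` option is a GNU extension) reads the whole input as a single pattern space so that `[^}]*` can also match the newlines between the delimiters. It also drops the backslashes: in a basic regular expression `\{` and `\}` delimit interval expressions, while a plain `}` is an ordinary character. The function name `strip_keyword_blocks` is hypothetical and `KEYWORD` is a placeholder for the real keyword.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (-z is a GNU extension).
# strip_keyword_blocks is a hypothetical helper name; KEYWORD is a placeholder.
strip_keyword_blocks() {
  # -z separates "lines" by NUL bytes, so ordinary text (no NULs) becomes
  # one pattern space and [^}]* can match newlines between the delimiters.
  # Plain } is literal in a BRE; \{ and \} would mean an interval expression.
  sed -z 's/};[^}]*KEYWORD[^}]*};/}; };/g' "$@"
}

# A block spanning three lines that contains KEYWORD is emptied.
printf 'a };\ntext\ntext KEYWORD text\n}; b\n' | strip_keyword_blocks

# A block without KEYWORD is left as it was.
printf '}; keep me };\n' | strip_keyword_blocks
```

Because the whole file becomes one pattern space, this approach suits files that fit comfortably in memory.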
A: 

The regex below matches the text you want to delete:

(?<=\};).*?KEYWORD.*?(?=\};)

Edit: as @Tim pointed out, this won't work with sed, because sed does not support lookaround.

Gopi
This is not looking for the keyword, and won't work in sed (no lookaround).
Tim Pietzcker
Thanks @Tim for pointing that out. Fixed. And yes, this is a general regex; I'm not sure about sed specifics.
Gopi
A: 

This should work under most conditions:

sed '/};[^}]*};/{s/};[^}]*};/}; };/;b};/};/!b;:a;N;/\n[^}]*};/!ba;s/[^;]*\n.*\n[^}]*/ /' inputfile

There will probably be some corner cases where this fails. Change the space near the end to \n if you want the result to be on two lines.

Examples:

}; test }; becomes }; };

};
test
};
becomes }; };

abc };
test
}; def
becomes abc }; }; def

abc }; 111
test1
test2
222 }; def
becomes abc }; }; def

Dennis Williamson
A: 

The simplest approach possible:

sed '/KEYWORD/s/};[^}]*};/}; };/g' file.in > file.out
mhitza
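To see how this address form differs from putting the keyword inside the pattern, the sketch below runs both on one made-up line using plain POSIX sed (no GNU extensions; `KEYWORD` is a placeholder). With the `/KEYWORD/` address, every `};...};` pair on a matching line is emptied, including pairs that do not themselves contain the keyword.

```shell
#!/bin/sh
# Sketch only: the input line is invented for illustration; KEYWORD is a placeholder.
line='}; keep };  }; drop KEYWORD };'

# Address form: any line containing KEYWORD has all of its };...}; pairs emptied.
printf '%s\n' "$line" | sed '/KEYWORD/s/};[^}]*};/}; };/g'
# prints: }; };  }; };

# Keyword in the pattern: only pairs that contain KEYWORD are emptied.
printf '%s\n' "$line" | sed 's/};[^}]*KEYWORD[^}]*};/}; };/g'
# prints: }; keep };  }; };
```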