File1:

<a>hello</b> <c>foo</d>
<a>world</b> <c>bar</d>

The above is an example of the kind of file this should work on. How can I remove every string of the form <c>*</d> using sed?

A: 

The following line will remove all text from <c> to </d> inclusive:

sed -e 's/<c>.*<\/d>//'

The part inside s/...// is a regular expression, not a shell-style wildcard, so anything that is valid in a regular expression can go there.
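
One thing to watch for with .* is that it is greedy. If a line ever held more than one <c>...</d> pair, the match would run from the first <c> to the last </d>. Here is a rough sketch of the difference. The sample line and the variable names are made up for illustration, and the [^<]* variant assumes the text between the tags never contains a < character:

```shell
# Hypothetical line with two <c>...</d> pairs (not from the question's file).
line='<a>x</b> <c>one</d> <a>y</b> <c>two</d>'

# Greedy: .* runs from the first <c> to the last </d> on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's/<c>.*<\/d>//')

# Narrower: [^<]* stops at the next '<', and the g flag repeats per pair.
narrow=$(printf '%s\n' "$line" | sed -e 's/<c>[^<]*<\/d>//g')

printf '[%s]\n' "$greedy" "$narrow"
```

For the file in the question, which has a single pair per line, both forms should give the same result.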

Adam Batkin
Works perfectly! A reminder for anyone using this command: add the input file and an output redirect at the end, like so: sed -e 's/<c>.*<\/d>//' In > Out.
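
A small end-to-end sketch of that In > Out usage, in case it helps. The scratch directory and the variable names are only for illustration; In and Out stand in for whatever files you actually have:

```shell
# Scratch directory so the example does not touch real files.
tmpdir=$(mktemp -d)

# Recreate the two sample lines from the question as the input file.
printf '%s\n' '<a>hello</b> <c>foo</d>' '<a>world</b> <c>bar</d>' > "$tmpdir/In"

# sed reads In and the shell redirects its output into Out.
sed -e 's/<c>.*<\/d>//' "$tmpdir/In" > "$tmpdir/Out"

out=$(cat "$tmpdir/Out")
printf '%s\n' "$out"

rm -r "$tmpdir"
```

Note that the space before <c> is left behind, so each output line ends with a trailing space.
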
A: 

If all your data looks like the example:

# gawk 'BEGIN{FS=" <c>"}{print $1}' file
<a>hello</b>
<a>world</b>
ghostdog74