See the questions I asked in my comment at the top.
Assuming you're using GNU sed, and that you're trying to add the trailing / to your tags to make XML-compliant <img /> and <input />, replace the sed expression in your command with this one and it should do the trick:
'1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}'
Here it is on a simple test file:
$ cat test.html
This is an <img tag> without closing slash.
Here is an <img tag /> with closing slash.
This is an <input tag > without closing slash.
And here one <input attrib="1"
> that spans multiple lines.
Finally one <input
attrib="1" /> with closing slash.
$ sed -n '1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}' test.html
This is an <img tag/> without closing slash.
Here is an <img tag /> with closing slash.
This is an <input tag /> without closing slash.
And here one <input attrib="1"
/> that spans multiple lines.
Finally one <input
attrib="1" /> with closing slash.
For background, see the GNU sed documentation on its regex syntax and on how the hold-space buffering makes a multiline search/replace possible.
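To make the buffering easier to follow, here is a rough sketch of the same script spread over several lines with comments. It assumes GNU sed (the \| alternation is a GNU extension), and the function name close_void_tags is only an illustrative name, not something from your command.

```shell
# Hypothetical wrapper name; assumes GNU sed is installed as "sed".
close_void_tags() {
  sed -n '
    # First line: copy the pattern space into the hold space.
    1h
    # Every later line: append it to the hold space after a newline.
    1!H
    # On the last line the hold space contains the whole input.
    ${
      # Copy the hold space back into the pattern space.
      g
      # Insert "/" before ">" in img/input tags that do not already end in "/".
      # [^>]* can match newlines here, so tags spanning lines are handled.
      s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g
      # Print explicitly, since -n turned off automatic printing.
      p
    }
  ' "$@"
}
```

Call it as close_void_tags test.html, or pipe text into it. Because the whole input is held in memory before the substitution runs, this is best suited to files of ordinary size.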
Alternatively, you could use something like Tidy, which is designed for sanitizing bad HTML. That's what I'd do for anything more complicated than a couple of simple search/replaces. Tidy's options get complicated fast, so it's usually better to write a script in your scripting language of choice (Python, Perl) that calls libtidy and sets whatever options you need.