I'm cleaning up some web pages that for some reason have about 8 line breaks between tags. I wanted to remove most of them, and I tried this

perl -pi -w -e "s/\n\n//g" *.html

But no luck. For good measure, I tried

perl -pi -w -e "s/\n//g" *.html

and it did remove all my line breaks. What am I doing wrong?

Edit: I also tried \r\n\r\n, same deal. It works for a single line break, but doesn't do anything for two consecutive ones.

+6  A: 

Use -0:

perl -pi -0 -w -e "s/\n\n//g" *.html

The problem is that by default -p reads the file one line at a time. There's no such thing as a line with two newlines, so you didn't find any. The -0 switch (with no digits after it) changes the input record separator from newline to the NUL character, "\0", which probably doesn't exist in your file, so the whole file is processed as a single record. (Even if the file did contain NULs, you're looking for consecutive newlines, so processing it in NUL-delimited chunks won't be a problem.)

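You probably want to adjust your regex as well, but it's hard to be sure exactly what you want. As written, s/\n\n//g deletes newlines in pairs, so a run of eight line breaks disappears entirely and the surrounding tags end up on the same line. Try s/\n\n+/\n/g, which will replace any run of two or more consecutive newlines with a single newline.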

cjm