I'm trying to write a regular expression that replaces line feeds within a certain section of a text file (here, between "CONTENT:" and "END CONTENT"), but only in plain-text content, i.e. not inside HTML attribute values such as href. I'm not having much luck getting past the first part.
Example input:
AUTHOR: Me
DATE: Now
CONTENT:
This is an example. This is another example. <a href="http://www.stackoverflow/example-
link-that-breaks">This is an example.</a> This is an example. This is yet another
example.
END CONTENT
COMMENTS: 0
Example output:
AUTHOR: Me
DATE: Now
CONTENT:
This is an example. This is another example. <a href="http://www.stackoverflow/example-link-that-breaks">This is an example.</a> This is an example. This is yet another example.
END CONTENT
COMMENTS: 0
So ideally, a line break in plain text should be replaced with a space, while a line break inside an HTML attribute value should be removed without adding a space. This mostly affects href, and I'm fine if I have to limit it to that.
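A single pattern can't easily apply two different replacements depending on context, so one possible approach is to pair a regex with a replacement function in Python's `re.sub`. The sketch below makes several assumptions: the section markers are exactly "CONTENT:" and "END CONTENT" on their own lines, attribute values are quoted, and any `<...>` in the content is a real tag. The names `unwrap_content`, `_fix_tag`, and the other helpers are hypothetical. It matches either a whole tag or a bare line break. Breaks inside quoted attribute values are dropped, a break between attributes becomes one space, and a break in plain text becomes one space.

```python
import re

# Hypothetical helpers. Assumes "CONTENT:" and "END CONTENT" sit on their
# own lines, as in the example input.
CONTENT_BLOCK = re.compile(r"(^CONTENT:\r?\n)(.*?)(\r?\nEND CONTENT)",
                           re.DOTALL | re.MULTILINE)

# A line break plus any spaces/tabs around it.
LINE_BREAK = re.compile(r"[ \t]*\r?\n[ \t]*")

# Either an HTML tag (group 1; quoted attribute values may contain '>' or
# line breaks) or a bare line break in plain text.
TOKEN = re.compile(r"""(<(?:[^>"']|"[^"]*"|'[^']*')*>)|[ \t]*\r?\n[ \t]*""")

# A single- or double-quoted attribute value inside a tag.
QUOTED = re.compile(r'"[^"]*"' + r"|'[^']*'")


def _fix_tag(tag: str) -> str:
    # Inside a quoted attribute value (e.g. a wrapped href): drop the break.
    tag = QUOTED.sub(lambda q: LINE_BREAK.sub("", q.group(0)), tag)
    # A break between attributes must still separate them: use one space.
    return LINE_BREAK.sub(" ", tag)


def _fix_token(m: re.Match) -> str:
    if m.group(1) is not None:  # an HTML tag
        return _fix_tag(m.group(1))
    return " "                  # a line break in plain text


def unwrap_content(text: str) -> str:
    """Join wrapped lines inside each CONTENT: ... END CONTENT block."""
    def fix_block(m: re.Match) -> str:
        return m.group(1) + TOKEN.sub(_fix_token, m.group(2)) + m.group(3)
    return CONTENT_BLOCK.sub(fix_block, text)


if __name__ == "__main__":
    sample = (
        "AUTHOR: Me\nDATE: Now\nCONTENT:\n"
        "This is an example. This is another example. "
        '<a href="http://www.stackoverflow/example-\n'
        'link-that-breaks">This is an example.</a> This is an example. '
        "This is yet another\nexample.\nEND CONTENT\nCOMMENTS: 0"
    )
    print(unwrap_content(sample))
```

Run on the example input, this should print the example output above, with lines outside the CONTENT block left untouched. The tag pattern is quote-aware so a `>` inside an attribute value does not end the tag early. Regex-based HTML handling is still fragile, though; if the real content has unquoted attributes or stray `<` characters, an HTML parser would be a more robust base.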