My software lets users apply regular expressions to prepare files. I am in the process of adding a default regex library of common expressions that can be reused to prepare a variety of formats. One common task is to remove CRLF line breaks in specific parts of a file, but not in others. For instance, this:
<TU>Lorem
Ipsum</TU>
<SOURCE>This is a sentence
that should not contain
any line break.
</SOURCE>
Should become:
<TU>Lorem
Ipsum</TU>
<SOURCE>This is a sentence that should not contain any line break.
</SOURCE>
I have a regex that does the job pretty nicely:
(?(?<=<SOURCE>(?:(?!</?SOURCE>).)*)(\r\n))
The problem is that it is processing-intensive: with files above 500 KB, it can take 30+ seconds. (The regex is compiled in this case; uncompiled, it is much slower.)
It's not a big issue, but I wonder if there is a better way to achieve the same results with regex.
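To make the target behaviour concrete, here is a minimal sketch in Python (not the engine my software uses) of one possible two-step approach: match each <SOURCE> block once, then replace line breaks only inside that match, so no lookbehind has to be evaluated at every position. The names SOURCE_BLOCK, INNER_CRLF and join_source_lines are hypothetical. The sketch assumes CRLF line endings, SOURCE tags that are not nested, a single space as the replacement, and that the break right before </SOURCE> is kept, as in the example above. I have not measured whether it is actually faster.

```python
import re

# One whole <SOURCE>...</SOURCE> block; DOTALL lets '.' span line breaks.
SOURCE_BLOCK = re.compile(r"<SOURCE>.*?</SOURCE>", re.DOTALL)

# A CRLF that is not immediately followed by the closing tag
# (assumption: the final break before </SOURCE> is kept).
INNER_CRLF = re.compile(r"\r\n(?!</SOURCE>)")


def join_source_lines(text: str) -> str:
    """Replace CRLF with a space inside SOURCE blocks only."""
    return SOURCE_BLOCK.sub(
        lambda m: INNER_CRLF.sub(" ", m.group(0)),
        text,
    )


sample = (
    "<TU>Lorem\r\nIpsum</TU>\r\n"
    "<SOURCE>This is a sentence\r\nthat should not contain\r\n"
    "any line break.\r\n</SOURCE>\r\n"
)
result = join_source_lines(sample)
```

Text outside the SOURCE blocks, such as the <TU> element, is never touched because the inner replacement only runs on the matched block.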
Thanks in advance for your suggestions.