I have a rather large text file that has a bunch of missing newlines, meaning that it's a mess. I need to break it up into appropriate lines.

The text looks something like this now:

12345 This is a chunk 23456 This is another chunk 34567 This is yet another chunk 45678 This is yet more chunk 56789 Yet another piece of text

I need a regex that will insert a newline (CR/LF pair) before each group of five digits, resulting in something like this:

12345 This is a chunk 
23456 This is another chunk 
34567 This is yet another chunk 
45678 This is yet more chunk 
56789 Yet another piece of text

It can insert one before the first group of digits or not; that I can deal with.

Any ideas? Thanks.

+3  A: 

Very simple (but not as "flashy" as possible, since I'm too lazy to use lookaheads):

s/(\d{5})/\r\n\1/gs
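
For anyone doing this from a script rather than an editor, here is a rough Python sketch of the same substitution. The function name and the sample string are made up for illustration, and it assumes any five consecutive digits mark the start of a record (so a longer digit run would be split too).

```python
import re

def break_before_digits(text, newline="\r\n"):
    # Prefix every run of five consecutive digits with `newline`.
    # A function replacement sidesteps escape handling in the replacement string.
    return re.sub(r"(\d{5})", lambda m: newline + m.group(1), text)

sample = "12345 This is a chunk 23456 This is another chunk 34567 This is yet another chunk"
result = break_before_digits(sample)
lines = result.split("\r\n")
```

Note that this also puts a line break before the very first group, which the question says is acceptable.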
You probably want \r\n since the OP wants CR/LF
cletus
@cletus: It might depend on the programming language but Perl and Python replace \n by \r\n on Windows.
J.F. Sebastian
Noted, and modified as requested.
You guys are awesome (and quick). Thanks!
Ken White
+1  A: 
s/(?<=\D)(\d{5})(?=\D|$)/\n\1/g
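
A small Python sketch of what the lookarounds buy you, assuming Python's re module (the function name and sample text are hypothetical). A group at the very start of the text has no preceding non-digit, so it gets no line break, and a longer digit run such as 1234567 is left alone.

```python
import re

# Same pattern as above: five digits with a non-digit before them
# and a non-digit (or the end of the text) after them.
PATTERN = re.compile(r"(?<=\D)(\d{5})(?=\D|$)")

def break_before_isolated_groups(text):
    # In a replacement string, \n becomes a newline and \1 is the captured group.
    return PATTERN.sub(r"\n\1", text)

sample = "12345 first 23456 second 1234567 34567 third"
result = break_before_isolated_groups(sample)
```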

On "\n" vs. "\r\n"

It might depend on the programming language at hand, but Perl and Python translate \n to \r\n on Windows when writing in text mode. In those languages it is therefore a mistake to replace \n with \r\n in the above regex, since the output would end up with \r\r\n.
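
To see the effect without a Windows machine, here is a minimal Python sketch that forces the translation with the newline argument of open(). The helper name and file name are placeholders; on Windows the default text mode behaves the same way for this case.

```python
import os
import tempfile

def write_translated(path, text):
    # newline="\r\n" makes the text layer turn every "\n" into "\r\n" on write.
    with open(path, "w", newline="\r\n") as handle:
        handle.write(text)

def read_raw(path):
    # Binary mode shows the bytes exactly as they landed on disk.
    with open(path, "rb") as handle:
        return handle.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    write_translated(path, "12345 a\n23456 b")
    plain = read_raw(path)
    write_translated(path, "12345 a\r\n23456 b")
    doubled = read_raw(path)
```

Here plain ends up with a single CR/LF pair, while doubled picks up a stray extra CR, which is the problem with putting \r\n in the replacement when the language already translates \n.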

J.F. Sebastian
Thanks, J.F. Your solution works also (with the correction to \r\n mentioned in the comments to cmartin above).
Ken White
It totally depends on the regex engine you're using. In Perl/Python you're absolutely right. I was doing some one-off cleanup of some files to be imported, and was using RegExBuddy; its flavor of regex needed \r\n. Thanks again.
Ken White