ansaurus

Question

What's the most pythonic way of normalizing line endings in a string?

Answer 1

A:

'\n'.join(line.rstrip('\r\n') for line in lines)

kurosch 2009-11-17 15:05:31

I don't think he has the `lines` array already, which is part of the challenge of this question.

Kai 2009-11-17 15:07:18

@Kai: the first line of his sample sets the lines array, he just doesn't know how to rejoin it with only LFs.

kurosch 2009-11-17 15:10:11

Answer 2

+4 A:

... but this doesn't handle "mixed" text-files of utterly confused conventions (Yes, they still exist!)

Actually it should work fine:

>>> s = 'hello world\nline 1\r\nline 2'

>>> s.splitlines()
['hello world', 'line 1', 'line 2']

>>> '\n'.join(s.splitlines())
'hello world\nline 1\nline 2'

What version of Python are you using?

EDIT: I still don't see how splitlines() is not working for you:

>>> s = '''\
... First line, with LF\n\
... Second line, with CR\r\
... Third line, with CRLF\r\n\
... Two blank lines with LFs\n\
... \n\
... \n\
... Two blank lines with CRs\r\
... \r\
... \r\
... Two blank lines with CRLFs\r\n\
... \r\n\
... \r\n\
... Three blank lines with a jumble of things:\r\n\
... \r\
... \r\n\
... \n\
... End without a newline.'''

>>> s.splitlines()
['First line, with LF', 'Second line, with CR', 'Third line, with CRLF', 'Two blank lines with LFs', '', '', 'Two blank lines with CRs', '', '', 'Two blank lines with CRLFs', '', '', 'Three blank lines with a jumble of things:', '', '', '', 'End without a newline.']

>>> print '\n'.join(s.splitlines())
First line, with LF
Second line, with CR
Third line, with CRLF
Two blank lines with LFs


Two blank lines with CRs


Two blank lines with CRLFs


Three blank lines with a jumble of things:



End without a newline.

As far as I know splitlines() doesn't split the list twice or anything.
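One caveat worth checking, sketched here under the assumption that a trailing newline should survive normalization: `'\n'.join(s.splitlines())` drops a final line terminator, because `splitlines()` does not return an empty last element for it. The helper name `normalize_keep_trailing` is made up for illustration and is not from the thread.

```python
def normalize_keep_trailing(text):
    """Join on LF, re-adding a final LF if the input ended with CR or LF.

    Hypothetical helper for illustration only.
    """
    normalized = '\n'.join(text.splitlines())
    # splitlines() swallows the last terminator, so put one LF back.
    if text.endswith(('\r', '\n')):
        normalized += '\n'
    return normalized


sample = 'line 1\r\nline 2\rline 3\n'
print(repr('\n'.join(sample.splitlines())))      # 'line 1\nline 2\nline 3'
print(repr(normalize_keep_trailing(sample)))     # 'line 1\nline 2\nline 3\n'
```

Note that in Python 3, `str.splitlines()` also splits on a few other characters (form feed and U+2028, for example), which this sketch does not try to preserve.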

Can you paste a sample of the kind of input that's giving you trouble?

Steve Losh 2009-11-17 15:14:46

I've edited my question. This (mixed convention) broke a test-case today :) ah, the assumptions we make... I'm hoping to avoid an O(n^2)-situation with a double split.

kaleissin 2009-11-17 15:23:46

+1 This looks perfect! @kaleissin: Are you saying that splitlines() does not split at *all possible* line breaks (whatever the convention)? I'd be surprised…

EOL 2009-11-17 15:39:22

len("a\nb\n\nc\nd".splitlines()) == 5

len("a\rb\r\nc\nd".splitlines()) == 4

This is Python 2.6.2. Basically, when it's mixed, I'm not making the assumption that '\r\n' or '\n\r' should be considered a single logical newline.
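If every CR and every LF really should count as its own line break, one possible approach is to split on single characters instead of calling `splitlines()`. This is only a sketch of that reading of the comment; `split_every_terminator` is a hypothetical name.

```python
import re


def split_every_terminator(text):
    # Hypothetical helper: each '\r' and each '\n' is its own break,
    # so '\r\n' produces an empty line between the two characters.
    return re.split(r'[\r\n]', text)


mixed = "a\rb\r\nc\nd"
print(len(mixed.splitlines()))               # 4: '\r\n' counts as one break
print(len(split_every_terminator(mixed)))    # 5: '\r' and '\n' count separately
```

Unlike `splitlines()`, `re.split` returns a trailing empty string when the text ends with a terminator.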

kaleissin 2009-11-18 10:07:31

Answer 3

A:

Are there even more conventions than \r\n and \n? Simply replacing \r\n is enough if you don't want lines.

only_newlines = mixed.replace('\r\n','\n')

THC4k 2009-11-17 15:20:08

DOS/Windows: `\015\012` (CRLF), Unix: `\012` (LF), Mac: `\015` (CR).

Sinan Ünür 2009-11-17 15:31:24

See also http://en.wikipedia.org/wiki/Newline#Representations

Sinan Ünür 2009-11-17 15:32:07

You can use os.linesep which will choose the correct newline character based on the OS the code is running on.
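A minimal sketch of how `os.linesep` might be used here, assuming the goal is text in the current platform's convention rather than plain LF. The helper name `to_platform_newlines` is hypothetical.

```python
import os


def to_platform_newlines(text):
    # Hypothetical helper: normalize everything to LF first,
    # then swap in the separator of the OS the code runs on.
    lf_only = text.replace('\r\n', '\n').replace('\r', '\n')
    return lf_only.replace('\n', os.linesep)


print(repr(to_platform_newlines('a\r\nb\rc\n')))
```

The result depends on the platform, so it suits in-memory or binary-mode use. When writing to a file opened in text mode, Python already translates '\n' on output, and the `os` documentation advises against using `os.linesep` there.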

tomlog 2009-11-17 16:31:38

Answer 4

+4 A:

mixed.replace('\r\n','\n').replace('\r','\n')

should handle all possible variants.
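The order of the two calls matters: CRLF has to be replaced first, otherwise the CR of each CRLF pair would become an extra LF. Wrapped in a function, with a regex equivalent for comparison (both function names are illustrative, not from the answer):

```python
import re


def normalize_newlines(text):
    # CRLF first, so the CR of a CRLF pair is not turned into a second LF.
    return text.replace('\r\n', '\n').replace('\r', '\n')


def normalize_newlines_re(text):
    # Same result in one pass: a CR optionally followed by LF becomes LF.
    return re.sub(r'\r\n?', '\n', text)


mixed = 'LF\nCR\rCRLF\r\nend'
print(repr(normalize_newlines(mixed)))       # 'LF\nCR\nCRLF\nend'
print(normalize_newlines(mixed) == normalize_newlines_re(mixed))  # True
```

Both versions treat the reversed pair '\n\r' as two separate line breaks.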

dottedmag 2009-11-17 16:04:07
