ansaurus

Question

How can I detect DOS line breaks in a file?

Answer 1

+2 A:

You could search the string for \r\n. That's the DOS-style line ending.

EDIT: Take a look at this

nc3b 2010-05-09 18:23:06

Yep, this is the way to go. There's no flag or anything.

Jonik 2010-05-09 18:25:03

Technically, you look for `"\r\x0A"`. Most compilers use line feed for `'\n'`, but it's not required to have that particular value.

Adrian McCarthy 2010-05-10 16:07:52

Answer 2

A:

DOS line breaks are \r\n; Unix uses only \n. So just search for \r\n.

Femaref 2010-05-09 18:23:51

Answer 3

A:

As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:

if "\r\n" in open("/path/file.txt","rb").read():
    print "DOS line endings found"

Edit: simplified as per John Machin's comment (no need to use regular expressions).
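
For readers on Python 3, here is a minimal sketch of the same check. The one-liner above is Python 2 (print statement, str compared against raw file contents); in Python 3 a file opened with "rb" yields bytes, so the pattern has to be a bytes literal. The function name has_dos_line_endings is made up for illustration and is not from the original answer.

```python
def has_dos_line_endings(path):
    """Return True if at least one line in the file ends with CRLF."""
    # Binary mode returns the bytes exactly as stored, with no newline
    # translation, so "\r\n" is still visible.
    with open(path, "rb") as f:
        # Iterating a binary file yields pieces that end after each b"\n",
        # so the whole file is never held in memory at once.
        return any(line.endswith(b"\r\n") for line in f)
```

Usage would look like has_dos_line_endings("/path/file.txt"). Unlike the one-liner, this stops at the first CRLF it finds and closes the file when done.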

Jonik 2010-05-09 19:04:52

Shouldn't you open the file with "rb"?

GregS 2010-05-09 19:17:26

Hmm, my first thought was no, because we're dealing with *text* files... But are you referring to this: "The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading." (http://docs.python.org/library/functions.html#open)? I wasn't aware of such conversions – maybe "rb" should indeed be used for this to work on non-Unix systems too.

Jonik 2010-05-09 20:15:13

`re.search()` is not minimalist; it's OVERKILL; use `"\r\n" in open(...).read()`. There's no "maybe" about using `"rb"`; it's a must.

John Machin 2010-05-09 22:20:23

Thanks @John for pointing that out. Much nicer now.

Jonik 2010-05-10 07:05:52

Answer 4

+1 A:

If you just want to read text files, either DOS or Unix-formatted, this works:

print open('myfile.txt', 'U').read()

That is, Python's "universal" file reader will automatically use all the different end of line markers, translating them to "\n".

http://docs.python.org/library/functions.html#open
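
A note for Python 3, as a hedged sketch: the 'U' mode flag was deprecated and then removed in Python 3.11, but ordinary text mode already does universal newline translation when newline=None (the default). The helper name read_with_universal_newlines is hypothetical.

```python
def read_with_universal_newlines(path):
    """Read a text file, translating "\r\n" and "\r" to "\n"."""
    # newline=None is the default; it is spelled out here only to make
    # the universal-newlines behaviour explicit.
    with open(path, "r", newline=None) as f:
        return f.read()
```

Because the translation hides the original line endings, this is suitable for reading the text but not for detecting which convention the file uses.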

shavenwarthog 2010-05-09 20:29:36

Well, I'll want to edit them in vim. I'd like to make that line ending change once and commit it, vs per file.

chiggsy 2010-05-09 22:20:41

This will destructively change DOS CRLF to Unix LF on all files in the current directory:

perl -p0i -e 's/\r\n/\n/g' *

I've typed this so many times my fingers have memorized it :)

shavenwarthog 2010-05-10 21:53:24

Answer 5

+1 A:

Python knows how to find the newline endings in a file, and you can access them through the newline attribute:

f = open('myfile.txt', 'U')
f.readline()  # Reads a line
# This is the newline ending of the first line.
# It can be "\r\n" (Windows), "\n" (Unix), "\r" (old Mac OS), or None (no newline termination found):
print repr(f.newlines)

This gives the newline ending of the first line (Unix, DOS, etc.), if any. As John M. pointed out, if by any chance you have a pathological file that uses more than one newline coding, f.newlines is a tuple with all the newline codings found so far, after reading many lines.
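
To cover the mixed-ending case, the whole file has to be read before looking at newlines. A minimal Python 3 sketch, assuming default text mode (newline=None) in place of 'U'; the function name newline_styles is invented for this example:

```python
def newline_styles(path):
    """Return the newline convention(s) seen in the file.

    The result is None (no newline seen), a single string such as
    "\r\n", or a tuple of strings when several conventions are mixed.
    """
    with open(path, "r", newline=None) as f:
        # Consume the file in chunks; f.newlines only reflects the
        # endings encountered so far.
        while f.read(65536):
            pass
        return f.newlines
```

A tuple result is the signal that the file mixes conventions, which is the corner case raised in the comments.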

If you just want to convert all files, you can simply do:

text = open('myfile.txt', 'U').read()  # Automatic conversion of newlines to "\n"
open('myfile.txt', 'w').write(text)  # Writes newlines for your platform
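
Note that the two lines above write the newline style of whatever platform runs them, so on Windows the output is CRLF again. If the goal is to standardise on LF everywhere, one possible Python 3 sketch works on raw bytes instead. The function name convert_crlf_to_lf is hypothetical, and the file is assumed to be small enough to read into memory.

```python
def convert_crlf_to_lf(path):
    """Rewrite the file in place with CRLF replaced by LF.

    Returns True if the file was changed, False if it had no CRLF.
    """
    # Binary mode on both sides avoids any platform newline translation.
    with open(path, "rb") as f:
        data = f.read()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # nothing to do; leave the file untouched
    with open(path, "wb") as f:
        f.write(converted)
    return True
```

Like the original, this overwrites the file, so it is worth running only on files that are under version control or backed up.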

EOL 2010-05-10 07:26:06

-1 It's called `newlines` (plural) and it's not an encoding. What you have shown is how to find what (if anything) terminates the first line (if any). Your comment is incorrect: it doesn't include the case where the first line and only line is not terminated (and so `newlines` refers to `None`). Further, it assumes that all lines are terminated the same way. Concatenations of files of different line endings are not unknown. In the OP's application of standardising on one line ending, he will need to read ALL the input file (and ALL the docs, especially where it mentions `tuple`).

John Machin 2010-05-10 12:18:14

@John: Come on: -1 for an answer that mentions the useful `newlines`, but only with a typo? Or for pathological files concatenated from files with different newline conventions? The original poster mentioned "files from Unix or DOS", not such strange files!

EOL 2010-05-10 15:51:39

@John: Your information about f.newlines returning a tuple in the case of a mixed newline convention is interesting. I added it to the response.

EOL 2010-05-10 15:58:02

I upvoted it. It was a useful answer to me. @John makes a very good point though, concerning corner cases.

chiggsy 2010-05-10 20:33:10

Thank you! I did cite John's corner case in the answer, because I also found it interesting. :)

EOL 2010-05-11 07:02:39

Answer 6

A:

Using grep & bash:

grep -c -m 1 $'\r$' file

echo $'\r\n\r\n' | grep -c $'\r$'     # test

echo $'\r\n\r\n' | grep -c -m 1 $'\r$'

shallo 2010-05-10 13:59:55
