I have a bunch of files. Some have Unix line endings, many have DOS line endings. I'd like to test each file to see if it is DOS formatted before I switch the line endings.

How would I do this? Is there a flag I can test for? Something similar?

+2  A: 

You could search the string for \r\n. That's the DOS-style line ending.

EDIT: Take a look at this

nc3b
Yep, this is the way to go. There's no flag or anything.
Jonik
Technically, you look for `"\r\x0A"`. Most compilers use line feed for `'\n'`, but it's not required to have that particular value.
Adrian McCarthy
A: 

DOS line breaks are \r\n; Unix uses only \n. So just search for \r\n.

Femaref
A: 

As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:

if "\r\n" in open("/path/file.txt","rb").read():
    print "DOS line endings found"

Edit: simplified as per John Machin's comment (no need to use regular expressions).
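
To run the same check over many files, it can be wrapped in a small function. This is only a minimal sketch, assuming Python 3 (where binary reads return `bytes`, so the pattern becomes `b"\r\n"`). The names `has_dos_line_endings` and `find_dos_files` and the chunk size are illustrative choices, not part of the original answer.

```python
def has_dos_line_endings(path, chunk_size=64 * 1024):
    """Return True if the file at `path` contains at least one CRLF pair."""
    previous_ended_with_cr = False
    # Binary mode is essential: text mode may translate line endings.
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            # A CRLF pair may be split across two consecutive chunks.
            if previous_ended_with_cr and chunk.startswith(b"\n"):
                return True
            if b"\r\n" in chunk:
                return True
            previous_ended_with_cr = chunk.endswith(b"\r")


def find_dos_files(paths):
    """Return the subset of `paths` that contain DOS line endings."""
    return [path for path in paths if has_dos_line_endings(path)]
```

Reading in chunks keeps memory use flat for large files; for small files, the one-line `in` test above does the same job.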

Jonik
Shouldn't you open the file with "rb"?
GregS
Hmm, my first thought was no, because we're dealing with *text* files... But are you referring to this: "The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading." (http://docs.python.org/library/functions.html#open)? I wasn't aware of such conversions – maybe "rb" should indeed be used for this to work on non-Unix systems too.
Jonik
`re.search()` is not minimalist; it's OVERKILL; use `"\r\n" in open(...).read()`. There's no "maybe" about using `"rb"`; it's a must.
John Machin
Thanks @John for pointing that out. Much nicer now.
Jonik
+1  A: 

If you just want to read text files, either DOS or Unix-formatted, this works:

print open('myfile.txt', 'U').read()

That is, Python's "universal" file reader will automatically recognize all the different end-of-line markers, translating them to "\n".

http://docs.python.org/library/functions.html#open
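
In Python 3, text mode performs this translation by default (the `newline=None` setting of `open`), and passing `newline=""` turns it off so the original terminators can be inspected. A minimal sketch, assuming Python 3; `demo.txt` is a throwaway file created only for illustration:

```python
import os
import tempfile

# Create a throwaway DOS-formatted file (hypothetical name, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as handle:
    handle.write(b"first\r\nsecond\r\n")

# Default text mode: every line ending is translated to "\n".
with open(path, "r") as handle:
    translated = handle.read()

# newline="" disables translation, so "\r\n" survives the read.
with open(path, "r", newline="") as handle:
    untranslated = handle.read()

print(repr(translated))
print(repr(untranslated))
```

The translated read is convenient for processing the text, but it hides exactly the information needed to tell a DOS file from a Unix one.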

shavenwarthog
Well, I'll want to edit them in vim. I'd like to make that line ending change once and commit it, vs per file.
chiggsy
This will destructively change DOS CRLF to Unix LF on all files in the current directory: `perl -p0i -e 's/\r\n/\n/g' *`. I've typed this so many times my fingers have memorized it :)
shavenwarthog
+1  A: 

Python knows how to find the newline endings in a file, and you can access them through the `newlines` attribute:

f = open('myfile.txt', 'U')
f.readline()  # Reads a line
# This is the newline ending of the first line.
# It can be "\r\n" (Windows), "\n" (Unix), "\r" (old Mac OS), or None (no newline termination found):
print repr(f.newlines)

This gives the newline ending of the first line (Unix, DOS, etc.), if any. As John M. pointed out, if by any chance you have a pathological file that uses more than one newline convention, f.newlines is a tuple with all the newline conventions found so far, after reading many lines.
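
To see whether a file really mixes conventions, one option is to count the terminators directly in the raw bytes rather than relying on `f.newlines`. This is a rough sketch, assuming Python 3 and a file small enough to read into memory; `count_line_endings` is a hypothetical helper name:

```python
def count_line_endings(path):
    """Count CRLF, lone LF, and lone CR terminators in a file."""
    with open(path, "rb") as handle:
        data = handle.read()
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so that only bare LF and bare CR remain.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"\r\n": crlf, "\n": lf, "\r": cr}
```

A file is mixed when more than one of the three counts is nonzero.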

If you just want to convert all files, you can simply do:

text = open('myfile.txt', 'U').read()  # Automatic conversion of newlines to "\n"
open('myfile.txt', 'w').write(text)  # Writes newlines for your platform
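
Because text-mode writing produces the platform's own line ending, the snippet above writes "\r\n" again when run on Windows. To force Unix endings regardless of platform, and to rewrite only the files that need it, one possibility is to work in binary mode. A minimal sketch, assuming Python 3 and files small enough to read into memory; `convert_to_unix` is a hypothetical helper name:

```python
def convert_to_unix(path):
    """Rewrite `path` with LF endings; return True if the file was changed."""
    with open(path, "rb") as handle:
        data = handle.read()
    if b"\r\n" not in data:
        return False  # No CRLF pairs present; leave the file untouched.
    with open(path, "wb") as handle:
        handle.write(data.replace(b"\r\n", b"\n"))
    return True
```

Like the Perl one-liner in the comments, this overwrites the file in place, so it is worth running on committed or backed-up files only.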
EOL
-1 It's called `newlines` (plural) and it's not an encoding. What you have shown is how to find what (if anything) terminates the first line (if any). Your comment is incorrect: it doesn't include the case where the first line and only line is not terminated (and so `newlines` refers to `None`). Further, it assumes that all lines are terminated the same way. Concatenations of files of different line endings are not unknown. In the OP's application of standardising on one line ending, he will need to read ALL the input file (and ALL the docs, especially where it mentions `tuple`).
John Machin
@John: Come on: -1 for an answer that mentions the useful `newlines`, but only with a typo? Or for pathological files concatenated from files with different newline conventions? The original poster mentioned "files from Unix or DOS", not such strange files!
EOL
@John: Your information about f.newlines returning a tuple in the case of a mixed newline convention is interesting. I added it to the response.
EOL
I upvoted it. It was a useful answer to me. @John makes a very good point though, concerning corner cases.
chiggsy
Thank you! I did cite John's corner case in the answer, because I also found it interesting. :)
EOL
A: 

Using grep & bash:

grep -c -m 1 $'\r$' file

echo $'\r\n\r\n' | grep -c $'\r$'     # test

echo $'\r\n\r\n' | grep -c -m 1 $'\r$'  
shallo