Is there an easy way to see whether a particular file has DOS/Mac/UNIX line endings?

Currently I read the file byte by byte and stop when I see a Windows carriage return:

for (byte thisByte : bytes) {
    // 13 is the carriage return (CR) byte
    if (!isDos && thisByte == 13) {
        isDos = true;
    }
...

Is there a way to get the same information without reading the file byte by byte?

Thank you.

A: 

If you know that a file only uses one sort of end-of-line, then you can just scan for the first newline and see if it's DOS/UNIX/Mac.
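A minimal sketch of that idea, assuming the file's bytes (or just a leading chunk of them) are already in memory. The class name `FirstLineEnding` and the `Kind` enum are made up for illustration. It stops at the first CR or LF and looks at the following byte to tell DOS from Mac:

```java
final class FirstLineEnding {
    enum Kind { DOS, UNIX, MAC, NONE }

    // Classifies the first line terminator found in the given bytes.
    static Kind detect(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                return Kind.UNIX; // bare LF (10)
            }
            if (bytes[i] == '\r') {
                // CR (13): DOS if an LF follows immediately, otherwise old Mac style
                boolean lfFollows = i + 1 < bytes.length && bytes[i + 1] == '\n';
                return lfFollows ? Kind.DOS : Kind.MAC;
            }
        }
        return Kind.NONE; // no line terminator in the data
    }
}
```

One caveat: if only part of the file is passed in and the CR happens to be the very last byte of that chunk, this reports MAC even though an LF might follow in the rest of the file.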

Ben S
Right, that's what I am currently doing, after having read the file line by line. I wonder if there is an easier, more elegant way to get the same information.
mac
From your code it looks like you read the whole file if it isn't DOS.
Ben S
+1  A: 

Assuming that it's a text file, and the lines are of "reasonable" length, you could read a large block of the file (say 4096 bytes) and scan just that block for the CR character.

But otherwise, no, the only way that you can find a character in a file is to actually read the entire file and look for the character.

On the assumption that you're asking this question because you have performance problems reading the file a byte at a time: make sure that you wrap the FileInputStream with a BufferedInputStream.
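One way to read the block suggestion is "stop after the first 4096 bytes". Here is a rough sketch under that assumption; the class and method names are invented for the example, and the block size is just the figure mentioned above:

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BlockScan {
    static final int BLOCK_SIZE = 4096; // assumed block size, taken from the text above

    // Returns true if a CR (13) occurs within the first BLOCK_SIZE bytes of the file.
    static boolean hasCarriageReturnInFirstBlock(File file) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            byte[] block = new byte[BLOCK_SIZE];
            int total = 0;
            int n;
            // read() may return fewer bytes than requested, so loop until
            // the block is full or the end of the file is reached.
            while (total < block.length
                    && (n = in.read(block, total, block.length - total)) != -1) {
                total += n;
            }
            for (int i = 0; i < total; i++) {
                if (block[i] == '\r') {
                    return true;
                }
            }
            return false;
        }
    }
}
```

A `false` result only says there is no CR in the first block. If the first line is longer than the block, the file could still be DOS.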

Anon
@Anon. Thank you. Can you provide an example of reading a portion of a file? Or do you mean "stop reading after having read 4096 bytes"?
mac
+3  A: 

A possible optimization might be to look only at the last one or two bytes of the file. Since many text files end with a line ending, this should work most of the time. If you don't spot a line ending there, then you'll have to fall back to scanning byte by byte.

BTW, your example code sets isDos to true without checking whether the very next byte is a decimal 10 (line feed). If it isn't a 10, then the file is probably in Mac format.
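The tail check could look roughly like this, using `RandomAccessFile` to seek to the end instead of reading the whole file. The class name and the string results are placeholders for illustration; "UNKNOWN" means the file does not end in a line terminator and a full scan is still needed:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class TailCheck {
    // Returns "DOS", "UNIX", "MAC", or "UNKNOWN" based on the last one or two bytes.
    static String classifyTail(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long len = raf.length();
            if (len == 0) {
                return "UNKNOWN";
            }
            raf.seek(len - 1);
            int last = raf.read();
            if (last == '\r') {
                return "MAC";     // file ends in a bare CR
            }
            if (last != '\n') {
                return "UNKNOWN"; // no trailing line ending; fall back to a scan
            }
            if (len == 1) {
                return "UNIX";    // the file is a single LF
            }
            raf.seek(len - 2);
            // LF preceded by CR is DOS, otherwise UNIX
            return raf.read() == '\r' ? "DOS" : "UNIX";
        }
    }
}
```

This only inspects the final line ending, so it assumes the file uses one style throughout.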

Amardeep
That is valid. In my small world only DOS or UNIX formats may make an appearance. I will change the code to be more generic and handle Mac as well. Thank you for your comment.
mac