ansaurus

Question

Answer 1

+2 A:

x doesn't appear to do anything.

What if the file is less than 500 bytes?

Some binary files have a situation where you can have a header for the first N bytes of the file which contains some data that is useful for an application but that the library the binary is for doesn't care about. You could easily have 500+ bytes of ASCII in a preamble like this followed by binary data in the following gigabyte.

Should handle exception if the file can't be opened or read, etc.

Dave 2010-06-22 13:39:50

Answer 2

+1 A:

The first thing I noticed - unrelated to your actual question, but you should be closing your input stream in a finally block to ensure it's always done. Usually this merely handles exceptions, but in your case you won't even close the streams of files when returning false.

Asides from that, why the comparison to ISO control characters? That's not a "binary" file, that's a "file that contains 5 or more control characters". A better way to approach the situation in my opinion, would be to invert the check - write an isAsciiText function instead which asserts that all the characters in the file (or in the first 500 bytes if you so wish) are in a set of bytes that are known good.

Theoretically, only checking the first few hundred bytes of a file could get you into trouble if it was a composite file of sorts (e.g. text with embedded pictures), but in practice I suspect every such file will have binary header data at the start so you're probably OK.

Andrzej Doyle 2010-06-22 13:40:48

Answer 3

A:

You ignore what read() returns, what if the files is shorter than 500 bytes?
When you return false, you don't close the file.
When converting byte to char, you assume your file is 7-bit ASCII.

unbeli 2010-06-22 13:40:59

Answer 4

+1 A:

Since you call this class "isASCIIText", you know exactly what you're looking for. In other words, it's not "isTextInCurrentLocaleEncoding". Thus you can be more accurate with:

if (thisByte < 32 || thisByte > 127) bin++;

Pointy 2010-06-22 13:41:07

Answer 5

A:

This would not work with the jdk install packages for linux or solaris. they have a shell-script start and then a bi data blob.

why not check the mime type using some library like jMimeMagic (http://http://sourceforge.net/projects/jmimemagic/) and deside based on the mimetype how to handle the file.

Nikolaus Gradwohl 2010-06-22 13:46:27

Answer 6

A:

Fails badly if file size is less than 500 bytes
The line char it = (char) thisByte; is conceptually dubious, it mixes bytes and chars concepts, ie. assumes implicitly that the encoding is one-byte=one character (them, it excludes unicode encodings). In particular, it fails if the file is UTF-16 encoded.
The return inside the loop (slightly bad practice IMO) forgets to close the file.

leonbloy 2010-06-22 13:48:07

Answer 7

A:

Why do you need to know this?

If you're not sure what a file contains, process it as bytes, with InputStreams and OutputStreams. If you know by construction with 100% certainty that the file must only contain valid characters, process it as characters with Readers and Writers.

EJP 2010-06-23 02:34:23

ansaurus

tags:

views:

answers:

Java check whether file is binary

related questions