I'm reading a file line by line, like this:

 // 'file' is the uploaded java.io.File
 FileReader myFile = new FileReader(file);
 BufferedReader inputFile = new BufferedReader(myFile);
 // Read the first line
 String currentRecord = inputFile.readLine();

 while (currentRecord != null) {
     // process currentRecord here, then read the next line
     currentRecord = inputFile.readLine();
 }

But if other types of files are uploaded, it will still read their contents. For instance, if the uploaded file is an image, it will output junk characters when reading the file. So my question is: how can I check the file is CSV for sure before reading it?

Checking the extension of the file is kind of lame, since someone can upload a file that is not CSV but has a .csv extension. Thanks in advance.

+2  A: 

Determining the MIME type of a file is not easy, especially if ASCII sections can be mixed with binary ones.

Actually, when you look at how JavaMail characterises a piece of content (in this case, to choose its transfer encoding), it involves reading all of its bytes and applying some "rules".
Check out MimeUtility.java:

  • If the primary type of this datasource is "text" and if all the bytes in its input stream are US-ASCII, then the encoding is "7bit".
  • If more than half of the bytes are non-US-ASCII, then the encoding is "base64".
  • If less than half of the bytes are non-US-ASCII, then the encoding is "quoted-printable".
  • If the primary type of this datasource is not "text", then if all the bytes of its input stream are US-ASCII, the encoding is "7bit".
  • If there is even one non-US-ASCII character, the encoding is "base64".

The method returns "7bit", "quoted-printable" or "base64".
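
The rules above can be sketched in a few lines. This is only an illustration of the byte-counting idea, not JavaMail's own code: the class name EncodingGuess, the method guessEncoding and the isText flag are made up for the example, and the case of exactly half non-ASCII bytes (which the rules leave open) is treated as "quoted-printable" here.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that mirrors the rules listed above; not JavaMail's own code. */
class EncodingGuess {

    /** Returns "7bit", "quoted-printable" or "base64" for the given bytes. */
    static String guessEncoding(byte[] data, boolean isText) {
        long nonAscii = 0;
        for (byte b : data) {
            if (b < 0) {            // Java bytes are signed: values 128..255 show up as negative
                nonAscii++;
            }
        }
        if (nonAscii == 0) {
            return "7bit";          // every byte is US-ASCII, text or not
        }
        if (!isText) {
            return "base64";        // non-text data with at least one non-ASCII byte
        }
        // Text data: more than half non-ASCII means base64.
        // Exactly half is an assumption of this sketch: it falls through to quoted-printable.
        return nonAscii * 2 > data.length ? "base64" : "quoted-printable";
    }

    public static void main(String[] args) {
        byte[] plain = "a,b,c\n1,2,3\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessEncoding(plain, true)); // prints 7bit
    }
}
```

Note that every byte has to be inspected before an answer comes out, which is the point being made here: there is no shortcut that avoids reading the content.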

As mentioned by mmyers in a deleted comment, JavaMimeType is supposed to do the same thing, but:

  • it has been dead since 2006
  • it still involves reading the whole content, as in this example:

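import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;

// Read the whole file into memory, closing the stream when done
File file = new File("/home/bibi/monfichieratester");
ByteArrayOutputStream byteArrayStream = new ByteArrayOutputStream();
try (InputStream inputStream = new FileInputStream(file)) {
    int readByte;
    while ((readByte = inputStream.read()) != -1) {
        byteArrayStream.write(readByte);
    }
}
byte[] bytes = byteArrayStream.toByteArray();

// Let the library guess the MIME type from the bytes
MagicMatch m = Magic.getMagicMatch(bytes);
String mimetype = m.getMimeType();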

So... since you are reading the whole content of the file anyway, you could take advantage of that to determine the type based on that content and your own rules.
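
As a rough sketch of what "your own rules" could look like inside the existing readLine() loop: reject the upload as soon as a line contains characters that plain text should not contain. The class TextSniffer and the method isPlainTextLine are hypothetical names, and the rule itself (no control characters other than tab, no U+FFFD replacement character) is just one possible heuristic, not a guarantee that the file is CSV.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical per-line check; the rule used here is an assumption, not a standard. */
class TextSniffer {

    /** True if the line has no control characters (tab excepted) and no U+FFFD. */
    static boolean isPlainTextLine(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\uFFFD' || (Character.isISOControl(c) && c != '\t')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the uploaded file: the second line contains a NUL character
        BufferedReader inputFile = new BufferedReader(new StringReader("id,name\n1,\u0000junk\n"));
        String currentRecord;
        while ((currentRecord = inputFile.readLine()) != null) {
            if (!isPlainTextLine(currentRecord)) {
                System.out.println("Rejected: not plain text");
                return;
            }
        }
        System.out.println("Accepted");
    }
}
```

A check like this catches most images and other binary uploads early, but a text file that is not CSV would still pass, so it only covers part of the problem.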

VonC
A: 

Java Mime Magic may be of use. It'll analyse MIME types from files and input streams. I can't vouch for its functionality, however.

This link may provide further info. It provides several different means of determining how to do what you want (or at least something similar).

I would perhaps be tempted to write something specific to your problem domain, e.g. determining the number of comma-separated values per line and rejecting the file if it's not within certain limits. Then split on the commas and parse each entry according to requirements (e.g. are they doubles/floats/valid Strings, and if strings, in what encoding). I think you may have to do this anyway, given that someone may upload a file that starts like a CSV but is corrupted half-way through.
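
Something along these lines, as a minimal sketch. The names CsvShapeCheck, hasCsvShape and isNumeric are invented for the example, the field limits are whatever suits your data, and the split is deliberately naive: it does not handle quoted fields that contain commas.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical domain-specific check; a naive split that ignores quoted fields. */
class CsvShapeCheck {

    /** True if every line has the same number of fields, within the given limits. */
    static boolean hasCsvShape(List<String> lines, int minFields, int maxFields) {
        int expected = -1;
        for (String line : lines) {
            // limit -1 keeps trailing empty fields, so "a,b," counts as three fields
            int fields = line.split(",", -1).length;
            if (fields < minFields || fields > maxFields) {
                return false;
            }
            if (expected == -1) {
                expected = fields;      // the first line sets the expected width
            } else if (fields != expected) {
                return false;           // the file changes shape part-way through
            }
        }
        return expected != -1;          // an empty file is not accepted
    }

    /** True if the field parses as a double. */
    static boolean isNumeric(String field) {
        try {
            Double.parseDouble(field.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id,price", "1,9.99", "2,12.50");
        System.out.println(hasCsvShape(lines, 2, 10)); // prints true
    }
}
```

Because each line is checked as it is read, a file that starts like a CSV but is corrupted half-way through gets rejected at the first bad line.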

Brian Agnew
A: 

Please take a look at http://dpi.wi.gov/lbstat/wsnspecexe.html

There is a JavaScript validation there; the rest you know what to do...

Nirav