I have a scenario in which the user uploads a file to the system. The only file type the system understands is CSV, but the user can upload any type of file, e.g. JPEG, DOC or HTML. I need to throw an exception if the user uploads anything other than a CSV file.

Can anybody let me know how I can find out whether the uploaded file is a CSV file or not?

A: 

I don't know if you can tell for 100% certain in any way, but I'd suggest that the first validations should be:

  1. Is the file extension .csv?
  2. Count the number of commas on each line of the file. A valid CSV file should normally have the same number of commas on every line. (As jkramer said, this only works if the files can't contain quoted commas.)
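
The second check can be sketched roughly as below. This is only an illustration, not a full CSV validator: the class and method names (`DelimiterCount`, `countDelimiters`, `hasConsistentDelimiters`) are made up for this example, the delimiter is passed in because some files use a semicolon, and the code assumes that a quoted field never spans more than one line.

```java
import java.util.List;

final class DelimiterCount {

    // Counts delimiters on one line, ignoring any that sit inside double quotes.
    static int countDelimiters(String line, char delimiter) {
        int count = 0;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state ends up unchanged.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    // True when every non-empty line has the same number of unquoted delimiters.
    static boolean hasConsistentDelimiters(List<String> lines, char delimiter) {
        int expected = -1;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int n = countDelimiters(line, delimiter);
            if (expected < 0) {
                expected = n;
            } else if (n != expected) {
                return false;
            }
        }
        // Assumption for this sketch: a usable file has at least two columns,
        // so at least one delimiter per line is required.
        return expected > 0;
    }
}
```

With this sketch, the lines `a,b,c` and `1,"x,y",3` count as consistent (two unquoted commas each), while `a,b,c` followed by `1,2` does not.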
ho1
Point 2 is not completely true. First, a CSV may use different delimiters. For example, I see the semicolon (;) used in CSV much more often than the comma. Second, a CSV may contain delimiters inside quoted values. For example: foo,bar,"baz,quux",... (the third comma, between the quotes, would not be recognized as a delimiter by a CSV parser).
jkramer
I agree that not all lines will contain the same number of commas, but by definition, a Comma Separated Value file uses commas as the delimiter, not semicolons.
gregcase
+6  A: 

If you're using a CSV parsing library, all you have to do is catch any errors it throws.

If the CSV parser you're using is remotely robust, it will throw some useful errors in the event that it doesn't understand the file format.

Jamie Wong
I think this is the best way, trying to read the file as a csv-file - if it fails it obviously didn't have csv-format.
Anders K.
A: 

I can think of several methods.

One way is to try to decode the file using UTF-8. (This is built into Java and is probably built into .NET too.) If the file decodes properly, then you at least know that it's a text file of some kind.

Once you know it's a text file, parse out the individual fields from each line and check that you get the number of fields that you expect. If the number of fields per line is inconsistent then you might just have a file that contains text but is not organized into lines and fields.

Otherwise you have a CSV. Then you can validate the fields.
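
A minimal Java sketch of these two steps follows. It rests on some assumptions: the names `TextCheck`, `decodeStrictUtf8`, `hasExpectedFieldCount` and `looksLikeCsv` are invented for this example, the expected number of fields is known in advance, and fields are split on plain commas without any handling of quoted values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class TextCheck {

    // Decodes the bytes as UTF-8 and throws if they are not valid UTF-8.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True when there is at least one non-empty line and every non-empty
    // line splits into exactly expectedFields fields.
    static boolean hasExpectedFieldCount(String text, int expectedFields) {
        int rows = 0;
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty()) {
                continue;
            }
            // Limit -1 keeps trailing empty fields, as in "a,b,".
            // Naive split: quoted commas are NOT handled in this sketch.
            if (line.split(",", -1).length != expectedFields) {
                return false;
            }
            rows++;
        }
        return rows > 0;
    }

    // Combines both steps: text check first, then the field-count check.
    static boolean looksLikeCsv(byte[] bytes, int expectedFields) {
        try {
            return hasExpectedFieldCount(decodeStrictUtf8(bytes), expectedFields);
        } catch (CharacterCodingException e) {
            return false; // not text in the assumed encoding, e.g. an image
        }
    }
}
```

A JPEG is rejected at the decoding step because its leading byte 0xFF can never appear in valid UTF-8. A file in another text encoding would also be rejected, so the strict UTF-8 requirement is a design choice that may be too narrow for some uploads.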

Willis Blackburn
Just wondering: why was this answer unhelpful? I've parsed CSV files in my own programs and have done just what I described.
Willis Blackburn
Hi Willis, this was a very helpful suggestion. I am able to validate the CSV file based on your inputs.. Thanks for your input... :-)
Mithun
A: 

If it's a web application, you might want to check the Content-Type HTTP header the browser sends when uploading/posting a file through a form. If there's a binding for the language you're using, you might also try libmagic, which is pretty good at recognizing file types. For example, the UNIX tool file uses it.

http://sourceforge.net/projects/libmagic/
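
The header check could look something like the Java sketch below. The class name `ContentTypeCheck`, the method `isPlausiblyCsv` and the allowlist are assumptions made for this example. `text/csv` is the registered media type, while the other entries are guesses at what some clients may send and should be adjusted to what you actually observe. The header comes from the client, so it is best treated as a first filter and not as proof.

```java
import java.util.Locale;
import java.util.Set;

final class ContentTypeCheck {

    // Assumed allowlist for this sketch; adjust it to the clients you support.
    private static final Set<String> CSV_TYPES = Set.of(
            "text/csv",
            "application/csv",
            "application/vnd.ms-excel",
            "text/plain");

    // True when the media type part of the header is on the allowlist.
    static boolean isPlausiblyCsv(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" and normalize the case.
        String mediaType = contentTypeHeader.split(";", 2)[0]
                .trim()
                .toLowerCase(Locale.ROOT);
        return CSV_TYPES.contains(mediaType);
    }
}
```

For example, `isPlausiblyCsv("text/csv; charset=utf-8")` accepts the upload, while `isPlausiblyCsv("image/jpeg")` rejects it.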

jkramer
+3  A: 

CSV files vary a lot, and they all could be called, legitimately, CSV files.

I don't think your approach is the best one. A better approach is to check whether the uploaded file is a text file the application can parse, rather than whether it's a CSV or not.

You would report errors whenever you can't parse the file, be it a JPG, MP3 or CSV in a format you cannot parse.

To do that, I would try to find a library that parses various CSV file formats. Otherwise you have a long road ahead writing code to parse many possible types of CSV files (or restricting the application's flexibility by supporting only a few CSV formats).

One such library for Java is opencsv.

Vinko Vrsalovic
+1 for the recommendation to use a library. Parsing CSV is something that sounds very easy at first, until you need to start handling quoted values and badly formed data.
gregcase