Could someone suggest an effective way to check whether a file is in CSV format using Python?
Python has a csv module, so you could try parsing it under a variety of different dialects.
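As a rough sketch of that idea, the snippet below parses the same text under every dialect registered with the csv module and records the set of row widths each one produces. The helper name `row_widths_by_dialect` and the sample text are made up for illustration. A dialect that yields a single width greater than one is only a hint that the text is CSV, not proof.

```python
import csv
import io

def row_widths_by_dialect(text):
    """Map each registered dialect name to the set of row lengths it yields."""
    widths = {}
    for name in csv.list_dialects():
        try:
            # newline='' leaves line endings untouched so the csv module
            # can handle them itself.
            reader = csv.reader(io.StringIO(text, newline=''), dialect=name)
            rows = list(reader)
        except csv.Error:
            continue  # this dialect could not parse the text at all
        # Blank lines come back as empty lists; ignore them.
        widths[name] = {len(row) for row in rows if row}
    return widths

# Hypothetical sample: three comma-separated rows of three fields each.
sample = "a,b,c\r\n1,2,3\r\n4,5,6\r\n"
result = row_widths_by_dialect(sample)
```

With this sample, the comma-based `excel` dialect splits every row into the same number of fields, while the tab-based `excel-tab` dialect sees each line as a single field.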
You need to think clearly about what you consider a CSV file to be.
For example, what sort of characters can occur between the commas? Is it text only? Can it include Unicode characters as well? Should every line have the same number of commas?
There is no strict definition of a CSV file that I'm aware of. Usually it's ASCII text separated by commas and every line has the same number of commas and is terminated by your platform's line terminator.
Anyway, once you answer the questions above you'll be a bit farther on your way to knowing how to detect when a file is a CSV file.
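To make that concrete, here is a minimal sketch of one possible set of answers: ASCII only, comma-delimited, and the same number of commas on every non-empty line. The function name `looks_like_strict_csv` is hypothetical, and the check deliberately ignores quoted fields, so a quoted field containing a comma would be rejected.

```python
def looks_like_strict_csv(text, delimiter=','):
    """Return True if text is ASCII and every non-empty line contains the
    same number of delimiters (at least one). Quoted fields are NOT handled."""
    if not text.isascii():  # str.isascii() requires Python 3.7+
        return False
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return False
    # Collect the distinct delimiter counts seen across all lines.
    counts = {line.count(delimiter) for line in lines}
    return len(counts) == 1 and counts.pop() >= 1
```

If your answers to the questions differ (Unicode allowed, ragged rows allowed, a different delimiter), the corresponding check changes or disappears.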
You could try something like the following, but getting a dialect back from csv.Sniffer is not sufficient to guarantee that you have a valid CSV document.
import csv

# On Python 3, open in text mode with newline='' (Sniffer.sniff() expects
# a str, not bytes). On Python 2, 'rb' was the right mode.
csv_fileh = open(somefile, 'r', newline='')
try:
    dialect = csv.Sniffer().sniff(csv_fileh.read(1024))
    # Perform various checks on the dialect (e.g., lineterminator,
    # delimiter) to make sure it's sane

    # Don't forget to reset the read position back to the start of
    # the file before reading any entries.
    csv_fileh.seek(0)
except csv.Error:
    # File appears not to be in CSV format; move along
    pass
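One way the "various checks" might look, as a sketch only: restrict the sniffer to a short list of plausible delimiters, then parse the sample with the sniffed dialect and require every row to have the same width. The function name `sniff_and_verify`, the delimiter list, and the two-column minimum are assumptions chosen for illustration, not part of the csv module.

```python
import csv
import io

def sniff_and_verify(sample, allowed_delimiters=',;\t|'):
    """Return the sniffed dialect if the sample parses into rows of equal
    width (two or more columns); otherwise return None."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=allowed_delimiters)
    except csv.Error:
        return None  # the sniffer could not determine a delimiter
    reader = csv.reader(io.StringIO(sample, newline=''), dialect)
    rows = [row for row in reader if row]  # skip blank lines
    widths = {len(row) for row in rows}
    if len(widths) != 1 or widths.pop() < 2:
        return None  # ragged rows or a single column: probably not CSV
    return dialect
```

A typical call would pass the first kilobyte or so of the file, for example `sniff_and_verify(csv_fileh.read(1024))`. Note that a sample cut off in the middle of a row can make the last row look short, so trimming the sample to its last complete line before checking may be worthwhile.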