Right now I have a function that takes an uploaded file, checks its extension, and processes the file if the extension matches an array of valid extensions. It's a contact list importer.

What I need to figure out is how to make sure the file (in this case a .csv) really is what it claims to be (e.g. not an Excel file that was simply renamed to .csv).

Our servers run PHP 5.2.13.

Here's my current validation function:

public static function validateExtension($file_name, $ext_array) {
    // Reject an empty file name outright.
    if (!$file_name) {
        return false;
    }

    // No whitelist supplied: accept any extension.
    if (!$ext_array) {
        return true;
    }

    // Extension including the leading dot, e.g. ".csv"; "" if the name has no dot.
    $extension = strtolower((string) strrchr($file_name, "."));

    foreach ($ext_array as $value) {
        // Normalize whitelist entries so that "csv" and ".csv" are treated alike.
        $value = strtolower($value);
        if (substr($value, 0, 1) !== ".") {
            $value = "." . $value;
        }

        // Strict comparison avoids PHP's loose string/number juggling.
        if ($value === $extension) {
            return true;
        }
    }

    return false;
}

EDIT: I'm now trying this:

exec('file -ir ' . escapeshellarg($myFile));

When I run this command in a terminal, I get a usable response, but when I run the same command through PHP, I get something different. Any ideas why? I've tried exec, passthru and shell_exec, and the server is not running in safe mode.
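Here is a minimal sketch of making that call from PHP. It assumes a Unix-like server with `file` on the PHP process's PATH, and a `file` build that understands `--mime-type` (older builds may only offer `-i`/`--mime`). `detectMimeType()` is a hypothetical helper name.

The sketch does three things:

- It quotes the path with `escapeshellarg()`.
- It merges stderr into the captured output, so error messages are not silently lost.
- It reads the full output through `exec()`'s second argument, because `exec()`'s return value is only the last line of output.

A relative path is resolved against the PHP process's working directory, not your shell's, so an absolute path is the safer input.

```php
<?php
/**
 * Ask the `file` utility for a file's MIME type.
 * detectMimeType() is a hypothetical helper; it assumes `file` is installed,
 * is on the PATH, and supports the --mime-type and -b (brief) options.
 */
function detectMimeType($path)
{
    $output = array();
    $status = 0;

    // escapeshellarg() quotes the path so spaces or shell metacharacters in
    // an uploaded file name cannot break (or inject into) the command.
    // 2>&1 folds stderr into $output so errors show up instead of vanishing.
    $command = 'file -b --mime-type ' . escapeshellarg($path) . ' 2>&1';

    // exec() returns only the LAST line of output; every line is collected
    // in $output and the command's exit code is stored in $status.
    exec($command, $output, $status);

    if ($status !== 0 || count($output) === 0) {
        return null; // `file` missing, failed, or printed nothing
    }

    return trim($output[0]);
}
```

Comparing the result against an allow-list (for example `text/plain` or `text/csv`) is left to the caller, since different `file` versions may classify the same CSV file differently.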

Answer:

Forget extension checking; it's not reliable enough.

Also, I think traditional MIME magic sniffing will fail here, because a CSV file has no usable header or signature bytes to detect (this is just my guess, though).
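Magic sniffing compares a file's leading bytes against known signatures. A CSV file has no such signature, but the formats a renamed Excel file would really be in do:

- The legacy .xls format is an OLE2 compound file starting with the bytes `D0 CF 11 E0 A1 B1 1A E1`.
- .xlsx is a ZIP container starting with `PK\x03\x04`.

Here is a minimal sketch, with `isKnownBinaryFormat()` as a hypothetical helper name. It can reject those two cases, but it cannot confirm that a file is CSV.

```php
<?php
/**
 * Compare a file's first bytes with a short list of known binary signatures.
 * isKnownBinaryFormat() is a hypothetical helper; the list deliberately
 * covers only the two Excel container formats.
 */
function isKnownBinaryFormat($path)
{
    $signatures = array(
        "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", // OLE2 compound file (legacy .xls)
        "PK\x03\x04",                       // ZIP archive (.xlsx is a ZIP container)
    );

    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false; // unreadable: nothing to sniff
    }
    $head = fread($handle, 8); // the longest signature above is 8 bytes
    fclose($handle);

    if ($head === false) {
        return false;
    }

    foreach ($signatures as $signature) {
        // Match the signature against the start of the file only.
        if (strncmp($head, $signature, strlen($signature)) === 0) {
            return true;
        }
    }

    return false;
}
```

A ZIP signature also matches other ZIP-based files such as .ods or .docx. That is harmless here, since none of them is CSV either.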

In this specific case, I'd say it's feasible to take a quick peek at the contents: for example, read the first ten lines or so. If none of them is longer than x bytes, and each line contains the same number of semicolons (or whatever separator your CSV parser uses), it's very likely a CSV file.
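Here is a minimal sketch of that heuristic. It avoids PHP 5.3+ features, so it should also run on 5.2. `looksLikeCsv()`, its default comma delimiter, and the 10-line / 4096-byte thresholds are illustrative assumptions.

Two checks go beyond the description above and are also assumptions:

- Any line containing a NUL byte is rejected, since binary formats such as .xls and .xlsx typically contain them.
- Every line must contain at least one separator, which rules out single-column files.

Separators inside quoted fields are counted naively. A quoted value containing the delimiter will therefore make an otherwise valid file fail.

```php
<?php
/**
 * Heuristic check: do the first few non-empty lines look like CSV rows?
 * looksLikeCsv() is a hypothetical helper; the thresholds are assumptions.
 */
function looksLikeCsv($path, $delimiter = ',', $maxLines = 10, $maxLineLength = 4096)
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    $expected = null; // separator count of the first non-empty line
    $linesRead = 0;

    // Read at most $maxLineLength + 2 bytes per call (room for "\r\n"), so a
    // binary file without newlines cannot force a huge read.
    while ($linesRead < $maxLines
        && ($line = fgets($handle, $maxLineLength + 3)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // ignore blank lines, e.g. a trailing empty line
        }

        // Too long (possibly truncated by fgets) or contains NUL: not text CSV.
        if (strlen($line) > $maxLineLength || strpos($line, "\0") !== false) {
            fclose($handle);
            return false;
        }

        // Every sampled line must have the same number of separators.
        $count = substr_count($line, $delimiter);
        if ($expected === null) {
            $expected = $count;
        } elseif ($count !== $expected) {
            fclose($handle);
            return false;
        }
        $linesRead++;
    }
    fclose($handle);

    // Need at least one sampled line with at least one separator in it.
    return $linesRead > 0 && $expected > 0;
}
```

For a semicolon-separated export, pass the separator explicitly, e.g. `looksLikeCsv($path, ';')`. Because this only samples the start of the file, the importer should still handle malformed rows further down.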

Pekka