I have a dilemma and wonder what the best practice is.

I'm using Uploadify to upload images. Now I need to validate the filename before saving the file.

I've looked at different solutions, but can't settle on one good solution.

Here are my criteria:

  • Filename must be all in lowercase
  • Filename can only contain characters [a-z0-9_-]
  • I must be able to rename file

How would you handle a filename such as my.file(name).jpeg?

I could explode the filename on '.' and save the extension, then implode to get the filename again. But I'm not sure that's the best solution.

I have the following functions that help a bit:

function getExts($filename)
{
    // explode() only splits on a literal string, so use preg_split()
    // to split on "/", "\" and "." and keep the last piece.
    $parts = preg_split('/[\/\\\\.]/', $filename);
    return end($parts);
}

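function validFilename($filename)
{
    $filename = str_replace(" ", "_", $filename);
    // Strip everything outside a-z, 0-9, "_" and "-".
    // Note: this also strips the dot before the extension.
    $pattern = "/[^a-z0-9_-]/";
    return preg_replace($pattern, "", strtolower($filename));
}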

UPDATE 1
I'm receiving the file through $_FILES. This gives me the following data:

  • $_FILES["file"]["name"] - the name of the uploaded file
  • $_FILES["file"]["type"] - the type of the uploaded file
  • $_FILES["file"]["size"] - the size in bytes of the uploaded file
  • $_FILES["file"]["tmp_name"] - the name of the temporary copy of the file stored on the server
  • $_FILES["file"]["error"] - the error code resulting from the file upload

UPDATE 2
I just found something. I could use getimagesize, which returns an array of 7 elements. One of these elements, [2], is an IMAGETYPE_XXX constant.

So I tried this code:

function getExts2($filename)
{
    list(,,$type) = getimagesize($filename);
    return $type;
}

But it only returns the number 2...

(I also tried using exif_imagetype, but I only get a PHP error: Call to undefined function.)
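
For what it's worth, the 2 above is the integer value of the IMAGETYPE_JPEG constant, so the function does report the type. A minimal sketch of turning that constant into an extension with image_type_to_extension() could look like this (extensionFromImage is a hypothetical helper name, not something from the question):

```php
<?php
// Hypothetical helper: derive an extension from the image data itself,
// rather than from the client-supplied filename.
function extensionFromImage(string $path): ?string
{
    // getimagesize() returns false when the file is not a recognised image.
    $info = @getimagesize($path);
    if ($info === false) {
        return null;
    }

    // $info[2] holds an IMAGETYPE_* integer, e.g. IMAGETYPE_JPEG === 2.
    // The second argument "false" omits the leading dot.
    $ext = image_type_to_extension($info[2], false);

    return $ext === false ? null : $ext;
}
```

With an upload you would presumably pass $_FILES["file"]["tmp_name"] as the path. Note that JPEG files come back as "jpeg", not "jpg".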

A: 

I am assuming you are validating file names for later use as download file names.

While it would be great to preserve file names as they are for the sake of the user, from a technical viewpoint your approach seems sound.

If you care about what the URLs look like, and may target an international audience in the future, be sure to either reduce umlauts to their base characters or transliterate them. A German user (umlauts Ä Ö Ü, and ß) would expect transliteration:

Ä = ae
Ö = oe
Ü = ue

while the Scandinavians seem to be at ease with

Ä => a
Ö => o
Ø => o

and so on.

Then there are the various accented characters é á ó...

Dropping those altogether leads to URLs that look really bd and lok strng in the bwsr address br.
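
A rough sketch of the German-style conversion, assuming UTF-8 input and a UTF-8 source file. The transliterate name and the contents of the map are illustrative only; you would swap in 'ä' => 'a' and so on for the Scandinavian convention:

```php
<?php
// Illustrative map only; extend it for the languages you need to support.
function transliterate(string $name): string
{
    $map = [
        'Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'Ø' => 'o',  'ø' => 'o',
        'é' => 'e',  'á' => 'a',  'ó' => 'o',
    ];

    // strtr() with an array tries the longest keys first and never
    // rescans text it has already replaced.
    return strtr($name, $map);
}
```

Run this before stripping characters outside [a-z0-9_-], otherwise the accented letters are already gone.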

Pekka
+4  A: 

Check the filename with a regexp. Use the MIME type information. Save the file on the server under an MD5 name. Store the real filename in the DB.
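
Something along these lines, as a sketch only. buildUploadRecord and the array keys are made-up names, and the MD5 input is just one way to get a name that is unlikely to collide (it is not a security measure):

```php
<?php
// Hypothetical helper: pick the on-disk name and keep the original
// name alongside it, ready to be written to the database.
function buildUploadRecord(string $originalName, string $extension): array
{
    // uniqid() with more_entropy=true plus a random prefix gives a
    // different input on every call; md5() turns it into 32 hex chars.
    $storedAs = md5(uniqid((string) mt_rand(), true)) . '.' . $extension;

    return [
        'stored_as'     => $storedAs,
        // basename() drops any directory part the client may have sent.
        'original_name' => basename($originalName),
    ];
}
```

You would then pass stored_as to move_uploaded_file() and insert both values into your table.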

hsz
Saving the file using md5 is good. And yes, I'm already saving the filename in the DB. But I'm lousy at using regexps.
Steven
OK, but why do you want to check the filename with a regexp? Isn't the MIME type approach good enough?
hsz
+1 for 'encrypting' the filename with md5
Hippo
@Hippo: I don't think it's because of security, but to avoid duplicate names. One can e.g. use this "formula": $randName = md5(rand() * time());
Steven
@hsz: Yes, I will use Mimetype / binary info.
Steven
+2  A: 

pathinfo() can get you the filename and extension. I'll warn you that you can't rely on testing a file's extension through inspection of its filename, however. You will want to use a function that actually inspects the binary contents of a file for this. finfo_file() can accomplish this. Oh, and it never hurts to use basename() on a user-supplied file path to prevent a path traversal attack.
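
A short sketch of how those three fit together. The $clientName value and the detectMime helper are hypothetical, and finfo_file() needs the fileinfo extension to be enabled:

```php
<?php
// Hypothetical user-supplied value containing a traversal attempt.
$clientName = '../../uploads/my.file(name).jpeg';

// basename() keeps only the last path component.
$safeName = basename($clientName);

// pathinfo() splits at the LAST dot, so earlier dots stay in the name.
$base = pathinfo($safeName, PATHINFO_FILENAME);
$ext  = pathinfo($safeName, PATHINFO_EXTENSION);

// Hypothetical helper: detect the MIME type from the file's bytes,
// ignoring whatever extension the client claimed.
function detectMime(string $path): ?string
{
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    $mime  = finfo_file($finfo, $path);

    return $mime === false ? null : $mime;
}
```

Here $safeName ends up as my.file(name).jpeg, $base as my.file(name) and $ext as jpeg. For an upload, detectMime() would be called on the tmp_name path.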

jkndrkn
It's rather difficult using basename() if you don't know the extension. And, as you say, using pathinfo isn't the best way. The safest would be reading the actual binaries.
Steven
A: 

This should do everything you want:

$filename = preg_replace('/[^a-z0-9_-]/', '', str_replace(' ', '_', strtolower(pathinfo($filename, PATHINFO_FILENAME)))) . '.' . strtolower(pathinfo($filename, PATHINFO_EXTENSION));

And as Pekka explained, if you want to replace all accented characters in the file name, you can use the following function:

function Unaccent($string)
{
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}
Alix Axel
The problem with using [a-z0-9_-] is that it also removes the dot before the extension(?)
Steven
@Steven: Have you at least tried it? The trick is in using the pathinfo($filename, PATHINFO_FILENAME) and pathinfo($filename, PATHINFO_EXTENSION), it works just like you want it to.
Alix Axel
A: 

I don't think you should "validate" the filename, you should just "fix" it if it isn't in the format you want. Why reject a file just because someone sucks at naming their files, or has some unusual characters in there? Also, if you're working with images, you can use getimagesize as you mentioned to make sure it actually is an image that was uploaded (should fail if it isn't an image).
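
A sketch of what "fixing" could look like, using the rules from the question. fixFilename is a hypothetical name, and the 'file' fallback for names that end up empty is my own assumption:

```php
<?php
// Hypothetical helper: clean a name instead of rejecting the upload.
function fixFilename(string $original, string $fallback = 'file'): string
{
    // Handle the base name and the extension separately so the dot
    // between them survives the character filter.
    $base = strtolower(pathinfo($original, PATHINFO_FILENAME));
    $ext  = strtolower(pathinfo($original, PATHINFO_EXTENSION));

    $base = str_replace(' ', '_', $base);
    $base = preg_replace('/[^a-z0-9_-]/', '', $base);
    $ext  = preg_replace('/[^a-z0-9]/', '', $ext);

    // A name made only of unwanted characters would end up empty.
    if ($base === '') {
        $base = $fallback;
    }

    return $ext === '' ? $base : $base . '.' . $ext;
}
```

With this, my.file(name).jpeg becomes myfilename.jpeg and nothing gets rejected because of its name.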

Mark
By validate, I mean checking the filename. No files will be rejected, but I will remove unwanted characters.
Steven