I'm looking for a PHP function that will sanitize a string and make it ready to use as a filename. Does anyone know of a handy one?

(I could write one, but I'm worried that I'll overlook a character!)

Edit: for saving files on a Windows NTFS filesystem.

A: 

Replacing every match of the following expression with a hyphen creates a nice, clean, and usable string:

/[^a-z0-9\._-]+/gi

It turns `today's financial: billing` into `today-s-financial-billing`.
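
A minimal PHP sketch of the idea above, assuming each run of disallowed characters is replaced with a single hyphen. The function name `slugify_filename` is hypothetical. PHP's `preg_replace()` replaces every match by default and does not accept a `g` modifier, so only `i` is kept here.

```php
<?php
// Hypothetical helper: replace each run of characters outside
// a-z, 0-9, ".", "_" and "-" with a single hyphen.
function slugify_filename(string $name): string
{
    // No "g" modifier in PHP: preg_replace() already replaces all matches.
    return preg_replace('/[^a-z0-9._-]+/i', '-', $name);
}

echo slugify_filename("today's financial: billing"), "\n";
// prints: today-s-financial-billing
```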

Jonathan Sampson
so a filename can't have a period or an underscore, or anything like that?
Tor Valamo
@Jonathan - what's with the italics?
Dominic Rodger
@Tor, yes, sorry. Updated. @Dominic, just drawing emphasis on the text.
Jonathan Sampson
What is gism? I get " Warning: preg_replace() [function.preg-replace]: Unknown modifier 'g' "
`g` - global, `i` - insensitive case, `s` - dotall, `m` - multiline. In this example, you could do without `s` and `m`.
Jonathan Sampson
+1  A: 
$file = preg_replace('/[^\w\s\d.\-_~,;:\[\]()]/', '', $file);

Add/remove more valid characters depending on what is allowed for your system.

Alternatively you can try to create the file and then return an error if it's bad.
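
The "try it and see" alternative could look roughly like the following. It is only a sketch: the helper name `try_create_file` and the use of the system temp directory are assumptions made for illustration, and the `x` mode is chosen so that an existing file is never overwritten.

```php
<?php
// Hypothetical helper: attempt to create $name inside $dir and
// report whether the filesystem accepted it.
function try_create_file(string $dir, string $name): bool
{
    $path = $dir . DIRECTORY_SEPARATOR . $name;
    // 'x' creates the file and fails if it already exists;
    // @ suppresses the warning so we can return false instead.
    $handle = @fopen($path, 'x');
    if ($handle === false) {
        return false;
    }
    fclose($handle);
    return true;
}

$dir  = sys_get_temp_dir();
$name = 'report-' . uniqid() . '.txt';
if (try_create_file($dir, $name)) {
    echo "created\n";
    unlink($dir . DIRECTORY_SEPARATOR . $name); // clean up the demo file
} else {
    echo "rejected by the filesystem\n";
}
```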

Tor Valamo
That would allow through filenames like `..`, which may or may not be a problem.
Dominic Rodger
@Dom - just check for that separately, since it's a fixed value.
Tor Valamo
+1  A: 

Instead of worrying about overlooking characters, how about using a whitelist of characters you are happy to allow? For example, you could allow just good ol' a-z, 0-9, _, and a single instance of a period (.). That's obviously more limiting than most filesystems, but it should keep you safe.
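
One way such a whitelist might be sketched in PHP is shown below. The function name `whitelist_filename` is hypothetical, and two choices are assumptions rather than part of the suggestion: input is lowercased first, and the period that is kept is the last one, on the guess that it separates the extension.

```php
<?php
// Hypothetical helper: keep only a-z, 0-9, "_" and at most one period.
function whitelist_filename(string $name): string
{
    // Assumption: fold to lowercase so A-Z survive as a-z.
    $name = strtolower($name);
    // Drop every character that is not on the whitelist.
    $name = preg_replace('/[^a-z0-9_.]/', '', $name);
    // Keep only the last period; remove any earlier ones.
    $pos = strrpos($name, '.');
    if ($pos !== false) {
        $name = str_replace('.', '', substr($name, 0, $pos)) . substr($name, $pos);
    }
    // A leading or trailing period is stripped as well, so ".." becomes "".
    return trim($name, '.');
}

echo whitelist_filename('My.Report.v2.PDF'), "\n"; // prints: myreportv2.pdf
```

An empty result means nothing usable was left, so the caller would need to reject the name or fall back to a generated one.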

Dominic Rodger
No good for languages with Umlauts. This would result in Qubec for Québec, Dsseldorf for Düsseldorf, and so on.
Pekka
True - but like I said: "For example".
Dominic Rodger
Which may be perfectly acceptable to the OP. Otherwise, use something like http://php.net/manual/en/class.normalizer.php
Blair McMillan
Thanks for bringing that up, Pekka. I hadn't thought of that.
A: 

Well, tempnam() will do it for you.

http://us2.php.net/manual/en/function.tempnam.php

but that creates an entirely new name.
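
For reference, a short sketch of what calling `tempnam()` looks like. The `upl` prefix and the use of `sys_get_temp_dir()` are arbitrary choices for this example.

```php
<?php
// tempnam() creates an empty file with a unique name and returns its path.
// The 'upl' prefix is an arbitrary choice for this example.
$path = tempnam(sys_get_temp_dir(), 'upl');
if ($path === false) {
    echo "could not create a temporary file\n";
} else {
    echo basename($path), "\n"; // generated name; it varies per call
    unlink($path);              // remove the demo file
}
```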

To sanitize an existing string, restrict what your users can enter to letters, numbers, periods, hyphens and underscores, then sanitize with a simple regex.

$sanitized = preg_replace('/[^a-zA-Z0-9._-]/', '', $filename);
Mark Moline
A: 

`/` and `..` in a user-provided file name can be harmful, so you should get rid of them with something like:

$fname = str_replace('/','',$fname);
$fname = str_replace('..','',$fname);
gameover
A: 

Making a small adjustment to Tor Valamo's solution to fix the problem noticed by Dominic Rodger, you could use:

$file = preg_replace('/[^\w\s\d.\-_~,;:\[\]()]|\.{2,}/', '', $file);
Sean Vieira
A: 

You can always find out which characters are not allowed in file names by renaming a file in Explorer and typing an invalid character. A message will pop up along the lines of "A filename cannot contain any of the following characters: ..."

Chaim Chaikin
A: 

One way:

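// Characters Windows does not allow in file names: \ / : * ? " < > |
// "#" is used as the regex delimiter so "/" needs no escaping.
$bad = '#[\\\\/:*?"<>|]#';
$string = 'fi?le*';

// Remove every character matched by $pat from $str.
function sanitize($str, $pat)
{
    return preg_replace($pat, "", $str);
}

echo sanitize($string, $bad); // prints: file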
ghostdog74