views: 552
answers: 8

I have built a PHP script to host a large number of images uploaded by users. What is the best way to generate random numbers for the image filenames so that there will be no filename conflicts in the future? Something like ImageShack. Thanks.

+5  A: 

Easiest way would be a new GUID for each file.

http://www.php.net/manual/en/function.uniqid.php#65879
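A minimal sketch of the uniqid() approach (the `img_` prefix and `.jpg` extension here are assumptions for illustration, not part of the answer):

```php
<?php
// uniqid() derives an ID from the current time in microseconds; with
// more_entropy = true it appends extra random digits, making collisions
// between concurrent requests far less likely.
$ext      = 'jpg'; // hypothetical extension taken from the upload
$filename = uniqid('img_', true) . '.' . $ext;
echo $filename;    // e.g. img_68a1f3c2a1b2c.12345678.jpg
```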

Ed Guiness
are these truly unique for a million images?
proyb2
http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Random_UUID_probability_of_duplicates. You have a better chance of being hit by a meteorite.
Andrew Gwozdziewycz
still not a reason to avoid a system that is *really* unique.
Lo'oris
@Lo'oris: not getting hit by a meteorite sounds like a good reason.
Andrey Fedorov
A: 

Using something based on a timestamp, maybe. See the microtime() function for details. Alternatively, uniqid() generates a unique ID based on the current time.
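A sketch of the timestamp idea, with a random component mixed in, since a timestamp alone can repeat (the `.jpg` extension is an assumption):

```php
<?php
// microtime(true) returns the epoch time as a float with microsecond
// precision. Two uploads in the same microsecond would still collide,
// so a random component from mt_rand() is appended as well.
$name = sprintf('%.6f_%d.jpg', microtime(true), mt_rand());
echo $name; // e.g. 1267431892.123456_104983.jpg
```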

richsage
Basing just on a timestamp will not give you unique results.
devoured elysium
@devoured, you have a point :-)
richsage
A: 

Keep a persistent list of all the numbers you've previously generated (in a database table or in a file) and check that each newly generated number is not already on the list. If you find this prohibitively expensive, generate random numbers with enough bits to guarantee a very low probability of collision.

You can also take an incremental approach to assigning these numbers, such as concatenating a timestamp part based on the current time with a random part, to make sure you don't get collisions when multiple users upload files at the same time.
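A sketch of the check-before-use variant, with a plain array standing in for the database table (an assumption; a real setup would query the table instead):

```php
<?php
// Each candidate combines a timestamp part and a random part; the loop
// retries on the (rare) collision against names already handed out.
function next_name(array &$used): string
{
    do {
        $candidate = time() . '_' . mt_rand();
    } while (isset($used[$candidate]));
    $used[$candidate] = true; // remember it, as a DB insert would
    return $candidate;
}
```

In production the isset() check would be a unique-key lookup in the database, which also makes the reservation atomic across concurrent uploads.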

luvieere
This solution does not scale well
symcbean
+1  A: 

You could use microtime() as suggested above and then append a hash of the original filename to further avoid collisions in the (rare) case of exactly simultaneous uploads.
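A sketch of that combination (md5 of the client's filename is one choice of hash; extension handling is omitted for brevity):

```php
<?php
// microtime() supplies the time component; md5() of the original client
// filename differentiates two uploads landing at the same instant.
function make_name(string $original_filename): string
{
    return microtime(true) . '_' . md5($original_filename);
}
```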

Davide Gualano
I like this best. A hash created from the current timestamp, the current user name, the file name, and an incremental global upload counter would be 100% duplicate safe.
Pekka
You can add more to the mix, like info from the client e.g. ip address etc...
zaf
Why? Why not just use an incremental global counter? That's the only part of all of this that's *ensuring* 0 collisions.
Jeriko
or, as I suggested, just create a name without putting too much effort in it and **just check it** before committing.
Lo'oris
+4  A: 
$better_token = uniqid(md5(mt_rand()), true);
Osman Üngür
A: 
  1. forge a filename
  2. try to open that file
  3. if it exists, goto 1
  4. create the file
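The steps above can be sketched as follows, using fopen()'s `'x'` mode so that the existence check and the creation are a single atomic step (avoiding a race between steps 2 and 4; the directory argument and `.jpg` extension are assumptions):

```php
<?php
// The loop forges a random name and atomically claims it: fopen() with
// mode 'x' creates the file but fails if it already exists.
function reserve_name(string $dir): string
{
    do {
        $path = $dir . '/' . mt_rand() . '.jpg'; // 1. forge a filename
        $fh   = @fopen($path, 'x');              // 2./4. create unless it exists
    } while ($fh === false);                     // 3. if it existed, go again
    fclose($fh);
    return $path;
}
```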
Lo'oris
+1  A: 

There are several flaws in the premise that random values will be unique - regardless of how good the random number generator is. Also, the better the random number generator, the longer it takes to compute results.

Wouldn't it be better to use a hash of the data file? That way you get the added benefit of detecting duplicate submissions.

If detecting duplicates is known to be a non-issue, then I'd still recommend this approach but modify the output based on detected collisions (but using a MUCH cheaper computation method than that proposed by Lo'oris) e.g.

 $candidate_name = generate_hash_of_file($input_file);
 $offset = 0;
 while (file_exists($candidate_name . strrev($offset)) && ($offset < 50)) {
    $offset++;
 }
 if ($offset < 50) {
    rename($input_file, $candidate_name . strrev($offset));
 } else {
    print "Congratulations - you've got the biggest storage network in the world by far!";
 }

this would give you the capacity to store up to 50 files per hash value - roughly 50*2^160 possible names with a sha1 hash.

As to how to generate the hash, reading the entire file into PHP might be slow (particularly if you try to read it all into a single string to hash it). Most Linux/Posix/Unix systems come with tools like 'md5sum' which will generate a hash from a stream very efficiently.
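As a side note, PHP can also hash a file as a stream itself, so shelling out isn't strictly required - sha1_file() and md5_file() read the file incrementally rather than into one string. A minimal sketch (the temp file stands in for the upload):

```php
<?php
// sha1_file() streams the file from disk in chunks, so even a large
// image never needs to be held in memory as a single PHP string.
$input_file = tempnam(sys_get_temp_dir(), 'img'); // stand-in for the upload
file_put_contents($input_file, 'example image bytes');
$hash = sha1_file($input_file); // 40-character hex digest
```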

C.

symcbean
A: 

My solution is usually a hash (MD5/SHA1/...) of the image contents. This has the added advantage that if people upload the same image twice, you still only have one copy on the hard disk, saving some space (of course you have to make sure the image is not deleted when one user deletes it while another user still has the same image in use).
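A content-addressed sketch of that idea (the directory layout and the commented-out refcount helper are assumptions; a real handler would use move_uploaded_file()):

```php
<?php
// The hash of the contents becomes the name, so a second upload of the
// same bytes resolves to the same path and is simply not stored twice.
function store_image(string $tmp_path, string $store_dir): string
{
    $dest = $store_dir . '/' . sha1_file($tmp_path) . '.jpg';
    if (!file_exists($dest)) {
        rename($tmp_path, $dest); // move_uploaded_file() in a real handler
    }
    // increment_refcount($dest); // hypothetical bookkeeping for deletes
    return $dest;
}
```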

dbemerlin
hashes, by definition, are not unique (though of course a collision is unlikely).
Lo'oris