When displaying images on our website, we check whether the file exists with a call to file_exists(). We fall back to a dummy image if the file is missing.

However, profiling has shown that this is the slowest part of generating our pages, with file_exists() taking up to 0.5 ms per file. We are only testing 40 or so files, but this still adds up to 20 ms to the page load time.

Can anyone suggest a way of making this go faster? Is there a better way of testing whether the file is present? If I build a cache of some kind, how should I keep it in sync?

+4  A: 

Use absolute paths! Depending on your include_path setting, PHP checks all(!) of those directories when you pass it a relative file path! You might unset include_path temporarily before checking for existence.

realpath() does the same but I don't know if it is faster.

But file access I/O is always slow. A hard disk access IS slower than calculating something in the processor, normally.

powtac
Good tip. I already provide a full path name to the file though (mostly to avoid the unreliable nature of include path settings).
rikh
A thread about this problem and a script to test: http://bytes.com/topic/php/answers/10394-file_exists-expensive-performance-terms
powtac
+1  A: 

file_exists() is automatically cached by PHP. I don't think you'll find a faster function in PHP to check the existence of a file.

See this thread.

mculp
+3  A: 

file_exists() should be a very inexpensive operation. Note too that file_exists builds its own cache to help with performance.

See: http://php.net/manual/en/function.file-exists.php

RC
I guess I should just accept that the performance is fine and leave it as is. I might go and break up the files into more folders though, as this will probably help things.
rikh
+1  A: 

Are they all in the same directory? If so it may be worth getting the list of files and storing them in a hash and comparing against that rather than all the file_exists lookups.
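A minimal sketch of that idea, assuming all images live in a single directory. The function name buildFileIndex and the paths in the usage lines are made up for illustration; scandir() and array_fill_keys() are standard PHP functions.

```php
<?php
// Hypothetical helper: read one directory listing and turn it into a
// lookup table keyed by file name, so later checks are isset() calls.
function buildFileIndex(string $dir): array
{
    if (!is_dir($dir)) {
        return [];
    }
    $names = scandir($dir);
    if ($names === false) {
        return [];
    }
    // Drop the "." and ".." entries. Note that names of subdirectories
    // also end up in the index, since scandir() lists them too.
    $names = array_diff($names, ['.', '..']);

    return array_fill_keys($names, true);
}

// Usage: one directory read per request instead of ~40 file_exists() calls.
// The directory and image names below are assumptions for the example.
$index = buildFileIndex('/path/to/images');
$src = isset($index['photo.jpg']) ? '/images/photo.jpg' : '/images/dummy.jpg';
```

Whether this beats 40 individual file_exists() calls depends on how many files the directory holds, so it is worth profiling both.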

easement
I'm assuming this hash would be stored in APC somewhere... or some other sort of shared memory.
R. Bemrose
A: 

What about glob()? But I'm not sure if it's fast.

http://www.php.net/manual/en/function.glob.php

juno
glob() is a dinosaur compared to file_exists()! I don't think it will help in this case.
Pekka
A: 

I find 1/2ms per call very, very affordable. I don't think there are much faster alternatives around, as the file functions are very close to the lower layers that handle file operations.

You could however write a wrapper to file_exists() that caches results into a memcache or similar facility. That should reduce the time to next to nothing in everyday use.
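Here is a rough sketch of such a wrapper. It uses a per-request static array plus APCu (when the extension is loaded) instead of memcache, purely to keep the example self-contained; the function name, key prefix, and 300-second TTL are assumptions, not anything from the question.

```php
<?php
// Hypothetical wrapper around file_exists(). Results are remembered for the
// rest of the request, and shared between requests via APCu if available.
function cachedFileExists(string $path, int $ttl = 300): bool
{
    static $local = [];   // per-request cache: path => bool

    if (isset($local[$path])) {
        return $local[$path];
    }

    // APCu is optional; without it only the per-request cache is used.
    $useApcu = function_exists('apcu_enabled') && apcu_enabled();
    $key = 'file_exists:' . $path;   // assumed key prefix

    if ($useApcu) {
        $hit = false;
        $cached = apcu_fetch($key, $hit);
        if ($hit) {
            return $local[$path] = (bool) $cached;
        }
    }

    $exists = file_exists($path);

    if ($useApcu) {
        // The TTL bounds how long a stale answer can survive after a file
        // is added or removed.
        apcu_store($key, $exists, $ttl);
    }

    return $local[$path] = $exists;
}
```

The TTL is the simplest answer to the "keep it in sync" question: a cached result can be wrong for at most that long. Deleting the key from whatever code uploads or removes images would tighten that further.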

Pekka
+3  A: 

We fall back to a dummy image if the file was missing

If you're just interested in falling back to this dummy image, you might want to consider letting the client negotiate with the server by means of a redirect (to the dummy image) on file-not-found.

That way you'll just have a little redirection overhead and a not-noticeable delay on the client side. At least you'll get rid of the "expensive" (which it isn't, I know) call to file_exists.

Just a thought.

jensgram
+1 for clever. Now I'm curious about what happens if you pass jpg data back with a 404 response. This is, after all, a 404-type behavior that OP is looking for.
timdev
Should be rendered OK. Basically it's the same behavior as for custom 404 pages; they're rendered as XHTML if served as such. Haven't tested, though.
jensgram
A: 

I'm not even sure if this will be any faster but it appears as though you would still like to benchmark soooo:

Build a cache of a large array of all image paths.

$array = array('/path/to/file.jpg' => true, '/path/to/file2.gif' => true);

Update the cache hourly or daily depending on your requirements. You would do this utilizing cron to run a PHP script which will recursively go through the files directory to generate the array of paths.

When you wish to check if a file exists, load your cached array and do a simple isset() check for a fast array index lookup:

if (isset($myCachedArray[$imgpath])) {
    // handle display
}

There will still be overhead from loading the cache but it will hopefully be small enough to stay in memory. If you have multiple images you are checking for on a page you will probably notice more significant gains as you can load the cache on page load.
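For what it's worth, the cron side could look something like the sketch below. The function names and the choice of a PHP file written with var_export() as the cache format are my assumptions; any serialization would do.

```php
<?php
// Hypothetical cron helper: walk $root recursively and write every file
// path into a PHP file that returns an array.
function buildImageCache(string $root, string $cacheFile): int
{
    $paths = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            $paths[$file->getPathname()] = true;
        }
    }

    // Write to a temporary file and rename it, so page requests never
    // load a half-written cache.
    $tmp = $cacheFile . '.tmp';
    file_put_contents($tmp, '<?php return ' . var_export($paths, true) . ';');
    rename($tmp, $cacheFile);

    return count($paths);
}

// Hypothetical loader for the page side: returns an empty array if the
// cache has not been generated yet.
function loadImageCache(string $cacheFile): array
{
    return is_file($cacheFile) ? include $cacheFile : [];
}
```

On the page you would call loadImageCache() once and then use the isset() check from above. Images added between cron runs will show the dummy until the next rebuild.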

cballou
+2  A: 

If you are only checking for existing files, use is_file(). file_exists() checks for an existing file OR directory, so maybe is_file() could be a little faster.

Alex
A: 

Create a hashing routine for sharding the files into multiple sub-directories.

filename.jpg -> 012345 -> /01/23/45.jpg
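One possible way to do the mapping is sketched below. This is a variant of the example above: it uses md5() for the hash and keeps the original file name at the end so that two files can never collide. The function name and the two-level default are assumptions.

```php
<?php
// Hypothetical sharding routine: derive subdirectories from the first
// hex characters of the file name's md5 hash, two characters per level.
function shardedPath(string $filename, int $levels = 2): string
{
    $levels = max(1, $levels);
    $hash = md5($filename);
    $dirs = str_split(substr($hash, 0, $levels * 2), 2);

    return '/' . implode('/', $dirs) . '/' . $filename;
}

// Usage: the same name always maps to the same subdirectory.
$relativePath = shardedPath('filename.jpg');
```

With two levels of two hex characters each, the files are spread over up to 65,536 directories.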

Also, you could use mod_rewrite to return your placeholder image for requests to your image directory that 404.

simplemotives
A: 

I don't exactly know what you want to do, but you could just let the client handle it.

ViperArrow