ansaurus

Question

Which technology should be used for serving a large number of static files?

Answer 1

A:

I believe that a dedicated application with everything feeding off a memcache db would be the best bet.

Josh K 2010-08-27 18:45:20

Why? Give us some reasons...

Codo 2010-08-28 17:37:59

I don't think memcache db would do wonders: once the data exceeds the cache size (limited by available RAM), the hit ratio will be very low. On top of that comes the burden of keeping the cache in sync with persistent storage and of checking every request for a hit or miss. I am more inclined towards lighttpd and a structured file system, as suggested by Codo, but I might be wrong. The advantage there is that the OS keeps its own file cache and drops a file from it when an external process updates the file. I need to test both approaches.

Ashish Patil 2010-09-03 13:32:12

Answer 2

+3 A:

1bn files at 1KB each, that's about 1TB of data. Impressive. So it won't fit into memory unless you have very expensive hardware. It can even be a problem on disk if your file system wastes a lot of space for small files.
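To make the point about wasted space concrete, here is a rough back-of-the-envelope sketch. It assumes the file system allocates space in whole blocks and uses a 4096-byte block size; that block size is an assumption for illustration, not something stated in the question, and per-file metadata such as inodes is ignored.

```python
import math

def allocated_bytes(file_size: int, block_size: int) -> int:
    """Bytes a file occupies on disk when space is handed out in whole blocks."""
    return math.ceil(file_size / block_size) * block_size

NUM_FILES = 1_000_000_000   # "1bn files" from the answer
FILE_SIZE = 1_000           # "1KB each", taken here as 1,000 bytes
BLOCK_SIZE = 4096           # assumed block size, not from the original post

logical_total = NUM_FILES * FILE_SIZE
on_disk_total = NUM_FILES * allocated_bytes(FILE_SIZE, BLOCK_SIZE)

print(f"logical size: {logical_total / 1e12:.1f} TB")
print(f"on-disk size: {on_disk_total / 1e12:.1f} TB")
```

Under these assumptions the data takes roughly four times its logical size on disk, which is why the file system choice matters for this many small files.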

30 requests a second is far less impressive. It's certainly not the limiting factor for the network or for any serious web server out there. It might be a bit of a challenge for a slow hard disk.

So my advice is: put the XML files on a hard disk and serve them with a plain vanilla web server of your choice. Then measure the throughput and optimize it if you don't reach 50 files a second. But don't invest in anything unless you have shown it to be a limiting factor.

Possible optimizations are:

Find a better layout in the file system, i.e. distribute your files over enough directories so that you don't have too many files (more than 5,000) in a single directory.
Distribute the files over several hard disks so that they can be accessed in parallel.
Use faster hard disks.
Use solid state disks (SSD). They are expensive, but can easily serve hundreds of files a second.

If a large number of the files are requested several times a day, then even a slow hard disk should be enough, because your OS will have those files in the file cache. With today's file cache sizes, a considerable share of your daily deliveries will fit into the cache: at 30 requests a second, you serve at most about 0.26% of all files a day.
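The estimate of roughly a quarter of a percent can be checked with a few lines of arithmetic. This is only a sketch: it takes the 30 requests a second and 1bn files of 1KB from this thread and assumes, as the worst case for caching, that every request hits a different file.

```python
REQUESTS_PER_SECOND = 30
SECONDS_PER_DAY = 24 * 60 * 60
NUM_FILES = 1_000_000_000
FILE_SIZE = 1_000  # bytes, "1KB" taken as 1,000 bytes

# Upper bound on distinct files requested in one day.
requests_per_day = REQUESTS_PER_SECOND * SECONDS_PER_DAY

# Largest possible share of the whole collection touched per day.
max_fraction_touched = requests_per_day / NUM_FILES

# Payload size of everything served in one day, ignoring block overhead.
max_daily_bytes = requests_per_day * FILE_SIZE

print(f"requests per day: {requests_per_day:,}")
print(f"share of files touched: {max_fraction_touched:.2%}")
print(f"data served per day: {max_daily_bytes / 1e9:.1f} GB")
```

Even in that worst case, a day's worth of served files is only a few gigabytes of payload, which is within reach of an ordinary OS file cache.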

Regarding distributing your files over several directories, you can hide this with an Apache RewriteRule, e.g.:

RewriteRule ^/xml/(.)(.)(.)(.)(.*)\.xml /xml/$1/$2/$3/$4/$5.xml
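To see which directory a given file ends up in, the same mapping can be sketched in Python. This only mirrors the regular expression of the rule above for experimenting with file names; the function name `rewrite` is made up for this sketch, and it is not how Apache evaluates the rule internally.

```python
import re

# Same pattern as the RewriteRule: four single characters, then the rest of the name.
PATTERN = re.compile(r"^/xml/(.)(.)(.)(.)(.*)\.xml")

def rewrite(url_path: str) -> str:
    """Map a flat URL path to the nested path; unmatched paths are returned unchanged."""
    return PATTERN.sub(r"/xml/\1/\2/\3/\4/\5.xml", url_path, count=1)

print(rewrite("/xml/abcdefgh.xml"))
print(rewrite("/xml/abc.xml"))
```

A name such as `abcdefgh.xml` is looked up under `a/b/c/d/`, while a name with fewer than four characters before the extension does not match and is left as it is, so very short names would need separate handling.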

Codo 2010-08-28 17:53:03

@Codo: I didn't translate "bn" to "billion."

Josh K 2010-08-28 18:15:20

Exactly what I was thinking: building an app on top is just going to make it slower. A very important question, though, is latency. There is **no** "web server's default caching", but the more latency you can tolerate in publishing updates, the more load can be handled by caching away from the web server.

symcbean 2010-08-30 12:14:30

You could also use NginX and its built-in regex matching to map requests onto subdirectories. At this quantity of files, splitting them up over multiple levels of subdirectories would be almost required on a stock file system (for example ext3).

Alister Bulman 2010-08-30 17:58:55

I am presently testing Codo's implementation. So far there is no issue, as very little data is present, but I think this should work. Along with the RewriteRule, I have also set up ErrorDocument 404 to send a canned reply for requests where no data is present. I am not sure how to change the response code from 404 to 200 in this case. Let me know if this is possible without involving PHP, just via Apache configuration.

Ashish Patil 2010-09-13 13:41:47

To change the error code from 404 to 200, you basically use two directives in your Apache configuration: "RewriteCond %{REQUEST_FILENAME} !-f" and "RewriteRule ^.*+ /dummy_reply.xml". The first one makes sure that the second one only applies if the request could not be resolved to an existing file. The second one accesses a canned reply.
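The decision those two directives make can be sketched in Python. This is only an illustration of the logic, not Apache's implementation; the function name `resolve` and its parameters are invented for the sketch, and it does no sanitizing of the request path.

```python
from pathlib import Path

def resolve(document_root: Path, url_path: str,
            fallback: str = "/dummy_reply.xml") -> Path:
    """Return the file to serve: the requested one if it exists, else the canned reply."""
    candidate = document_root / url_path.lstrip("/")
    if candidate.is_file():
        # The "!-f" condition is false, so the request is served as usual.
        return candidate
    # No such file: internally switch to the canned reply instead of raising a 404.
    return document_root / fallback.lstrip("/")
```

Because the request is rewritten to an existing file before any error handling starts, the canned reply goes out as a normal response with status 200, which is the difference from the ErrorDocument approach.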

Codo 2010-09-13 16:58:13

Thanks Codo, now the system is working fine.

Ashish Patil 2010-09-14 10:06:37

Answer 3

+1 A:

Another thing you could look at is Pomegranate, which seems very similar to what you are trying to do.

Josh K 2010-08-30 17:50:28
