views:

139

answers:

3

I have a load-balanced environment with over 10 web servers running IIS. All the websites access a single file store that hosts all the pictures. We currently have 200 GB of pictures, stored in directories of 1,000 images each. Right now all the images are on a single storage device (RAID 10) connected to a single server that acts as the file server. All web servers are connected to the file server on the same LAN. I am looking to improve the architecture so that there is no single point of failure. I am considering two alternatives:

  1. Replicate the file storage to all of the web servers so that they all access the data locally.
  2. Replicate the file storage to a second storage device so that, if something happens to the current one, we can switch over to it.

Obviously the main operation on the file storage is reading, but there are also a lot of write operations. Which method would you prefer? Any other ideas?

I am currently ruling out using a CDN, as it would require an architecture change to the application that we cannot make right now.

A: 

Things I would normally consider before going for an architecture change:

  1. What are the issues with the current architecture?
  2. What am I doing wrong with the current architecture? (If it has been working for a while, minor tweaks will normally solve a lot of issues.)
  3. Will it allow me to grow easily? (There will always be an upper limit.) Based on past data growth, you can plan for this effectively.
  4. Reliability
  5. Ease of maintenance / monitoring / troubleshooting
  6. Cost

200 GB is not a lot of data. You could go for a home-grown solution or use something like a NAS, which will allow you to expand later on, and keep a hot-swappable replica of it.

Replicating the storage to all of the web servers is a very expensive setup, and since, as you said, there are a lot of write operations, replication to every server carries a large overhead (which will only increase with the number of servers and the growing data). There is also the issue of stale data being served by one of the other nodes. Apart from that, troubleshooting replication issues will be a mess with 10 nodes and growing. Unless file lookup/read/write is very time-critical, replicating to all the web servers is not a good idea; web users will hardly notice a difference of 100-200 ms in load time.

Sundarram P.V.
A: 

There are enterprise solutions for this sort of thing, but no doubt they are expensive. A NAS doesn't scale well, and you still have a single point of failure, which is not good.

There are some ways you can write code to help with this. You could cache the images on the web servers the first time they are requested; this will reduce the load on the image server.
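
A minimal sketch of that cache-on-first-request idea, in Python for illustration only (the local cache directory and the shared-storage mount point are assumed names, not part of the original setup):

```python
import os
import shutil

# Assumed locations: a local cache directory on each web server and the
# shared storage mount that currently holds all the images.
LOCAL_CACHE = "/var/cache/images"
SHARED_STORAGE = "/mnt/fileserver/images"

def get_image_path(relative_name):
    """Return a local path for the image, pulling it from shared storage
    the first time it is requested."""
    local_path = os.path.join(LOCAL_CACHE, relative_name)
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # First request: copy the file from the central storage once,
        # then serve it from local disk on every later request.
        shutil.copy2(os.path.join(SHARED_STORAGE, relative_name), local_path)
    return local_path
```

A real implementation would also need to invalidate or refresh cached files when an image is replaced on the central storage.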

You could get a master-slave setup, so that you have one main image server and other servers that copy from it. You could load balance these, and put some logic in your code so that if a slave doesn't have a copy of an image, you check on the master. You could also assign these a priority order so that, if the master is not available, the first slave becomes the master.
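
A rough sketch of that fallback lookup, again in Python with made-up host names (a real setup would likely do this inside the web application or the load balancer):

```python
import urllib.request
import urllib.error

# Assumed hosts: one master image server and its read-only replicas.
MASTER = "http://img-master.internal"
SLAVES = ["http://img-slave1.internal", "http://img-slave2.internal"]

def fetch_image(path):
    """Try each slave in priority order; fall back to the master when a
    slave does not yet have a copy of the image or is unreachable."""
    for host in SLAVES + [MASTER]:
        try:
            with urllib.request.urlopen(f"{host}/{path}", timeout=2) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError):
            continue  # missing on this node or node down, try the next one
    raise FileNotFoundError(path)
```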

Jeremy French
A: 

Since you have so little data in your storage, it makes sense to buy several big HDs, or use the free space on your web servers, to keep copies. That will reduce the strain on your backend storage system, and when it fails, you can still deliver content to your users. Even better, if you need to scale (more downloads), you can simply add a new server and the stress on your backend won't change much.

If I had to do this, I'd use rsync or unison to copy the image files to the exact same path on the web servers as on the storage device (this way, you can swap the local copy out for a network file system mount at any time).

Run rsync every now and then (for example after every upload, or once a night; you'll know best which schedule fits you).
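
A small sketch of such a push, wrapping rsync from Python (the host names and image directory are placeholders; run it from cron or right after an upload):

```python
import subprocess

# Placeholder web server host names and image directory.
WEB_SERVERS = ["web01", "web02", "web03"]  # ...and so on for all nodes
IMAGE_DIR = "/data/images/"

def push_images():
    """Mirror the image directory to every web server, keeping the same
    path so the copy could later be swapped for a network mount."""
    for host in WEB_SERVERS:
        # -a preserves permissions and timestamps; --delete removes files
        # deleted on the storage server so the copies stay identical.
        subprocess.run(
            ["rsync", "-a", "--delete", IMAGE_DIR, f"{host}:{IMAGE_DIR}"],
            check=True,
        )

if __name__ == "__main__":
    push_images()
```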

A more versatile solution would be to use a P2P protocol like BitTorrent. That way, you could publish all the changes on the storage backend to the web servers, and they would optimize the distribution of the updates automatically.

Aaron Digulla