I have a Scalr EC2 cluster, and want an easy way to synchronize files across all instances.

For example, I have a bunch of files in /var/www on one instance; I want to be able to identify all of the other hosts and then rsync to each of them to update their files.

ls /etc/aws/hosts/app/

returns the IP addresses of all of the other instances:

10.1.2.3 10.1.33.2 10.166.23.1
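What I want amounts to something like this sketch (push_www is a hypothetical helper name; it assumes passwordless SSH between instances and that the entries under /etc/aws/hosts/app/ are named by IP):

```shell
#!/bin/sh
# Sketch only: print one rsync command per app instance.
# Drop the "echo" to actually run the transfers.
push_www() {
    hostsdir=$1
    for host in $(ls "$hostsdir"); do
        echo rsync -az --delete /var/www/ "$host:/var/www/"
    done
}
```

Running `push_www /etc/aws/hosts/app` would then print one rsync command per instance in the list above.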

Ideas?

A: 

As Zach said, you could use S3.

  • You could download one of many clients out there for mapping drives to S3 (search for "S3" and "WebDAV").
  • If I were going to go this route, I would set up an S3 bucket with all my shared files and use JetS3t in a cron job to sync each node's local drive with the bucket (pulling down S3 bucket updates). Then, since I normally use Eclipse and Ant for building, I would create an Ant task for deploying updates to the S3 bucket (pushing updates up to the S3 bucket).

From http://jets3t.s3.amazonaws.com/applications/synchronize.html

Usage: Synchronize [options] UP <S3Path> <File/Directory>

(...) or: Synchronize [options] DOWN

UP      : Synchronize the contents of the Local Directory with S3.
DOWN    : Synchronize the contents of S3 with the Local Directory.
...
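For example, each node could pull bucket updates on a schedule with a crontab entry like the following (the bucket name and install path are placeholders; synchronize.sh is the script that ships with JetS3t):

```shell
# Hypothetical crontab line: every 5 minutes, pull the shared bucket's
# contents down into /var/www on this node.
*/5 * * * * /opt/jets3t/bin/synchronize.sh DOWN my-shared-bucket/www /var/www
```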

I would recommend the above solution, if you don't need cross-node file locking. It's easy and every system can just pull data from a central location.

If you need more cross-node locking:

An ideal solution would be to use IBM's GPFS, but IBM doesn't just give it away (at least not yet). Even though it's designed for high-performance interconnects, it can also be used over slower connections. We used it as a replacement for NFS and it was amazingly fast (about 3 times faster than NFS). There may be something similar that is open source, but I don't know of one.

Have you evaluated using NFS? Maybe you could dedicate one instance as an NFS server.
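If NFS fits, a minimal sketch might look like this (the server address 10.1.2.3 and the subnet are placeholders; assumes the standard NFS server tooling is installed):

```shell
# On the node acting as NFS server: export /var/www to the cluster subnet.
# Add this line to /etc/exports (subnet is a placeholder):
#   /var/www  10.1.0.0/16(rw,sync,no_subtree_check)
# then apply the export table:
exportfs -ra

# On every other node: mount the export in place
# (10.1.2.3 stands in for the server's address).
mount -t nfs 10.1.2.3:/var/www /var/www
```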

eSniff