
So many options and so little time to test them all... I wonder if anyone has experience with distributed file systems for video streaming and storage/encoding.

I have a lot of huge video files (50 GB to 250 GB) that I need to store somewhere, encode to MP4, and stream from several Adobe FMS servers. The only way to handle all this is with a distributed file system, but now the question is: which one?

My research so far tells me:

  • Lustre: mature, proven solution, used by a lot of big companies; best with files >10 GB; it is a kernel driver.
  • Gluster: newer, less mature, FUSE-based, which means it is easy to install but maybe slower due to FUSE overhead. Better at handling a large number of smaller (~1 GB) files.
  • MogileFS: seems to be only for small (MB-sized) files and uses HTTP for access(?); a FUSE binding may be possible in the future.

So far Lustre seems the winner but I would like to hear real experiences for the particular application I have.

Hadoop, Red Hat GFS, Coda and Windows DFS also sound like options, so any experiences are welcome. If someone has benchmarks, please share.

A: 

This really isn't a programming question and doesn't really belong here. But... I would recommend some form of file synchronization software like Unison, iFolder, or rsync. As the files aren't so huge, they could sit on all the servers. All the clustering file systems aren't quite there, in my humble opinion.
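A minimal sketch of the "copy everything to every server" approach, driving rsync from Python. The host names and directories are hypothetical placeholders; it assumes rsync is installed and SSH access to each server is already set up. `-a` (archive mode) preserves permissions and timestamps, and `--partial` keeps a partially transferred file so a re-run after an interruption can reuse the data already copied, which matters for 250 GB files.

```python
import subprocess
from typing import List


def rsync_command(src_dir: str, host: str, dest_dir: str) -> List[str]:
    """Build an rsync command mirroring src_dir to host:dest_dir.

    The trailing slash on the source copies the directory's contents
    rather than the directory itself.
    """
    return ["rsync", "-a", "--partial", src_dir.rstrip("/") + "/", f"{host}:{dest_dir}"]


def mirror_to_all(src_dir: str, hosts: List[str], dest_dir: str,
                  dry_run: bool = True) -> List[List[str]]:
    """Mirror src_dir to every host; with dry_run=True only print commands."""
    cmds = [rsync_command(src_dir, h, dest_dir) for h in hosts]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if rsync fails
            subprocess.run(cmd, check=True)
    return cmds
```

Running one rsync per server sequentially is the simplest option; with several servers you would likely run them in parallel or from a cron job after each encode.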

mog
A:

You will probably get more of a response on the sysadmin version of Stack Overflow (serverfault.com).

Yannick M.
A: 

Of the systems named, MogileFS is the most suitable.

But perhaps you can get by without any special system at all. Say you have 4 Adobe FMS servers:

{video0.example.com, video1.example.com, video2.example.com, video3.example.com}.

You can distribute all your videos among those 4 servers using a simple scheme, like:

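    # Runnable version of the scheme (Python)
    import zlib

    NUM_SERVERS = 4  # video0 .. video3

    def get_server_id(filename):
        # Use a stable hash (CRC32). Python's built-in hash() is
        # randomized per process for strings, so it would map the same
        # file to different servers on different runs or machines.
        return zlib.crc32(filename.encode("utf-8")) % NUM_SERVERS

    server_id = get_server_id("movie.mp4")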

After you encode the videos, your app would do:

    $server_id = get_server_id(file_name)
    copy file_name to /mnt/$server_id/

Clients will access videos using a URL like http://videoN.example.com/filename.mp4, where N is calculated from the file name using get_server_id().
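A minimal Python sketch of the placement and URL steps just described. Assumptions: server N's storage is mounted at /mnt/N (a hypothetical layout matching /mnt/$server_id/), CRC32 stands in for the unspecified hash(), and `place_encoded_file` and `video_url` are hypothetical helper names. The same function must be used on both sides, so the writer and the URL builder always agree on N.

```python
import shutil
import zlib
from pathlib import Path

NUM_SERVERS = 4  # assumption: video0 .. video3, as in the example above


def get_server_id(filename: str) -> int:
    """Stable mapping of a file name to a server index (CRC32 mod N)."""
    return zlib.crc32(filename.encode("utf-8")) % NUM_SERVERS


def place_encoded_file(local_path: str, mount_root: str = "/mnt") -> Path:
    """Copy an encoded file to <mount_root>/<server_id>/<name>.

    Hypothetical layout: each server's storage is mounted at
    <mount_root>/0 .. <mount_root>/3.
    """
    name = Path(local_path).name
    dest_dir = Path(mount_root) / str(get_server_id(name))
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps; it returns the destination path
    return Path(shutil.copy2(local_path, dest_dir / name))


def video_url(filename: str, domain: str = "example.com") -> str:
    """URL clients use: http://videoN.<domain>/<filename>."""
    return f"http://video{get_server_id(filename)}.{domain}/{filename}"
```

Note that changing NUM_SERVERS remaps most files; if you expect to add servers later, consistent hashing would limit how many files have to move.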

Lustre/Gluster is really not what you should be looking for. Lustre is more mature, but its developers ask you to treat files on such a file system as a "cache", i.e. they can be lost at any time.

Lustre/Gluster are targeted at HPC, to allow fast access to huge amounts of data without a single storage server becoming the performance bottleneck. Another point for those systems is that they are POSIX-compliant: in an HPC/scientific research environment you usually don't have time to waste rewriting your apps just because you installed a cool new fast FS.

Konstantin Antselovich
That works great until one of the servers crashes. Oops.
Ask Bjørn Hansen
The "Oops" on a server crash will happen in all cases unless special care is taken (even with MogileFS). What "Oops" means also differs: in the setup above, failure of 1 server out of 4 means roughly 1/4 of read/write requests will fail until the failed server is restored from backup. If that's not acceptable, it is relatively easy to set up [read-only] replica servers using, for instance, rsync.
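One way to add such replicas to the hash scheme, as a sketch: keep each file on its primary server (CRC32 mod N) and also rsync it to the next server(s), wrapping around; readers fall back to a replica when the primary is down. The "next server" placement and the `is_up` health check are assumptions for illustration, not part of the original answer.

```python
import zlib
from typing import Callable, List

NUM_SERVERS = 4


def candidate_servers(filename: str, replicas: int = 2) -> List[int]:
    """Primary = CRC32(filename) mod N, followed by the next servers.

    Assumes each file has been copied to these `replicas` servers.
    """
    primary = zlib.crc32(filename.encode("utf-8")) % NUM_SERVERS
    return [(primary + i) % NUM_SERVERS for i in range(replicas)]


def pick_server(filename: str, is_up: Callable[[int], bool],
                replicas: int = 2) -> int:
    """Return the first live server holding the file; raise if none is up."""
    for server_id in candidate_servers(filename, replicas):
        if is_up(server_id):
            return server_id
    raise RuntimeError(f"no live replica for {filename!r}")
```

With two copies, reads survive any single server failure, but all of the failed server's traffic lands on its neighbour, so each server needs some spare capacity.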
Konstantin Antselovich
A: 

Check out the Hadoop Distributed File System (HDFS). It focuses on very large files and parallel task computing (with MapReduce); it has high latency but very high throughput. It is currently used in large installations such as Facebook and Amazon.com.

Gabriel Filion
A: 

MogileFS is great for that sort of thing. The client libraries vary a bit in quality, but I'd be surprised if there weren't large-ish production sites using just about any language to access it.

HTTP is a good protocol for this stuff actually. Who doesn't have a feature-rich and efficient HTTP client?
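To illustrate the HTTP point: once a client has a plain HTTP URL for a file (with MogileFS, clients get such storage-node URLs from the tracker), a standard Range request reads any slice of a huge video without downloading the rest. A minimal sketch using only Python's standard library; the URL in the usage note is a hypothetical placeholder, and it assumes the storage server honours Range requests.

```python
import urllib.request


def range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET for bytes [start, start + length) via a Range header."""
    end = start + length - 1  # the end offset in a Range header is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url: str, start: int, length: int, timeout: float = 30.0) -> bytes:
    """Fetch one byte range; fail if the server ignores the Range header."""
    with urllib.request.urlopen(range_request(url, start, length),
                                timeout=timeout) as resp:
        # 206 Partial Content means the range was honoured;
        # 200 would mean the server is sending the whole file.
        if resp.status != 206:
            raise RuntimeError(f"server ignored Range (status {resp.status})")
        return resp.read()
```

For example, `fetch_range("http://video0.example.com/movie.mp4", 0, 1024)` would return the first KiB of the file.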

Ask Bjørn Hansen
A: 

MapReduce doesn't help with a write/read ratio of 90/10! The constant file size is a good thing, and the files are small. So MogileFS sounds like a good alternative, since the Lustre/Gluster "cache" situation is not appropriate.

Rajan