views:

458

answers:

4

Hello,

I am building a software program that follows the scenario:

I have many computers, and I add each one of them to a cluster. Each computer in the cluster can add a file to a distributed table (dictionary/hashmap; it should be fast enough).

So now I have a place where everyone can see what files the group/cluster contains. Now a computer from the group/cluster requests a file. It can access all the information about the file from the distributed table (on what computer the file can be found, etc.).

By some mechanism it should get the file from point B (the computer that has the file) to point A (the computer that requested the file).

Basically it should do data replication (but for very large files).

So you probably wonder by now what this guy is asking for. Here it is:

The data replication should be as fast as possible. What would be the best approach? I thought about something like flux networks.

What would be the best framework to use for software following this scenario?

I AM SEARCHING FOR A JAVA FRAMEWORK :). (IT NEEDS TO RUN CROSS-PLATFORM.)

Thank you!

A: 

It looks like you are searching for Project Voldemort or other key=>value datastores that offer automatic failover, replication, etc.

Martin K.
I will take a better look at it. It is among my options.
Martin K.
A: 

JXTA is Sun's Java peer-to-peer framework, and most likely of use here.

Or check out Jini and its capabilities for service leasing, dynamic discovery, and protocol-independent client/server communication. Using Jini you can publish each service with particular attributes (in this case your filename?), or perhaps use it with JavaSpaces (I'm not sure about the appropriateness of spaces here, however).

Brian Agnew
Yes, it is peer-to-peer, but I don't want to implement a distributed data structure.
Note that you have the problem of your data structure reporting that machine B has a file, but what happens when machine B goes down?
Brian Agnew
I have found a framework called JGroups. Have you heard about it?
Yes. It's a reliable multicast mechanism; nothing more than that, if memory serves.
Brian Agnew
Yes, that is an issue. I am trying to find solutions for these kinds of problems. :( Unfortunately the literature is vast and non-standard.
In order to replicate the file, wouldn't I need a multicasting system?
+2  A: 

As I'm sure you have discovered, there are a lot of Java libraries out there that allow you to implement this sort of distributed map.

  • Hazelcast - the new kid on the block; really simple to use and provides implementations of standard Java interfaces like ConcurrentMap
  • JGroups - really just a library for group messaging, but it includes a DistributedHashMap implementation
  • JBoss Cache - built on top of JGroups; provides a much more complete distributed caching system with optional persistence and transactions
  • Terracotta - Big and quite popular, commercially supported
  • Oracle Coherence - The daddy of them all, with a price tag to match

There are more (quite a lot more); my personal preference at the moment is Hazelcast, as it's insanely easy to get started with. All of the caching frameworks I've listed (I think) rely on being able, at least temporarily, to load each whole entry into memory, which may be an issue if you are attempting to put the contents of large files into them.

In your case I'd probably use the distributed map to store the location data, i.e. some data to tell any other node where a particular file is, and then go directly to that node using some out-of-band method such as HTTP.
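A minimal sketch of that pattern, using a plain ConcurrentHashMap as a local stand-in for the distributed map (Hazelcast's distributed maps implement the same ConcurrentMap interface, so swapping one in is mostly a matter of where the map comes from). The class and method names here are illustrative, not from any particular framework:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FileRegistry {
    // Maps a file name to the "host:port" of the node holding it.
    // In a real cluster this would be a distributed map shared by all
    // nodes, not a local one.
    private final Map<String, String> locations = new ConcurrentHashMap<>();

    // Called by a node when it adds a file to the cluster.
    public void publish(String fileName, String hostAndPort) {
        locations.put(fileName, hostAndPort);
    }

    // Called by a node that wants a file: returns where to fetch it
    // from (out of band, e.g. over HTTP), or null if nobody has it.
    public String locate(String fileName) {
        return locations.get(fileName);
    }
}
```

A requesting node would call locate("big.iso"), get back something like "192.168.0.5:8080", and then open a direct HTTP (or FTP, or raw socket) connection to that node for the actual bytes, keeping the large file out of the map itself.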

Gareth Davis
A: 

There are a few good answers above for distributed hashmap.

For actually copying the file: if at all possible I would prefer not to copy anything and just have some shared storage solution. If you must use separate disks for each computer, then something simple such as setting up an FTP server on each computer should work. This doesn't have to be Java-based, but if you want a Java-only solution, something like Apache MINA could work.
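If you do end up writing the copy step yourself in plain Java, one relevant detail for "as fast as possible" with very large files is FileChannel.transferTo, which asks the OS to move file data straight to the target channel without going through user-space buffers. A rough sketch (the FileSender name is made up; the target would typically be a SocketChannel to the requesting node):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileSender {
    // Streams a file into any writable channel (e.g. a SocketChannel).
    // transferTo may send fewer bytes than requested in one call, so we
    // loop until the whole file has gone out. Returns the byte count.
    public static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long sent = 0;
            while (sent < size) {
                sent += source.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }
}
```

Frameworks like MINA (or Netty) wrap this same zero-copy mechanism for you, so this is only worth hand-rolling if you want to stay dependency-free.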

Gregory Mostizky