distributed

Distributed Job scheduling, management, and reporting

I recently had a play around with Hadoop and was impressed with its scheduling, management, and reporting of MapReduce jobs. It appears to make the distribution and execution of new jobs quite seamless, allowing the developer to concentrate on the implementation of their jobs. I am wondering if anything exists in the Java domain for th...

[Erlang] Spawn remote process w/o common file system

([email protected])8> spawn([email protected], tut, test, [hello, 5]). I want to spawn a process on bar.del.com which has no file system access to foo.hyd.com (from where I am spawning the process), running subroutine "test" of module "tut". Is there a way to do so, w/o providing the [email protected] with the compiled "tut" module fi...

What ways exist to distribute asynchronous batch tasks?

I am currently investigating what Java-compatible solutions exist to address my requirements, as follows: timer-based / schedulable tasks to batch process; distributed, and thereby able to scale horizontally; resilient, no SPOFs please. The nature of these tasks (heavy XML generation, and the delivery to web based recei...

What are the faster Paxos-related algorithms for consensus in distributed systems?

I've read Lamport's paper on Paxos. I've also heard that it isn't used much in practice, for reasons of performance. What algorithms are commonly used for consensus in distributed systems? ...

Can make -j with distcc scale beyond 5 times?

Since distcc keeps no state and can only send jobs and headers, leaving the servers to preprocess and compile using only the data just sent, I think the latest distcc has a scalability problem. In my local build environment, which has approx. 10,000 C/C++ files to build, I could only make the build 2 times faster than not using distc...

Scalability for large applications

Hi, there are very useful documents explaining the server architectures of sites like LinkedIn, MySpace, Amazon, etc. After seeing MySpace's, I was really surprised that they are using 500+ database servers for their application. I would like to know how they maintain SQL transactions, joins, and lookups if data spans multiple database serve...

Web Services vs EJB vs RMI, advantages and disadvantages?

My web server would be overloaded quickly if all the work were done there. I'm going to stand up a second server behind it, to process data. What's the advantage of EJB over RMI, or vice versa? What about web services (SOAP, REST)? ...

Java framework for distributed system

I am looking for a library (or a combination of libraries) to build a Java distributed system, made of several applications exchanging data through several pairwise connections (no MapReduce). So far I have explored existing libraries, and I could only discard what I've found. Here are my requirements: Easy discovery of s...

Are there any general algorithms for achieving eventual consistency in distributed systems?

Are there any algorithms that are commonly used for achieving eventual consistency in distributed systems? There are algorithms that have been developed for ACID transactions in distributed systems, Paxos in particular, but is there a similar body of theory that has been developed for BASE scenarios, with weaker consistency guarantees...

Using JNDI for distributed configuration

We're looking at how to do distributed configuration within our primarily Java-based deployment. We have a number of applications and it makes sense to centralise the configuration of the applications. JNDI appears to be the standard choice, probably backing off to something like ApacheDS (that way we can store non-Java config in there a...

Distributed caching systems and how they distribute data

I'm looking for information on things like ehcache and other alternatives to memcached for a project that will likely involve 3-4 webservers and something like 2-10 million distributed objects that need to be available to all servers. Specifically, I'm trying to understand how other systems distribute data, whether or not memcached is u...

Message passing and signaling in a distributed system

I have a distributed video analysis system, which is composed of:
1. feature extraction: generates lots of features (20+) from each frame of the video
2. multiple detectors (on different machines):
   * Each of them will get a subset of the features
   * Each of them needs the features from multiple frames.
   * E.g. Detector 1 needs feature 1-5...

Approach for implementing synchronization?

The data will be kept on a server and on a client, say a desktop app. Clients will have a local "cache" and this is editable. Use cases:
- The server-side copy is edited and sent to clients.
- The client-side copy is edited and sent to the server.
I am thinking of giving the information an incremental value. Each time an edit takes place, it i...
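One plausible reading of the "incremental value" idea above is a per-record version counter with an optimistic check on the server. The sketch below is only an illustration under that assumption: the names `Record`, `Server`, `Client`, `apply_edit`, `pull`, and `push` are hypothetical, the network is replaced by direct method calls, and conflict resolution is reduced to "reject the edit and re-sync".

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    version: int = 0  # incremented on every accepted edit


class Server:
    """Holds the authoritative copy (hypothetical, in-memory stand-in)."""

    def __init__(self, value: str) -> None:
        self.record = Record(value)

    def apply_edit(self, new_value: str, base_version: int) -> bool:
        """Accept an edit only if it was made against the current version."""
        if base_version != self.record.version:
            return False  # the client edited a stale copy and must re-sync
        self.record = Record(new_value, self.record.version + 1)
        return True


class Client:
    """Holds an editable local cache of the server record."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache = Record(server.record.value, server.record.version)

    def pull(self) -> None:
        """Refresh the local cache if the server copy is newer."""
        if self.server.record.version > self.cache.version:
            self.cache = Record(self.server.record.value,
                                self.server.record.version)

    def push(self, new_value: str) -> bool:
        """Send a local edit, tagged with the version it was based on."""
        accepted = self.server.apply_edit(new_value, self.cache.version)
        if accepted:
            self.cache = Record(new_value, self.cache.version + 1)
        return accepted
```

With two clients starting from the same copy, the first `push` succeeds and the second is rejected until that client calls `pull`. A real system would also need a policy for merging or discarding the rejected edit, which this sketch leaves out.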

Distributed Java Compiler

Hi, is there a distributed compiler for Java, like distcc for C/C++? Thanks, Bala ...

Why is file locking alone not sufficient in multi-user systems?

Ritchie claims that file locking is not sufficient to prevent the confusion caused by programs such as editors that make a copy of a file while editing and then write the original file when done. Can you explain what he meant? ...

Scripting a distributed transaction on SQL

Hello, I'm using SQL Server with a distributed transaction on local Oracle linked servers, as follows: GO BEGIN DISTRIBUTED TRANSACTION; SET XACT_ABORT off; GO SELECT MAX(DEPTNO)+1, FROM [WSF08_CONTA_ORADATA_II]..[SCOTT].DEPT SET XACT_ABORT on; GO COMMIT TRANSACTION; So when I run this script I get the following er...

Distributed Computing applications

MapReduce/Hadoop is one of the frameworks/programs used for distributed systems. What are some other popular frameworks/programs? Thanks. ...

Glassfish Clustering architecture and Node Agents

I am going to be creating an application that will be highly distributed. There will be several "agents"; each agent is a source of events from external devices, FTP, or the filesystem. These agents will be deployed on separate machines, close to the hardware source that will create the event. These agents will report back events to the ce...

Distributed computing with different random number generators

I have a computation/simulation system with one master server and (potentially) many clients (workers). All of them are working with the same data but need random numbers for the computation. What would be the best PRNG and the best way to seed it to make sure that two clients aren't using the same cycle and computing the same results t...
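One common way to approach the seeding part of this question is to derive each worker's seed from a shared master seed plus a unique worker ID. The sketch below assumes Python's standard `random` module (a Mersenne Twister) and hashes the pair with SHA-256; `worker_rng`, `master_seed`, and `worker_id` are hypothetical names. Hashing makes seed collisions between workers extremely unlikely, but it is not a proof that the streams never overlap, so treat this as an illustration and not as the best PRNG choice.

```python
import hashlib
import random


def worker_rng(master_seed: int, worker_id: int) -> random.Random:
    """Return a generator seeded from (master_seed, worker_id).

    The same pair always yields the same stream, so a run can be
    reproduced; different worker IDs yield different seeds.
    """
    material = f"{master_seed}:{worker_id}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    # random.Random accepts arbitrarily large integers as seeds.
    return random.Random(int.from_bytes(digest, "big"))
```

The master would hand out `master_seed` once and assign each client a distinct `worker_id`; each client then calls `worker_rng(master_seed, worker_id)` locally, so no random numbers need to cross the network.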

Any distributed parallel tree search algorithm suggestions?

Hello there, I'm writing a distributed Go/Gomoku bot. Basically the point is to distribute tree search onto many computers. With basic tree search algorithms like DFS this would be very simple, as I could just partition search space into subtrees. Though I'd rather have something more efficient, like mini-max with alpha-beta pruning - ...
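One simple way to combine the two ideas in this question (partitioning into subtrees, and alpha-beta pruning) is to split at the root: each worker runs an ordinary alpha-beta search on one root move. The sketch below is a toy illustration under several assumptions: the game tree is a nested list whose leaves are scores, a local thread pool stands in for remote machines, and `alphabeta` and `root_split_search` are hypothetical names. Workers do not share bounds, so pruning across root moves is lost; more refined schemes exist and are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Plain alpha-beta on a nested-list tree; leaves are numeric scores."""
    if not isinstance(node, list):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing parent will never allow this branch
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing parent already has a better option
    return best


def root_split_search(root, workers=4):
    """Search each root move independently; return (best index, score)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # After our move at the root, the opponent (minimizer) is to play.
        scores = list(pool.map(lambda child: alphabeta(child, False), root))
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]
```

For the toy tree `[[3, 5], [6, 9], [1, 2]]`, the three subtrees evaluate to 3, 6, and 1, so the second root move is chosen, matching what a single sequential `alphabeta(tree, True)` call returns.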