Hey.

How can we do parallel programming in Java? Is there a special framework for it? How do we get it working?

Here is what I need: I have developed a web crawler that crawls a lot of websites. A single machine doing the crawling is not enough, so we would need something like 1000 machines working together. In that case, can I apply parallel computing? Can you give me a proper example?

+1  A: 

In Java, parallel processing is done using threads, which are part of the runtime library.

The Concurrency Tutorial should answer a lot of questions on this topic if you're new to Java and parallel programming.

stacker
@stacker :Thks for info
Alex Mathew
+2  A: 

I heard about one at a conference a few years ago: ParJava. But I'm not sure about the current status of the project.

grigy
@grigy :thks for info
Alex Mathew
I see you are new to StackOverflow and thought you may not know that people usually upvote for useful answers.
grigy
+5  A: 

Java supports threads, so you can write multithreaded Java applications. I strongly recommend the book Concurrent Programming in Java: Design Principles and Patterns for that:

http://java.sun.com/docs/books/cp/

Manuel Selva
@Manuel Selva : thks for the info
Alex Mathew
+1  A: 

As far as I know, on most operating systems Java threads are mapped to real kernel threads, which is good from a parallel programming perspective. Some other runtimes behave differently: in standard CPython, for example, the global interpreter lock lets only one thread execute Python bytecode at a time, so if you run a heavily multithreaded CPU-bound application on a multiprocessor machine you'll see roughly one processor busy.

You can easily find material just by googling: for example, this is the first result for "java threading": http://download-llnw.oracle.com/javase/tutorial/essential/concurrency/

Basically it boils down to extending the Thread class, overriding the "run" method with the code that should run in the other thread, and calling the "start" method on an instance of your subclass.

Also, if you need to make something thread-safe, have a look at synchronized methods.
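To make the steps above concrete, here is a minimal sketch of a Thread subclass with an overridden run() plus a synchronized counter. The class names (ThreadDemo, CountingThread, SharedCounter) are made up for illustration and are not part of any library.

```java
class ThreadDemo {
    // Starts `threads` threads, waits for all of them, and returns the final count.
    static int countInParallel(int threads, int iterations) throws InterruptedException {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new CountingThread(counter, iterations);
            workers[i].start();   // start() spawns a new thread; calling run() directly would not
        }
        for (Thread t : workers) {
            t.join();             // block until this worker has finished
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countInParallel(4, 10_000)); // prints 40000
    }
}

// Hypothetical shared state, protected by synchronized methods.
class SharedCounter {
    private int value = 0;

    // Only one thread at a time can execute a synchronized method on the same instance.
    synchronized void increment() {
        value++;
    }

    synchronized int get() {
        return value;
    }
}

// Hypothetical Thread subclass: run() holds the code executed on the new thread.
class CountingThread extends Thread {
    private final SharedCounter counter;
    private final int iterations;

    CountingThread(SharedCounter counter, int iterations) {
        this.counter = counter;
        this.iterations = iterations;
    }

    @Override
    public void run() {
        for (int i = 0; i < iterations; i++) {
            counter.increment();
        }
    }
}
```

Without the synchronized keyword on increment(), the increments from different threads could interleave and the final count would usually come out lower.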

Dacav
@Decav : thks for the info ;)
Alex Mathew
+3  A: 

If you are asking about pure parallel programming, i.e. not concurrent programming, then you should definitely try MPJ Express: http://mpj-express.org/. It is a thread-safe implementation of mpiJava and it supports both distributed and shared memory models. I have tried it and found it very reliable.

1 import mpi.*;
2
3 /**
4  * Compile:impl specific.
5  * Execute:impl specific.
6  */
7
8 public class Send {
9
10     public static void main(String[] args) throws Exception {
11
12         MPI.Init(args);
13
14         int rank = MPI.COMM_WORLD.Rank() ; //The current process.
15         int size = MPI.COMM_WORLD.Size() ; //Total number of processes
16         int peer ;
17
18         int buffer [] = new int[10];
19         int len = 1 ;
20         int dataToBeSent = 99 ;
21         int tag = 100 ;
22
23         if(rank == 0) {
24
25             buffer[0] = dataToBeSent ;
26             peer = 1 ;
27             MPI.COMM_WORLD.Send(buffer, 0, len, MPI.INT, peer, tag) ;
28             System.out.println("process <"+rank+"> sent a msg to "+
29                                "process <"+peer+">") ;
30
31         } else if(rank == 1) {
32
33             peer = 0 ;
34             Status status = MPI.COMM_WORLD.Recv(buffer, 0, buffer.length,
35                                                 MPI.INT, peer, tag);
36             System.out.println("process <"+rank+"> recv'ed a msg\n"+
37                                "\tdata   <"+buffer[0]    +"> \n"+
38                                "\tsource <"+status.source+"> \n"+
39                                "\ttag    <"+status.tag   +"> \n"+
40                                "\tcount  <"+status.count +">") ;
41
42         }
43
44         MPI.Finalize();
45
46     }
47
48 }

One of the most common functionalities provided by messaging libraries like MPJ Express is support for point-to-point communication between executing processes. In this context, two processes belonging to the same communicator (for instance the MPI.COMM_WORLD communicator) may communicate with each other by sending and receiving messages. The sender process sends the message using a variant of the Send() method, and the receiver process receives it using a variant of the Recv() method. Both sender and receiver specify a tag that is used to find matching incoming messages at the receiver side.

After initializing the MPJ Express library using the MPI.Init(args) method on line 12, the program obtains its rank and the size of the MPI.COMM_WORLD communicator. Both processes initialize an integer array of length 10 called buffer on line 18. The sender process (rank 0) stores a value of 99 in the first element of the buffer array on line 25. A variant of the Send() method is used to send one element of the buffer array to the receiver process.

The sender process calls the Send() method on line 27. The first three arguments are related to the data being sent. The sending buffer (the buffer array) is the first argument, followed by 0 (offset) and 1 (count). The data being sent is of MPI.INT type and the destination is 1 (peer variable); the datatype and destination are specified as the fourth and fifth arguments to the Send() method. The sixth and last argument is the tag variable. A tag is used to identify messages at the receiver side. A message tag is typically an identifier of a particular message in a specific communicator. On the other hand, the receiver process (rank 1) receives the message using the blocking receive method.

Adil Butt
@Adil Butt : Can u please say how can i implement it?
Alex Mathew
Sure. I assume you have downloaded the multicore version of mpj-express. I have now added a code snippet to my answer. I have the documents for the API; you can PM me if you want those.
Adil Butt
@Adil Butt : How can i PM you?? can you please gave me ur email address?
Alex Mathew
Whats your comment about JPPF and hadoop?
Alex Mathew
Yes, I found out there is no PM functionality, which is rather bad. Mail me at [email protected]
Adil Butt
+1  A: 

Read the section on threads in the Java tutorial. http://download-llnw.oracle.com/javase/tutorial/essential/concurrency/procthread.html

Thorbjørn Ravn Andersen
@AnderSen : OK, suppose I have developed an app that uses threads. How can it be run across two systems?
Alex Mathew
A central master that knows what needs to be done, and a horde of slaves that get work from the master and report back?
Thorbjørn Ravn Andersen
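The master/worker arrangement suggested in the comment above can be sketched inside a single JVM, with a BlockingQueue standing in for the network link between machines. This is only an illustration under that assumption: the names MasterWorkerDemo, process and STOP are hypothetical, and a real multi-machine setup would need sockets, RMI, a message broker or a framework in place of the in-memory queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

class MasterWorkerDemo {
    // Sentinel value the master sends to tell a worker to shut down.
    private static final String STOP = "<stop>";

    // Hypothetical stand-in for the real work (e.g. crawling one URL).
    static String process(String url) {
        return "done:" + url;
    }

    // The master hands out URLs through the queue; workers report results back.
    static List<String> run(List<String> urls, int workerCount) throws InterruptedException {
        BlockingQueue<String> work = new LinkedBlockingQueue<>();
        Queue<String> results = new ConcurrentLinkedQueue<>();

        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        String url = work.take();      // blocks until work is available
                        if (url.equals(STOP)) {
                            return;
                        }
                        results.add(process(url));     // report back to the master
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }

        for (String url : urls) {
            work.put(url);
        }
        for (int i = 0; i < workerCount; i++) {
            work.put(STOP);                            // one stop signal per worker
        }
        for (Thread t : workers) {
            t.join();
        }

        // Completion order is nondeterministic, so sort for a stable result.
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = new ArrayList<>();
        urls.add("http://example.com/a");
        urls.add("http://example.com/b");
        System.out.println(run(urls, 2));
    }
}
```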
+1  A: 

You might want to check out Hadoop. It's designed to run jobs across an arbitrary number of machines and takes care of all the bookkeeping for you. It's inspired by Google's MapReduce and related tools, so it even has its roots in web indexing.
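To give a feel for the map-and-reduce idea itself, here is a small sketch in plain Java streams. It does not use the Hadoop API at all and runs in one JVM; the class MapReduceSketch and the method names are made up for illustration. The "map" step turns each crawled URL into a key (its host) and the "reduce" step counts the entries per key.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapReduceSketch {
    // "Map" step: derive a key (the host name) from one input record (a URL).
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Group records by key, then "reduce" each group by counting its members.
    // TreeMap keeps the keys sorted so the printed result is stable.
    static Map<String, Long> pagesPerHost(List<String> urls) {
        return urls.parallelStream()
                .collect(Collectors.groupingBy(
                        MapReduceSketch::hostOf,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> crawled = Arrays.asList(
                "http://a.example/x",
                "http://a.example/y",
                "http://b.example/");
        System.out.println(pagesPerHost(crawled)); // prints {a.example=2, b.example=1}
    }
}
```

A framework like Hadoop applies the same two-phase shape, but spreads the input, the grouping and the reduction over many machines.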

Nicolas78
A: 

This is the parallel programming resource I've been pointed to in the past:

http://www.jppf.org/

I have no idea whether it's any good or not, just that someone recommended it a while ago.

Richard
A: 

You want to look at the Java Parallel Processing Framework (JPPF)

John Channing
+1  A: 

The java.util.concurrent package and the Brian Goetz book "Java Concurrency in Practice".

There are also a lot of resources here about parallel patterns by Ralph Johnson (one of the GoF design pattern authors): http://parlab.eecs.berkeley.edu/wiki/patterns/patterns
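As a rough idea of what the concurrency package mentioned in this answer offers, here is a sketch that runs one task per URL on a fixed thread pool using ExecutorService. ExecutorCrawlDemo and fetchLength are hypothetical names; fetchLength is only a placeholder where a real crawler would download and parse the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorCrawlDemo {
    // Hypothetical placeholder for real crawling work: just returns the URL length.
    static int fetchLength(String url) {
        return url.length();
    }

    // Submits one task per URL and collects the results in input order.
    static List<Integer> crawlAll(List<String> urls, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetchLength(url));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has completed and returns
            // the futures in the same order as the submitted tasks.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();   // let the pool threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(crawlAll(Arrays.asList("http://a.example", "http://b.example/page"), 4));
    }
}
```

A thread pool keeps the number of threads bounded, which tends to matter for a crawler with thousands of URLs queued up.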

Bruno Thomas
A: 

Is the Ateji PX parallel-for loop what you're looking for? This will crawl all sites in parallel (notice the double bar next to the for keyword):

for||(Site site : sites) {
  crawl(site);
}

If you need to compose the results of crawling, then you'll probably want to use a parallel comprehension, such as:

Set result = set for||{ crawl(site) | Site site : sites }
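For comparison, a similar "crawl everything in parallel and collect the results into a set" shape can be written in standard Java (8 or later) with parallel streams, without the Ateji PX language extension. This is only a sketch: ParallelStreamCrawl is a made-up class and crawl is a placeholder, not a real crawler.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ParallelStreamCrawl {
    // Hypothetical placeholder for crawling one site.
    static String crawl(String site) {
        return site.toUpperCase();
    }

    // Applies crawl to every site, potentially on several threads,
    // and gathers the results into a set.
    static Set<String> crawlAll(List<String> sites) {
        return sites.parallelStream()
                .map(ParallelStreamCrawl::crawl)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(crawlAll(Arrays.asList("a.example", "b.example")));
    }
}
```

Parallel streams run on the common fork/join pool, which is sized for CPU-bound work; for a crawler that mostly waits on the network, a dedicated thread pool is probably the better fit.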

Further reading here: http://www.ateji.com/px/whitepapers/Ateji%20PX%20for%20Java%20v1.0.pdf

Patrick Viry
+1  A: 

You can have a look at Hadoop and the Hadoop Wiki. This is an Apache framework inspired by Google's MapReduce. It enables you to do distributed computing across multiple systems. Many companies, such as Yahoo and Twitter, use it (Sites Powered By Hadoop). Check this book for more information on how to use it: Hadoop Book.

Emil
@Emil : I need to know more about Hadoop. Can you help me out?
Alex Mathew
@alex: The truth is I only came across this framework two days ago, so I don't have much idea about how to use it. I have started reading the book (the one I linked), and I think it can give you a good start. Anyway, if you have any problems feel free to ask; if I know, I'll help. You can also check out the questions on SO tagged hadoop: http://stackoverflow.com/questions/tagged/hadoop . This might help you understand the problems Hadoop users face in general.
Emil
@alex: Also, from what I understand, what you need is to distribute the load of one system across many machines. I don't think learning about threads or concurrency in Java will help you achieve this. I think Hadoop is the only open-source, stable framework that allows you to do this as of now. It might take some time to learn a new framework, but it's always better than 'reinventing the wheel'.
Emil
@alex: Cloudera is a distribution of Hadoop. It helps you configure your systems without much manual work (http://www.cloudera.com/).
Emil
Is the book you linked to on RapidShare pirated??
Amir Afghani