views:

414

answers:

3

Here's sequential code:

do i = 1, n
   do j = i+1, n
      if ("some_condition(i,j)") then
         result = "here's result"
         return
      end if
   end do
end do

Is there a cleaner way to execute iterations of the outer loop concurrently other than:

  !$OMP PARALLEL private(i,j)
  !$OMP DO 
  do i = 1, n     
     !$OMP FLUSH(found)
     if (found) goto 10
     do j = i+1, n        
        if ("some_condition(i,j)") then
           !$OMP CRITICAL
           !$OMP FLUSH(found)
           if (.not.found) then           
              found = .true.
              result = "here's result"
           end if
           !$OMP FLUSH(found)
           !$OMP END CRITICAL
           goto 10
        end if
     end do
10   continue
  end do
  !$OMP END DO NOWAIT
  !$OMP END PARALLEL

The order of iterations over i-loop may be arbitrary as long as some result is found (it doesn't matter if it changes from run to run as long as it satisfies "some_condition").

A: 

It seems $OMP DO doesn't allow break out of the loop earlier. An alternative might be to implement it by hand.

Give each thread fixed continuous range of indices to process

Following Guide into OpenMP: Easy multithreading programming for C++:

  results = "invalid_value"

  !$OMP PARALLEL private(i,j,thread_num,num_threads,start,end)

  thread_num = OMP_GET_THREAD_NUM()
  num_threads = OMP_GET_NUM_THREADS()
  start = thread_num * n / num_threads + 1
  end = (thread_num + 1) * n / num_threads

  outer: do i = start, end
     !$OMP FLUSH(found)             
     if (found) exit outer
     do j = i+1, n
        if ("some_condition") then
           found = .true.
           !$OMP FLUSH(found)
           results(thread_num+1) = "here's result"
           exit outer
        end if
     end do
  end do outer

  !$OMP END PARALLEL

  ! extract `result` from `results` if any
  do i = 1, size(results)
     if (results(i).ne."invalid_value") result = results(i)
  end do

UPDATE: replaced goto by exit, introduced results array based on @M. S. B.'s answer.

If solution exists this approach is faster then $OMP DO due to earlier exit.

Give each thread one iteration at a time to process

Using task directive (suggested by @High Performance Mark):

  !$OMP PARALLEL
  !$OMP SINGLE
  !$OMP TASK UNTIED
          ! "untied" allows other threads to generate tasks
  do i = 1, n ! i is private
     !$OMP TASK ! implied "flush"
     task:     do j = i+1, n ! i is firstprivate, j is private       
        if (found) exit task
        if ("some_condition(i,j)") then
           !$OMP CRITICAL
           result = "here's result" ! result is shared              
           found = .true.           ! found is shared
           !$OMP END CRITICAL ! implied "flush"
           exit task
        end if
     end do task
     !$OMP END TASK 
  end do 
  !$OMP END TASK
  !$OMP END SINGLE
  !$OMP END PARALLEL

This variant is 2 times faster on my tests than the version with the outer-loop.

J.F. Sebastian
+1  A: 

It seems that your sequential code has a dependency that makes it unsuitable to being made parallel. Suppose that there are multiple values of i & j that make "some condition" true -- then the order of execution of the i & j do loops determines which of these conditions is found first and sets the value of result, after which the return statement ends the search for additional cases i,j that "some condition" is true. In the sequential code, the do loops always execute in the same order, so the operation of the program is deterministic and identical values of i & j that make "some condition" true will always be found. In a concurrent version, various loops i execute in non-deterministic order, so that from run to run different values of i might be the first i-value that finds a true "some condition".

Perhaps you as a programmer know that there is only one value of i & j that results in a true "some condition"? In that case short-circuiting the execution would seem OK. But the OpenMP spec says that "No statement in the associated loops other than the DO statements can cause a branch out of the loops" so having the something in the inner loop abort the output loop isn't allowed. If it is the case that there is always only one true "some condition", you could just remove the "return" and waste CPU time by having threads look for "some condition" is true after the one case has been found. That might still be faster than a sequential program. With a scaler "result" variable, it still probably non-compliant, having an dependency on the order of execution. You could change it in to a "reduction", summing the result, or return result as 1-D array of dimension (n). If you need to find the smallest value of i that has "some condition" true, you could obtain that from an array result using the Fortran instrinsic function minloc.

A solution with many "flush" and "critical" directives may not be faster than the sequential version.

UPDATE: Based on the clarification that multiple results are possible and that any will do, one parallel method would be to return mutiple results and let sequential code pick one out -- make "result" into a 1D array rather than a scaler. You are allowed to short-circuit the inner j-loop because it is not "associated" with the "omp do" directive, so "result" need only be 1D, dimensioned according to the range of i. So something like this:

program test1

integer :: i, j
integer, parameter :: n = 10
integer, dimension (n) :: result

result = -999

!omp parallel default (shared) private (i, j)
!omp do
do i = 1, n
   inner: do j = i+1, n
      if ( mod (i+j,14) == 0 ) then
         result (i) = i
         exit inner
      end if
   end do inner
end do
!omp end do
!omp end parallel

write (*, *) 'All results'
write (*, *) result

write (*, *)
write (*, *) 'One result'
write (*, *) result ( maxloc (result, 1) )

end program test1
M. S. B.
Thanks for the explanation on why OpenMP doesn't allow to break out of the loop earlier. I've updated the question to clarify that any `result` that satisfies "some_condition" will do. btw, removing "flush" from the inner loop make the program 10 times faster.
J.F. Sebastian
@M. S. B.: It is expensive in my case to iterate over the inner loop without strict necessity. "exit inner" variant is ~150 times slower than emulated by hand variant from my answer http://stackoverflow.com/questions/2979760/how-to-break-out-of-a-nested-parallel-openmp-fortran-loop-idiomatically/2979886#2979886
J.F. Sebastian
+1  A: 

Another approach entirely would be to use the TASK construct which is part of OpenMP 3.0. What you seem to be trying to do is to divide your loops across threads, compute until any thread finds an answer, then have all threads stop. Trouble is, the necessity to have all threads check a shared flag is (a) killing your performance and (b) leading you into ugly loops with BREAKS and CYCLES.

I think @M.S.B.'s answer gives very good advice on how to adapt your existing approach. But, perhaps a more natural way of tackling the problem would be for the program to create a number of tasks (perhaps one for each iteration of your innermost loop) and to dispatch those to worker threads. Once any thread reports success all threads can be sent a finalisation task and your program can continue.

This would, of course, require more re-writing of your program and probably make sequential execution worse. It will definitely require that your implementation of OpenMP supports v3.0 of the standard.

And you may need more help in this area than I can manage, I've only just started playing with OpenMP TASKS myself.

High Performance Mark
You are correct: "find an answer; stop all threads" is a general idea.
J.F. Sebastian