views:

194

answers:

2

Hi,

I am trying to run a program on a computer cluster. The structure of the program is the following:

PROGRAM something
...
CALL subroutine1(...)
...
END PROGRAM

SUBROUTINE subroutine1(...)
...
DO i=1,n
CALL subroutine2(...)
ENDDO
...
END SUBROUTINE

SUBROUTINE subroutine2(...)
...
CALL subroutine3(...)
CALL subroutine4(...)
...
END SUBROUTINE

The idea is to parallelize the loop that calls subroutine2. Main program basically only makes the call to subroutine1 and only its arguments are declared. I use two alternatives. On the one hand, I write OpenMP clauses arround the loop. On the other hand, I add an IF conditional branch arround the call and I use MPI to share the results.

In the OpenMP case, I add CALL KMP_SET_STACKSIZE(402653184) at the beginning of the main program and I can run it with 8 threads on an 8 core machine. When I run it (on the same 8 core machine) with MPI (either using 8 or 1 processors) it crashes just when makes the call to subroutine3 with a segmentation fault (signal 11) error. If I comment subroutine4, then it doesn't crash (notice that it crashed just when calling subroutine3 and it works when commenting subroutine4).

I compile with mpif90 using MPICH2 libraries and the following flags: -O3 -fpscomp logicals -openmp -threads -m64 -xS. The machine has EM64T architecture and I use a Debian Linux distribution. I set ulimit -s hard before running the program.

Any ideas on what is going on? Has it something to do with stack size?

Thanks in advance


Let me try to add some information in case it helps. The main purpose of the main program is to call subroutine1. Subroutine1 is the only one that makes calls to MPI/OpenMP functions. This subroutine is a Quasi-Newton minimization that I am probably allowed to share... so here it is its code:

module Quasi_Newton

implicit none
private
public :: subroutine1

contains

subroutine subroutine1(p,nop,np,move,maxiter,stopcr)

use themodule
use omp_lib
implicit none
include 'mpif.h'

integer, intent(in) :: nop,np,move(nop),maxiter !Number of parameters, number of parameters to be moved, list of the parameters to be moved and max iterations
real(8), intent(inout) :: p(nop) !Parameter vector
real(8), intent(in) ::  stopcr !Stopping criterion
integer :: iter, ipar, jpar, impar, jmpar, iproc !Counters
real(8) :: df(np), prevdf(np), ddf(np,np), tempdir(np,1), dir(nop) !Gradient, Gradient at previous iteration, Hessian, temporary allocation of direction vector and direction vector
real(8) :: y(np,1), identity(np,np), outer(np,np) !Gradient at xn1-gradient at xn, identity and outer matrix used to update the hessian
real(8) :: an, an1, adf, addf !Factor that multiplies the direction of movement, updated factor, first and second differentials with respect to an
real(8) :: func0,func1,func2,func3,func4,func5 !Function evaluations
real(8) :: p0(nop),p1(nop),p2(nop),p3(nop),p4(nop),p5(nop) !Vectors of parameters to be evaluated
real(8) :: xn(nop), xn1(nop) !Current and updated guesses of parameters
real(8), parameter :: eps=0.0002, c1=0.0001, c2=0.9 !Epsilon for the differentials and c1 and c2 parameters of the Wolfe conditions
logical ::  matrix_ill !Logical matrix used in the inverse to detect wrong situations
real :: currtime !Variable to store the time used

!******* variables related mpi *******

integer :: ierror !return error message from mpi subroutines
integer :: id     !identification number of each processor
integer :: nproc  !total number of processors
integer:: tag     !used for tag of send/recv operations
integer,dimension(mpi_status_size):: mpi_status !used to store status of mpi_recv



    !******* initialization *******

    !initialization of mpi environment

    call mpi_init(ierror)

    !obtain id for each node

    call mpi_comm_rank(mpi_comm_world, id, ierror)

    !returns the number of nodes

    call mpi_comm_size(mpi_comm_world, nproc, ierror)

    if (id==0) write (3,1000)
    if (id==0) print 1100

!! FIRST ITERATION: CALCULATE JACOBIAN AT THE INITIAL GUESS OF PARAMETERS AND GUESS THE HESSIAN EQUAL I

    xn=p

  !! Compute the gradient
    if (id==0) print 2100
    impar=1
!   !$omp parallel do num_threads(1) default(none) private(p1,p2,func1,func2,impar,currtime,ipar) shared(xn,df,nop,algorithm,globaltime,move)
    do ipar=1,nop
        if (move(ipar)==0) cycle
        if ((sum(move(1:ipar))>=id*floor(1.*np/nproc)+1+id*(np-nproc*floor(1.*np/nproc)>=id)+&
            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id)).and.&
            (sum(move(1:ipar))<=(id+1)*floor(1.*np/nproc)+(id+1)*(np-nproc*floor(1.*np/nproc)>=id+1)&
            +(np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id+1))) then 
            p1=xn
            p2=xn
            p1(ipar)=xn(ipar)+eps
            call subroutine2(p1,func1,1)
            currtime=secnds(globaltime)
            print 2110, ipar, floor(currtime/3600) ,floor((currtime-floor(currtime/3600)*3600)/60),currtime-floor(currtime/60)*60, func1
            p2(ipar)=xn(ipar)-eps
            call subroutine2(p2,func2,1)
            currtime=secnds(globaltime)
            print 2111, ipar, floor(currtime/3600) ,floor((currtime-floor(currtime/3600)*3600)/60),currtime-floor(currtime/60)*60, func2
            impar=sum(move(1:ipar))
            df(impar)=(func1-func2)/(2*eps)
        endif
    enddo
!    !$omp end parallel do
    call mpi_barrier(mpi_comm_world,ierror)
    if (id==0) then
        do iproc=1,nproc-1
            tag=iproc+100
            call mpi_recv(df(iproc*floor(1.*np/nproc)+1+iproc*(np-nproc*floor(1.*np/nproc)>=iproc)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc)),&
                            min((iproc+1)*floor(1.*np/nproc)+(iproc+1)*(np-nproc*floor(1.*np/nproc)>=iproc+1)&
                            +(np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc+1)&
                            ,np-iproc*floor(1.*np/nproc)+1+iproc*(np-nproc*floor(1.*np/nproc)>=iproc)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc))&
                            ,mpi_double_precision,iproc,tag,mpi_comm_world,mpi_status,ierror)
        enddo
    else
        tag=id+100
        call mpi_send(df(id*floor(1.*np/nproc)+1+id*(np-nproc*floor(1.*np/nproc)>=id)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id)),&
                            min((id+1)*floor(1.*np/nproc)+(id+1)*(np-nproc*floor(1.*np/nproc)>=id+1)&
                            +(np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id+1)&
                            ,np-id*floor(1.*np/nproc)+1+id*(np-nproc*floor(1.*np/nproc)>=id)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id))&
                            ,mpi_double_precision,0,tag, mpi_comm_world,ierror)
    endif
    call mpi_bcast(df(1),np,mpi_double_precision,0,mpi_comm_world,ierror)
    prevdf=df
    currtime=secnds(globaltime)
    if (id==0) print 9000, floor(currtime/3600) ,floor((currtime-floor(currtime/3600)*3600)/60),currtime-floor(currtime/60)*60

  !! Initial guess of the hessian (identity matrix)

    identity=0
    impar=1
    do ipar=1,nop
        if (move(ipar)==0) cycle
        identity(impar,impar)=1.
        impar=impar+1
    enddo
    ddf=identity


 !! MAIN LOOP: DIRECTION-ALPHA-UPDATE PARAMETERS-UPDATE HESSIAN

    iter=1
    Main_loop: do iter=1,maxiter

      !! Find the direction of minimization

        tempdir(:,1)=-1.*matmul(ddf,df)
        impar=1
        do ipar=1,nop
            if (move(ipar)==0) then
                dir(ipar)=0
              else
                dir(ipar)=tempdir(impar,1)
                impar=impar+1
            endif
        enddo

      !! Line minimization

        an=1
        if (id==0) then
            p0=p
            call subroutine2(p0,func0,1)
        endif
         if (id==min(nproc-1,1)) then
            p1=xn+an*dir
            call subroutine2(p1,func1,1)
        endif
        call mpi_bcast(func0,1,mpi_double_precision,0,mpi_comm_world,ierror)
        call mpi_bcast(func1,1,mpi_double_precision,min(nproc-1,1),mpi_comm_world,ierror)
        call mpi_barrier(mpi_comm_world,ierror)
        if (id==0) then
            currtime=secnds(globaltime)
            write (3,3000) iter,func0, p0
            print 3100, iter, floor(currtime/3600) ,floor((currtime-floor(currtime/3600)*3600)/60),currtime-floor(currtime/60)*60, func0, func1
        endif
        Line_min: do
            if (id==0) then
                p2=xn+(an+eps)*dir
                call subroutine2(p2,func2,1)
            endif
            if (id==min(nproc-1,1)) then
                p3=xn+(an-eps)*dir
                call subroutine2(p3,func3,1)
            endif
            if (id==min(nproc-1-1*(nproc==2),2)) then
                p4=xn+(an+2*eps)*dir
                call subroutine2(p4,func4,1)
            endif
            if (id==min(nproc-1,3)) then
                p5=xn+(an-2*eps)*dir
                call subroutine2(p5,func5,1)
            endif
            call mpi_bcast(func2,1,mpi_double_precision,0,mpi_comm_world,ierror)
            call mpi_bcast(func3,1,mpi_double_precision,min(nproc-1,1),mpi_comm_world,ierror)
            call mpi_bcast(func4,1,mpi_double_precision,min(nproc-1-1*(nproc==2),2),mpi_comm_world,ierror)
            call mpi_bcast(func5,1,mpi_double_precision,min(nproc-1,3),mpi_comm_world,ierror)
            call mpi_barrier(mpi_comm_world,ierror)
            adf=(func2-func3)/(2*eps)
            addf=(func4-2*func1+func5)/(4*(eps**2))
            an1=an-adf/addf
            an=an1
            if (id==0) then
                p1=xn+an*dir
                call subroutine2(p1,func1,1)
            endif
            call mpi_bcast(func1,1,mpi_double_precision,0,mpi_comm_world,ierror)
            call mpi_barrier(mpi_comm_world,ierror)
            if (func1>func0+c1*an*dot_product(tempdir(:,1),prevdf)) cycle Line_min
            if (id==0) then
                currtime=secnds(globaltime)
                print 4100, floor(currtime/3600) ,floor((currtime-floor(currtime/3600)*3600)/60),currtime-floor(currtime/60)*60, func1
            endif
            impar=1
!     !$omp parallel do num_threads(1) default(none) private(p3,p4,func3,func4,impar,currtime,ipar) shared(xn,df,an,dir,y,nop,algorithm,globaltime,prevdf,move)
            do ipar=1,nop
                if (move(ipar)==0) cycle
                if ((sum(move(1:ipar))>=id*floor(1.*np/nproc)+1+id*(np-nproc*floor(1.*np/nproc)>=id)+&
                    (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id)).and.&
                    (sum(move(1:ipar))<=(id+1)*floor(1.*np/nproc)+(id+1)*(np-nproc*floor(1.*np/nproc)>=id+1)&
                    +(np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id+1))) then 
                    p3=xn+an*dir
                    p4=xn+an*dir
                    p3(ipar)=xn(ipar)+an*dir(ipar)+eps
                    call subroutine2(p3,func3,1)
                    currtime=secnds(globaltime)
                    print 2110, ipar, floor(currtime/3600) ,floor((currtime-floor(currtime/3600)*3600)/60),currtime-floor(currtime/60)*60, func3
                    p4(ipar)=xn(ipar)+an*dir(ipar)-eps
                    call subroutine2(p4,func4,1)
                    currtime=secnds(globaltime)
                    print 2111, ipar, floor(currtime/3600) ,floor((currtime-floor(currtime/3600)*3600)/60),currtime-floor(currtime/60)*60, func4
                    impar=sum(move(1:ipar))
                    y(impar,1)=(func3-func4)/(2*eps)-prevdf(impar)
                    df(impar)=(func3-func4)/(2*eps)
                endif
            enddo
!       !$omp end parallel do

            call mpi_barrier(mpi_comm_world,ierror)
            if (id==0) then
                do iproc=1,nproc-1
                    tag=iproc+500
                    call mpi_recv(df(iproc*floor(1.*np/nproc)+1+iproc*(np-nproc*floor(1.*np/nproc)>=iproc)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc)),&
                            min((iproc+1)*floor(1.*np/nproc)+(iproc+1)*(np-nproc*floor(1.*np/nproc)>=iproc+1)&
                            +(np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc+1)&
                            ,np-iproc*floor(1.*np/nproc)+1+iproc*(np-nproc*floor(1.*np/nproc)>=iproc)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc))&
                            ,mpi_double_precision,iproc,tag,mpi_comm_world,mpi_status,ierror)
                    tag=iproc+5000
                    call mpi_recv(y(iproc*floor(1.*np/nproc)+1+iproc*(np-nproc*floor(1.*np/nproc)>=iproc)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc),1),&
                            min((iproc+1)*floor(1.*np/nproc)+(iproc+1)*(np-nproc*floor(1.*np/nproc)>=iproc+1)&
                            +(np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc+1)&
                            ,np-iproc*floor(1.*np/nproc)+1+iproc*(np-nproc*floor(1.*np/nproc)>=iproc)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<iproc))&
                            ,mpi_double_precision,iproc,tag,mpi_comm_world,mpi_status,ierror)
                enddo
            else
                tag=id+500
                call mpi_send(df(id*floor(1.*np/nproc)+1+id*(np-nproc*floor(1.*np/nproc)>=id)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id)),&
                            min((id+1)*floor(1.*np/nproc)+(id+1)*(np-nproc*floor(1.*np/nproc)>=id+1)&
                            +(np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id+1)&
                            ,np-id*floor(1.*np/nproc)+1+id*(np-nproc*floor(1.*np/nproc)>=id)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id))&
                            ,mpi_double_precision,0,tag, mpi_comm_world,ierror)
                tag=id+5000
                call mpi_send(y(id*floor(1.*np/nproc)+1+id*(np-nproc*floor(1.*np/nproc)>=id)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id),1),&
                            min((id+1)*floor(1.*np/nproc)+(id+1)*(np-nproc*floor(1.*np/nproc)>=id+1)&
                            +(np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id+1)&
                            ,np-id*floor(1.*np/nproc)+1+id*(np-nproc*floor(1.*np/nproc)>=id)+&
                            (np-nproc*floor(1.*np/nproc))*(np-nproc*floor(1.*np/nproc)<id))&
                            ,mpi_double_precision,0,tag, mpi_comm_world,ierror)
            endif
            call mpi_bcast(df(1),np,mpi_double_precision,0,mpi_comm_world,ierror)
            call mpi_bcast(y(1,1),np,mpi_double_precision,0,mpi_comm_world,ierror)
            call mpi_barrier(mpi_comm_world,ierror)
            if (abs(dot_product(tempdir(:,1),df))<=c2*abs(dot_product(tempdir(:,1),prevdf))) exit Line_min
        enddo Line_min
        prevdf=df
        if (id==0) then
            currtime=secnds(globaltime)
            print 9000, floor(currtime/3600) ,floor((currtime-floor(currtime/3600)*3600)/60),currtime-floor(currtime/60)*60
        endif

      !! Update parameters

        xn1=xn+an1*dir
        p=xn1
        if (maxval(abs(xn1-xn))<=stopcr) exit Main_loop
        xn=xn1

      !! Update the hessian

        outer=identity-(1/dot_product(y(:,1),an1*tempdir(:,1)))*matmul(y,transpose(an1*tempdir))
        ddf=matmul(matmul(transpose(outer),ddf),outer)+(1/dot_product(y(:,1),an1*tempdir(:,1)))*matmul(an1*tempdir,transpose(an1*tempdir))


    enddo Main_loop


    !******* finalize mpi environment *******

    call mpi_finalize(ierror)

1000 FORMAT (' BROYDEN_FLETCHER_GOLDFARB_SHANNO...' /&
        ' Iteration    Func.value.         Parameter values')
3000 FORMAT ('  ', i6, '    ', f17.8, '         ', 105f20.10)

1100 FORMAT (' BROYDEN_FLETCHER_GOLDFARB_SHANNO...')
2100 FORMAT ('    Computing the gradient at x0...')
2110 FORMAT ('                     First evaluation of parameter  ', i3, ' at ',i4,':',i3,':',f6.2,' with value function ', f17.8)
2111 FORMAT ('                     Second evaluation of parameter ', i3, ' at ',i4,':',i3,':',f6.2,' with value function ', f17.8)
3100 FORMAT ('    Iteration ', i6, ' started at ',i4,':',i3,':',f6.2,' with value functions ', f17.8, f17.8)
4100 FORMAT ('           First Wolfe criterion met at ',i4,':',i3,':',f6.2, ' with value function', f17.8)
9000 FORMAT ('           ... done at ',i4,':',i3,':',f6.2)


end subroutine

end module

Moreover, notice the following (strange) things:

  • I switch OpenMP and MPI by uncommenting omp lines and commenting mpi calls (and some if's)
  • The first (input) argument of subroutine2 is the vector of parameters; the second (output) is the evaluation of the function to be minimized (scalar) and the third one is a flag that determines the size of some arrays in subroutine2.
  • The code works properly with OpenMP (with 1 and 8 threads in an 8 core machine with 16G of ram).
  • It doesn't work with MPI even if I comment all MPI calls and I give all processes id=0 (so that all of them do the same). And it even doesn't work if I run with mpiexec -np 1 (i.e. with a single process), either with mpi calls commented and uncommented
  • Subroutine 1 works perfectly (either with OpenMP and MPI) to minimize the following function:

    subroutine subroutine2(p,h,n1) use readdata implicit none integer, parameter :: npar=105 real(8), intent(in) :: p(npar) real(8), intent(out) :: h integer, intent(in) :: n1

    h=sum(p*2)+sum(p*4)

    end subroutine

  • I am not able to reproduce the error with simpler versions of the program

I hope it helps. Let me know about further details that may be necessary.


The post became that long that it doesn't allow me to insert the code that I found out to replicate the error. Please, see the answer I posted below

A: 

Well, if you are only going to tease us and show us your private code bit by bit, I'm going to tease you in return.

  • What are you doing to ensure, in the MPI version, that each process has local copies of all the variables it needs ?
High Performance Mark
Please see the edition above. Of course I don't want to tease you! I repeat I am really sorry that I don't put the code but it is confidential, and anyway I am not so sure that adding more than 10000 lines of code would help. And, since I have no idea where the problem comes from, I am not able to disentangle which is the is the relevant information to provide. I appreciate so much that you help me on this.
Bellman
@Bellman, fraid it beats me and there's only so much time I spend on SO in a day or week. But the code you've posted doesn't include any calls to subroutine3 which is, according to your question, where the MPI version crashes. I also observe that you don't check the value of ierror after MPI calls -- when grasping at straws it's worth doing, though I can't say I have much expectation that it will help here.
High Performance Mark
I did further experiments trying to obtain a simple example that mimics the behavior of those subroutines (not finished yet, I will post it on a while). Anyway I achieved the conclusion that it should have something to do with the stack, since, at some point, no matter what you add, it drives to the segmentation fault. On the original post I talked about CALL KMP_SET_STACKSIZE(402653184) because it is strange that even same code, it runs with ./something and it doesn't with mpiexec -np 1 ./something. Therefore, I guess it should be something equivalent for MPI that I'm missing
Bellman
A: 

The post became so long that it doen't allow me to post the full edit. Here it is...

I hope this new edit helps in clarifying the issue. I finally came out with a simple program that replicates the error. Here it comes:

module themodule

integer ::  algorithm 
real ::  globaltime 

integer, parameter ::  modnpar=105, modproc=8

integer, parameter ::  moddim1=-40, moddim2=107, moddim3=2, moddim4=4, moddim5=2
integer, parameter ::  moddim6=50, moddim7=4, moddim8=2, moddim9=67, moddim10=93

integer ::  gvar2, gvar3, gvar4, gvar5, gvar6 

real ::  gvar7(moddim1:moddim2), gvar8(moddim1:moddim2), gvar9(moddim1:moddim2), gvar10(moddim1:moddim2,moddim6,moddim8,moddim5) 
real ::  gvar11(moddim1:moddim2), gvar12(moddim1:moddim2,moddim6,moddim5), gvar13(moddim6,moddim5), gvar14(moddim1:moddim2)
integer ::  gvar15(moddim1:moddim2,8,moddim5,moddim8)
real ::  gvar16(0:2,0:2,moddim6,moddim5,moddim7,moddim1:moddim2), gvar17(moddim9:moddim2,moddim5,moddim6/5,moddim4-1) 
real ::  gvar18(moddim9:moddim2,moddim5,moddim7,moddim4-1), gvar19(moddim9:moddim2,moddim5,0:2,moddim4-1)
real ::  gvar20(moddim10:moddim2,moddim5,moddim8,moddim4-1), gvar21(moddim10:moddim2,moddim5,5,moddim4-1) 
real ::  gvar22(21,moddim5,0:1,moddim4-1), gvar23(10,moddim5,0:1,moddim4-1), gvar24(moddim5,5,5) 
real ::  gvar25(moddim5,5,5), gvar26(moddim9:moddim2,moddim5,moddim6/5,moddim3) 
real ::  gvar27(moddim9:moddim2,moddim5,moddim7,moddim3), gvar28(moddim10:moddim2,moddim5,moddim8,moddim3) 
real ::  gvar29(moddim10:moddim2,moddim5,5,moddim3), gvar30(moddim5,5,5,moddim3), gvar31(moddim5,5,5,moddim3)
real ::  gvar32(moddim9:moddim2,moddim5,moddim3,moddim3), gvar33(moddim9:moddim2,moddim5,moddim6/5,moddim3)
real ::  gvar34(moddim10:moddim2,moddim5,moddim8,moddim3),gvar35(moddim10:moddim2,moddim5,5,moddim3)
real ::  gvar36(moddim9:moddim2,moddim5,moddim7,moddim3), gvar37(moddim10:moddim2,moddim5,moddim8,moddim3)
real ::  gvar38(moddim9:moddim2,moddim6/5,moddim5,moddim7-1), gvar39(moddim10:moddim2,moddim6/5,moddim8,moddim5,moddim7-1)
real ::  gvar40(moddim9:moddim2,moddim5,moddim4,moddim4-1), gvar41(moddim9:moddim2,moddim5,moddim4-(moddim3-1),moddim4-(moddim3-1)-1)
real ::  gvar42(moddim9:moddim2,moddim5,moddim6/5,moddim4,moddim4-1), gvar43(moddim10:moddim2,moddim5,moddim8,moddim4,moddim4-1)
real ::  gvar44(moddim10:moddim2,moddim5,moddim4-1), gvar45(moddim10:moddim2,moddim5,5,moddim4,moddim4-1) 
real ::  gvar46(moddim5,0:13-1), gvar47(moddim5,0:7-1), gvar48(moddim5,0:13-1), gvar49(moddim5,0:7-1), gvar50(moddim5,0:13-1) 
real ::  gvar51(moddim5,0:7-1), gvar52(moddim9:moddim2,moddim5,moddim6/5,moddim4-1), gvar53(moddim9:moddim2,moddim5,moddim7,moddim4-1) 
real ::  gvar54(moddim9:moddim2,moddim5,0:2,moddim4-1), gvar55(moddim10:moddim2,moddim5,moddim8,moddim4-1) 
real ::  gvar56(moddim10:moddim2,moddim5,5,moddim4-1), gvar57(21,moddim5,0:1,moddim4-1), gvar58(10,moddim5,0:1,moddim4-1)
real ::  gvar59(moddim5,5,5), gvar60(moddim5,5,5), gvar61(moddim9:moddim2,moddim5,moddim6/5,moddim3) 
real ::  gvar62(moddim9:moddim2,moddim5,moddim7,moddim3), gvar63(moddim10:moddim2,moddim5,moddim8,moddim3) 
real ::  gvar64(moddim10:moddim2,moddim5,5,moddim3), gvar65(moddim5,5,5,moddim3), gvar66(moddim5,5,5,moddim3) 
real ::  gvar67(moddim9:moddim2,moddim5,moddim3,moddim3), gvar68(moddim9:moddim2,moddim5,moddim6/5,moddim3) 
real ::  gvar69(moddim10:moddim2,moddim5,moddim8,moddim3), gvar70(moddim10:moddim2,moddim5,5,moddim3)
real ::  gvar71(moddim9:moddim2,moddim5,moddim7,moddim3),gvar72(moddim10:moddim2,moddim5,moddim8,moddim3)
real ::  gvar73(moddim9:moddim2,moddim6/5,moddim5,moddim7-1), gvar74(moddim10:moddim2,moddim6/5,moddim8,moddim5,moddim7-1)
real ::  gvar75(moddim9:moddim2,moddim5,moddim4,moddim4-1), gvar76(moddim9:moddim2,moddim5,moddim4-(moddim3-1),moddim4-(moddim3-1)-1) 
real ::  gvar77(moddim9:moddim2,moddim5,moddim6/5,moddim4,moddim4-1), gvar78(moddim10:moddim2,moddim5,moddim8,moddim4,moddim4-1)
real ::  gvar79(moddim10:moddim2,moddim5,moddim4-1), gvar80(moddim10:moddim2,moddim5,5,moddim4,moddim4-1) 
real ::  gvar81(moddim5,0:13-1), gvar82(moddim5,0:7-1), gvar83(moddim5,0:13-1),gvar84(moddim5,0:7-1) 
real ::  gvar85(moddim5,0:13-1), gvar86(moddim5,0:7-1) 


end module themodule

program something
use Quasi_Newton
use themodule
implicit none

integer, parameter :: npar=105
real(8), parameter ::  stopcr=0.001
integer, parameter ::  maxfn=10000
real(8) :: p(npar)
integer :: whichpar(npar)


    call kmp_set_stacksize(402653184)

    algorithm=0
    globaltime=secnds(0.0)

    open(3,file='Output.txt')

    p=1.

    whichpar=1

    call subroutine1(p,npar,sum(whichpar),whichpar,maxfn,stopcr)


end program


subroutine subroutine2(p,h,n1)

use omp_lib
use themodule
implicit none

real :: lvar1, lvar2

integer, parameter ::  npar=105, dim1=-40, dim2=107, dim3=2, dim4=4, dim5=2, dim6=2, dim7=50, dim8=4, dim9=2
integer, parameter ::  dim10=29, dim11=16, dim12=2000, dim13=2000, dim14=dim12+dim13, dim15=67, dim16=79, dim17=96
integer, parameter ::  dim18=93, dim19=90, dim20=93, dim21=103, dim22=106, dim23=15, dim24=17, dim25=8, dim26=10
integer, intent(in) ::  n1

integer ::  t, a, o, g, j, is, ipar, i1, i, k

real(8), intent(inout) ::  p(npar)
real ::  p1, p2, p3, p4, p5, p6, p7(2), p8(dim3,2*(dim3+1)+2), p9(dim3,dim5), p10(dim4,dim6,dim9,dim5)
real ::  p11(dim6,dim9,dim5,0:1), p12(7,dim5), p13(dim4,dim5), p14, p15(2), p16, p17(7,4)

real ::  svar1(dim1:dim2,dim3,n1),svar2(dim1:dim2,n1),svar3(dim7,dim11,dim4,dim5,dim6,dim9)

integer ::  svar4(dim14,dim7), svar5(dim14,dim7), svar6(dim14,dim7), svar7(dim14,dim7,dim3)
integer ::  svar8(dim14,dim7), svar9(dim14,dim7), svar10(dim14,dim7) 

real ::  svar11(dim15:dim2,dim5,dim9,dim8,dim4), svar12(dim15:dim2,dim5,dim9,dim8,dim3), svar13(dim15:dim2,dim5,dim9,dim8)
real ::  svar14(dim15:dim2,dim5+1,dim9+1), svar15(dim1:dim2,dim3), svar16(dim15:dim2,dim5,dim7,dim4), svar17(dim15:dim2,dim5,dim8,dim4)
real ::  svar18(dim15:dim2,dim5,0:2,dim4), svar19(dim18:dim2,dim5,dim9,dim4), svar20(dim18:dim2,dim5,5,dim4), svar21(dim16:99,3,dim5,0:1,dim4)
real ::  svar22(dim17:106,5,dim5,0:1,dim4), svar23(dim16:94,dim5,5,5,dim3), svar24(dim17+2:106,dim5,5,5,dim3), svar25(dim15:dim2,dim5,dim7,dim3)
real ::  svar26(dim15:dim2,dim5,dim8,dim3), svar27(dim18:dim2,dim5,dim9,dim3), svar28(dim18:dim2,dim5,5,dim3), svar29(dim16:94,dim5,5,5,dim3)
real ::  svar30(dim17+2:106,dim5,5,5,dim3), svar31(dim15:dim2,dim5,dim3,dim3), svar32(dim15:dim2,dim5,2:dim7,dim3,dim3)
real ::  svar33(dim18:dim2,dim5,dim9,dim3,dim3), svar34(dim18:dim2,dim5,5,dim3,dim3), svar35(dim15:dim2,dim5,dim8,dim3) 
real ::  svar36(dim18:dim2,dim5,dim9,dim3), svar37(dim15:dim2,dim7,dim5,dim8), svar38(dim18:dim2,dim7,dim9,dim5,dim8) 
real ::  svar39(dim15:dim2,dim5,dim4,dim4), svar40(dim15:dim2,dim5,dim4-(dim3-1),dim4-(dim3-1)), svar41(dim15:dim2,dim5,dim7,dim4,dim4)
real ::  svar42(dim18:dim2,dim5,dim9,dim4,dim4), svar43(dim18:dim2,dim5,dim7,dim4), svar44(dim18:dim2,dim5,5,dim4,dim4)
real ::  svar45(dim19:dim20,dim23:dim24,dim5,0:13), svar46(dim21:dim22,dim25:dim26,dim5,0:7), svar47(dim19:dim20,dim23:dim24,dim5,0:13)
real ::  svar48(dim21:dim22,dim25:dim26,dim5,0:7), svar49(dim19:dim20,dim23:dim24,dim5,0:13), svar50(dim21:dim22,dim25:dim26,dim5,0:7) 

integer :: svar51
real(8), intent(out) ::  h

    ipar=0
    p1=exp(p(ipar+1))/(1+exp(p(ipar+1)))
    ipar=ipar+1

    p2=exp(p(ipar+1))/(1+exp(p(ipar+1)))
    ipar=ipar+1

    p3=p(ipar+1)/10
    ipar=ipar+1

    p4=p(ipar+1)/10
    ipar=ipar+1

    p5=exp(p(ipar+1))/(1+exp(p(ipar+1)))
    ipar=ipar+1

    p6=exp(p(ipar+1))/(1+exp(p(ipar+1)))
    ipar=ipar+1

    p7=p(ipar+1:ipar+2)*10000
    ipar=ipar+2

    do o=1,dim3
        p8(o,:)=p(ipar+1:ipar+2*(dim3+1)+2)
        ipar=ipar+2*(dim3+1)+2
    enddo
    p8(1,1)=p8(1,1)/100
    p8(1,2)=p8(1,2)/100
    p8(1,3)=p8(1,3)/100
    p8(1,4)=p8(1,4)/1000
    p8(1,5)=p8(1,5)/1000
    p8(1,6)=p8(1,6)/10
    p8(1,7)=p8(1,7)/1000
    p8(1,8)=p8(1,8)/100
    p8(2,1)=p8(2,1)/10
    p8(2,2)=p8(2,2)/100
    p8(2,3)=p8(2,3)/100
    p8(2,4)=p8(2,4)/1000
    p8(2,5)=p8(2,5)/1000
    p8(2,6)=p8(2,6)/10
    p8(2,7)=p8(2,7)/100
    p8(2,8)=p8(2,8)/100

    do g=1,dim5
        p9(:,g)=p(ipar+1:ipar+dim3)
        ipar=ipar+dim3
    enddo
    p9=p9*10000

    p10(1:2,1,1,1)=0
    p10(3:4,1,1,1)=p(ipar+1:ipar+2)
    ipar=ipar+2
    p10(:,2,1,1)=p(ipar+1:ipar+dim4)
    ipar=ipar+dim4
    do k=1,dim6
        p10(:,k,2,1)=p(ipar+1:ipar+dim4)
        ipar=ipar+dim4
    enddo
    p10(:,1,1,2)=p(ipar+1:ipar+dim4)
    ipar=ipar+dim4
    p10(1:2,2,1,2)=p10(1:2,1,1,2)+(p10(1:2,2,1,1)-p10(1:2,1,1,1))
    p10(3:4,2,1,2)=p10(3:4,1,1,2)*p10(3:4,2,1,1)/p10(3:4,1,1,1)
    do k=1,dim6
        p10(1:2,k,2,2)=p10(1:2,1,1,2)+(p10(1:2,k,2,1)-p10(1:2,1,1,1))
        p10(3:4,k,2,2)=p10(3:4,1,1,2)*p10(3:4,k,2,1)/p10(3:4,1,1,1)
    enddo
    p10(1:2,:,:,:)=p10(1:2,:,:,:)/10
    p10(3:4,:,:,:)=p10(3:4,:,:,:)*10000

    do i=0,1; do is=1,dim9; do g=1,dim5; do k=1,dim6-1
        p11(k,is,g,i)=exp(p(ipar+1))/(1+exp(p(ipar+1)))
        ipar=ipar+1
    enddo; enddo; enddo; enddo
    forall (i=0:1,is=1:dim9,g=1:dim5) p11(dim6,is,g,i)=1-sum(p11(1:dim6-1,is,g,i))

    p12(1,1)=0
    p12(1,2)=p(ipar+1)
    ipar=ipar+1
    do i=2,7
        p12(i,:)=p(ipar+1:ipar+dim5)
        ipar=ipar+dim5
    enddo
    p12=p12*10000
    p12(5,:)=p12(5,:)/100000

    do g=1,dim5
        p13(:,g)=p(ipar+1:ipar+dim4)
        ipar=ipar+dim4
    enddo
    p13(1:2,:)=exp(p13(1:2,:))/10
    p13(3:4,:)=exp(p13(3:4,:))*1000


    p14=2*exp(p(ipar+1))/(1+exp(p(ipar+1)))-1
    ipar=ipar+1

    p15=p(ipar+1:ipar+2)
    ipar=ipar+2

    p16=p(ipar+1)
    ipar=ipar+1

    p17=0
    p17(1:6,1)=p(ipar+1:ipar+6)
    ipar=ipar+6
    p17(1:6,2)=p(ipar+1:ipar+6)
    ipar=ipar+6
    p17(:,3)=p(ipar+1:ipar+7)
    ipar=ipar+7
    p17(:,4)=p(ipar+1:ipar+7)
    ipar=ipar+7

    do i1=1,n1
        svar3=0

print *, 'hello1'
        call subroutine3(p1, p2, p3, p4, p5, p6, p7, p8, p10, p12, p13, p14, p17, p15, p16, svar3)

print *, 'hello2'
        call subroutine4(i1,svar4,svar5,svar6,svar7,svar8,svar9,svar10,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14&
        ,svar3,p15,p17,p16,svar1,svar2,n1,svar51,svar16,svar17,svar18,svar19,svar20,svar21,svar22,svar23,svar24,svar25&
        ,svar26,svar27,svar28,svar29,svar30,svar31,svar32,svar33,svar34,svar35,svar36,svar37,svar38,svar39,svar40,svar41&
        ,svar42,svar43,svar44,svar45,svar46,svar47,svar48,svar49,svar50,svar11,svar12,svar13,svar14,svar15)

    enddo

h=sum(p**2)+sum(p**4)

end subroutine subroutine2

subroutine subroutine3(p1, p2, p3, p4, p5, p6, p7, p8, p10, p12, p13, p14, p17, p15, p16, svar3)
use themodule
use omp_lib
implicit none

 integer, parameter ::  dim1=-40, dim2=107, dim7=50, dim8=4, dim3=2, dim4=4, dim5=2, dim6=2, dim9=2, dim27=2000, dim28=1500, dim11=16, dim29=dim11-1

 integer ::  o, c, g, k, a, is, c1, i, c2

 real, intent(in) ::  p1, p2, p3, p4, p5, p6, p7(2), p8(dim3,2*(dim3+1)+2), p10(dim4,dim6,dim9,dim5)
 real, intent(in) ::  p12(7,dim5), p13(dim4,dim5), p14, p15(2), p16, p17(7,4)

 real ::  svar52(0:2,0:2,dim7,dim5,dim8,dim1:dim2), svar53(dim1:dim2), svar54(dim27,dim4,dim5,dim6,dim9), svar55(dim27,dim11)

 real ::  lvar1(dim27,dim4,dim5,dim6,dim9), lvar2(dim27,dim4,dim4,dim5,dim6,dim9), lvar3(dim27,dim4,dim5,dim6,dim9)
 real ::  lvar4(dim29,dim29), lvar5(dim11,dim11), lvar6(dim11,dim27)

 real, allocatable ::  lvar7(:,:,:,:,:), lvar8(:,:,:,:,:)

 real, intent(out) ::  svar3(dim7,dim11,dim4,dim5,dim6,dim9)

 integer ::  svar56(dim27), svar57(dim27), svar58(dim27), svar59(dim27), svar60(dim27,dim4), svar61(dim27,dim4), svar62(dim27,dim3,dim9)
 integer ::  svar63(dim27,dim3,dim9,dim4), svar64(dim27), svar65(dim27), svar67(dim27,dim5), svar68(dim27) 
 real ::  svar69(dim27,dim4), svar70(dim27), svar71(dim27), svar72(dim27), svar73(dim27,dim3), svar74(dim27,dim3), svar75(dim27,dim3)
 real ::  svar76(dim27,2), svar77(dim27,2), svar78(dim27,2), svar79(dim27), svar80(dim27), svar81(dim27,dim3), svar82(dim27,dim6,dim9,dim5,dim3)
 real ::  svar83(dim27,2), svar84(dim27,dim3), svar85(dim27), svar86(dim27), svar87(dim27) 

print *, 'hi1'

!$omp critical
    svar52=gvar16
    svar53=gvar11
!$omp end critical

    do a=dim7,2,(-1)

        call subroutine5(svar53, p1, p2, p3, p4, p5, svar56, svar59, svar57, svar60, svar58, svar61, svar64, &
                         svar62, svar63, svar65, svar68, svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar86, svar87, a)


        lvar2=-10.
        lvar3=0.

        do c1=1,dim28

            call subroutine6(svar52, p17, p15, p16, p1, p2, p3, p4, p5, svar56, svar58, svar68, &
                        svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar67, svar69, svar70, svar73, svar76, svar79, svar81, a)


            call subroutine7(svar52, p1, p2, p3, p4, p5, p17, p15, svar56+1, svar60, svar61, svar64, svar65, svar63, svar67, svar70, svar85, svar84, svar73, svar83, svar76, svar79, svar87, a+1, svar3, svar54)

        enddo

        if (a<=35) then
            do g=1,dim5
                call subroutine8(svar52, p1, p2, p3, p4, p5, p17, p15, svar56, svar57, svar58, svar64, svar65, svar62, svar68, svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar86, a, svar55, g, 1)
            enddo

            do g=1,dim5
                call subroutine8(svar52, p1, p2, p3, p4, p5, p17, p15, svar56, svar57, svar58, svar64, svar65, svar62, svar68, svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar86, a, svar55, g, 2)
            enddo

        elseif (a>35) then
            do g=1,dim5
                if (g==1) then
                    call subroutine8(svar52, p1, p2, p3, p4, p5, p17, p15, svar56, svar57, svar58, svar64, svar65, svar62, svar68, svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar86, a, svar55, g, 1)
                endif
            enddo

            do g=1,dim5
                if (g==1) then
                        call subroutine8(svar52, p1, p2, p3, p4, p5, p17, p15, svar56, svar57, svar58, svar64, svar65, svar62, svar68, svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar86, a, svar55, g, 2)
                endif
            enddo
        endif

        svar3=10.
    enddo

end subroutine subroutine3

subroutine subroutine4(i1,svar4,svar5,svar6,svar7,svar8,svar9,svar10,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14&
        ,svar3,p15,p17,p16,svar1,svar2,n1,svar51,svar16,svar17,svar18,svar19,svar20,svar21,svar22,svar23,svar24,svar25&
        ,svar26,svar27,svar28,svar29,svar30,svar31,svar32,svar33,svar34,svar35,svar36,svar37,svar38,svar39,svar40,svar41&
        ,svar42,svar43,svar44,svar45,svar46,svar47,svar48,svar49,svar50,svar11,svar12,svar13,svar14,svar15)
implicit none

integer, parameter ::  dim1=-40, dim2=107, dim3=2, dim4=4, dim5=2, dim6=2, dim7=50, dim8=4, dim9=2, dim11=16, dim12=2000
integer, parameter ::  dim13=2000, dim14=dim12+dim13, dim31=30, dim15=67, dim16=79, dim17=96, dim18=93, dim30=dim1, dim19=90 
integer, parameter ::  dim20=93, dim21=103, dim22=106, dim23=15, dim24=17, dim25=8, dim26=10

integer, intent(in) ::  n1, i1 

integer ::  t, o, c, g, a, e, is, i, j, c3, c4, ipar 

real, intent(in) ::  p1, p2, p3, p4,p5, p6, p7(2), p8(dim3,2*(dim3+1)+2), p9(dim3,dim5)
real, intent(in) ::  p10(dim4,dim6,dim9,dim5), p11(dim6,dim9,dim5,0:1), p12(7,dim5), p13(dim4,dim5), p14

real, intent(inout) ::  p15(2), p16, p17(7,4)

real, intent(out) ::  svar15(dim1:dim2,dim3)

real, intent(inout) ::  svar1(dim1:dim2,dim3,n1), svar2(dim1:dim2,n1) 
integer, intent(inout) ::  svar4(dim14,dim7), svar5(dim14,dim7), svar6(dim14,dim7), svar7(dim14,dim7,dim3)
integer, intent(inout) ::  svar8(dim14,dim7), svar9(dim14,dim7),  svar10(dim14,dim7)

real, intent(in) ::  svar3(dim7,dim11,dim4,dim5,dim6,dim9)

real, intent(out) ::  svar11(dim15:dim2,dim5,dim9,dim8,dim4), svar12(dim15:dim2,dim5,dim9,dim8,dim3), svar13(dim15:dim2,dim5,dim9,dim8)
real, intent(out) ::  svar14(dim15:dim2,dim5+1,dim9+1), svar16(dim15:dim2,dim5,dim7,dim4), svar17(dim15:dim2,dim5,dim8,dim4) 
real, intent(out) ::  svar18(dim15:dim2,dim5,0:2,dim4), svar19(dim18:dim2,dim5,dim9,dim4), svar20(dim18:dim2,dim5,5,dim4) 
real, intent(out) ::  svar21(dim16:99,3,dim5,0:1,dim4), svar22(dim17:106,5,dim5,0:1,dim4), svar23(dim16:94,dim5,5,5,dim3) 
real, intent(out) ::  svar24(dim17+2:106,dim5,5,5,dim3), svar25(dim15:dim2,dim5,dim7,dim3), svar26(dim15:dim2,dim5,dim8,dim3) 
real, intent(out) ::  svar27(dim18:dim2,dim5,dim9,dim3), svar28(dim18:dim2,dim5,5,dim3), svar29(dim16:94,dim5,5,5,dim3) 
real, intent(out) ::  svar30(dim17+2:106,dim5,5,5,dim3), svar31(dim15:dim2,dim5,dim3,dim3), svar32(dim15:dim2,dim5,2:dim7,dim3,dim3) 
real, intent(out) ::  svar33(dim18:dim2,dim5,dim9,dim3,dim3), svar34(dim18:dim2,dim5,5,dim3,dim3), svar35(dim15:dim2,dim5,dim8,dim3)
real, intent(out) ::  svar36(dim18:dim2,dim5,dim9,dim3), svar37(dim15:dim2,dim7,dim5,dim8), svar38(dim18:dim2,dim7,dim9,dim5,dim8) 
real, intent(out) ::  svar39(dim15:dim2,dim5,dim4,dim4), svar40(dim15:dim2,dim5,dim4-(dim3-1),dim4-(dim3-1)), svar41(dim15:dim2,dim5,dim7,dim4,dim4)
real, intent(out) ::  svar42(dim18:dim2,dim5,dim9,dim4,dim4), svar43(dim18:dim2,dim5,dim7,dim4), svar44(dim18:dim2,dim5,5,dim4,dim4) 
real, intent(out) ::  svar45(dim19:dim20,dim23:dim24,dim5,0:13), svar46(dim21:dim22,dim25:dim26,dim5,0:7), svar47(dim19:dim20,dim23:dim24,dim5,0:13)
real, intent(out) ::  svar48(dim21:dim22,dim25:dim26,dim5,0:7), svar49(dim19:dim20,dim23:dim24,dim5,0:13), svar50(dim21:dim22,dim25:dim26,dim5,0:7) 

integer ::  svar51

real ::  lvar1(2)

integer ::  lvar2(dim7,dim5), lvar3(dim14,dim7), lvar4(dim14,dim7), lvar5(dim14,dim7), lvar6(dim14,dim7,dim4), lvar7(dim14,dim7,dim4)
integer ::  lvar8(dim14,dim7,dim3), lvar9(dim14,dim7), lvar10(dim14,dim7,dim3,dim4), lvar11(dim14,dim7), lvar12(dim14,dim7) 
integer ::  lvar13(dim14,dim7), lvar14(dim14), lvar15(dim14), lvar16(dim14,dim7), lvar17(dim14,dim7)

real(8) ::  lvar18(dim14,dim7,dim4)
real ::  lvar19(dim1:dim2), lvar20(dim1:dim2), lvar21(dim1:dim2), lvar22(dim1:dim2,dim3,dim31+1)

real, parameter ::  lvar23=5 
real ::  lvar24(dim14,dim7), lvar25(dim14,dim7,dim3), lvar26(dim14,dim7,dim3), lvar27(dim14,dim7,dim3) 
real ::  lvar28(dim14,dim7,dim4), lvar29(dim14,dim7,dim4),  lvar32(dim14,dim7,dim4), lvar33

integer ::  lvar30(dim14,dim7), lvar31(dim14,dim7), lvar36

real, parameter ::  lvar34=0.01
integer, parameter :: lvar37=8.5, lvar38=1+size(p15)+size(p17)-2
real ::  lvar39(lvar38), lvar40(lvar38), lvar41(lvar38)

real(8), parameter :: lvar42=0, lvar43=1
real, parameter :: lvar44=0, lvar45=1
real :: lvar46(dim1:dim2,2), lvar47(dim1:dim2), lvar48(dim1:dim2,dim7,dim9,dim5),lvar49(dim1:dim2), lvar50(0:2,0:2,dim7,dim5,dim8,dim1:dim2)
real :: lvar51(dim1:dim2), lvar52(dim1:dim2,2),lvar53(dim1:dim2),lvar54(dim1:dim2,dim3)

print *, 'Hi2'

svar15=10.
svar1=10.
svar2=10.
svar4=10
svar5=10
svar6=10
svar7=10
svar8=10
svar9=10
svar10=10
svar11=10.
svar12=10.
svar13=10.
svar14=10.
svar16=10.
svar17=10.
svar18=10.
svar19=10.
svar20=10.
svar21=10.
svar22=10.
svar23=10.
svar24=10.
svar25=10.
svar26=10.
svar27=10.
svar28=10.
svar29=10.
svar30=10.
svar31=10.
svar32=10.
svar33=10.
svar34=10.
svar35=10.
svar36=10.
svar37=10.
svar38=10.
svar39=10.
svar40=10.
svar41=10.
svar42=10.
svar43=10.
svar44=10.
svar45=10.
svar46=10.
svar47=10.
svar48=10.
svar49=10.
svar50=10.

end subroutine subroutine4

subroutine subroutine5(svar53, p1, p2, p3, p4, p5, svar56, svar59, svar57, svar60, svar58, svar61, svar64, &
                         svar62, svar63, svar65, svar68, svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar86, svar87, a)
implicit none

integer, parameter ::  dim1=-40,dim2=107, dim3=2, dim9=2, dim27=2000, dim4=4

integer, intent(in) ::  a

integer ::  c2, o, c

integer, intent(out) ::  svar56(dim27), svar59(dim27), svar57(dim27), svar60(dim27,dim4), svar58(dim27), svar61(dim27,dim4)
integer, intent(out) ::  svar64(dim27), svar62(dim27,dim3,dim9), svar63(dim27,dim3,dim9,dim4), svar65(dim27), svar68(dim27)
real, intent(out) ::  svar71(dim27), svar72(dim27), svar74(dim27,dim3), svar75(dim27,dim3), svar77(dim27,2), svar78(dim27,2)
real, intent(out) ::  svar80(dim27), svar86(dim27), svar87(dim27)

real, intent(in) ::  p1, p2, p3, p4, p5, svar53(dim1:dim2)

real ::  lvar1(dim27), lvar2(dim27), lvar3(dim27), lvar4(dim27), lvar5(dim27,dim3+1), lvar6(dim27), lvar7(dim27), lvar8(dim27), lvar9(dim27)

real, parameter :: lvar10=0., lvar11=1.


svar56=10
svar59=10
svar57=10
svar60=10
svar58=10
svar61=10
svar64=10
svar62=10
svar63=10
svar65=10
svar68=10
svar71=10.
svar72=10.
svar74=10.
svar75=10.
svar77=10.
svar78=10.
svar80=10.
svar86=10.
svar87=10.


end subroutine subroutine5


subroutine subroutine6(svar52, p17, p15, p16, p1, p2, p3, p4, p5, svar56, svar58, svar68, &
                svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar67, svar69, svar70, svar73, svar76, svar79, svar81, a)
implicit none

integer, parameter ::  dim1=-40, dim2=107, dim7=50, dim8=4, dim5=2, dim9=2, dim4=4, dim3=2, dim27=2000

integer, intent(in) ::  a 
integer ::  c2, c, o, g, i

integer, intent(in) ::  svar56(dim27), svar58(dim27), svar68(dim27)
real, intent(in) ::  svar71(dim27), svar72(dim27), svar74(dim27,dim3), svar75(dim27,dim3), svar78(dim27,2), svar80(dim27), svar77(dim27,2)

integer, intent(out) ::  svar67(dim27,dim5)
real, intent(out) ::  svar69(dim27,dim4), svar70(dim27), svar73(dim27,dim3), svar76(dim27,2), svar79(dim27), svar81(dim27,dim3)

real, intent(in) ::  p1, p2, p3, p4, p5, p15(2), p16, p17(7,4), svar52(0:2,0:2,dim7,dim5,dim8,dim1:dim2)

real ::  lvar1(dim27)
real(8) ::  lvar2(dim27,dim4), lvar3(dim27), lvar4(dim27)

real, parameter ::  lvar5=0.9559713, lvar6=0.00925

real(8), parameter :: lvar7=0., lvar8=1.
real, parameter :: lvar9=0., lvar10=1.
real :: lvar11(dim27), lvar12(dim27)

svar67=10
svar69=10.
svar70=10.
svar73=10.
svar76=10.
svar79=10.
svar81=10.


end subroutine subroutine6

subroutine subroutine7(svar52, p1, p2, p3, p4, p5, p17, p15, svar56, svar57, svar58, svar64, svar65, svar62, svar68, svar71, svar72, svar75, svar74, svar78, svar77, svar80,svar86, a, svar3, svar54)
implicit none

integer, parameter ::  dim1=-40, dim2=107, dim7=50, dim8=4, dim5=2, dim3=2, dim4=4, dim6=2, dim9=2, dim27=2000, dim11=16 

integer, intent(in) :: a
integer, intent(in) :: svar56(dim27), svar64(dim27), svar65(dim27), svar68(dim27,dim5), svar62(dim27,dim3,dim9,dim4), svar57(dim27,dim4), svar58(dim27,dim4)
real, intent(in) ::  svar71(dim27), svar72(dim27), svar74(dim27,dim3), svar75(dim27,dim3), svar77(dim27,2), svar78(dim27,2)
real, intent(in) ::  svar80(dim27), svar86(dim27), svar3(dim7,dim11,dim4,dim5,dim6,dim9), svar52(0:2,0:2,dim7,dim5,dim8,dim1:dim2)

real, intent(in) ::  p1, p2, p3, p4, p5, p15(2), p17(7,4)
real, intent(out) ::  svar54(dim27,dim4,dim5,dim6,dim9) 
integer :: iss, i, j, o, c, is, g, c2 
real :: lvar1(dim27,dim11), lvar2(dim27,2), lvar3(dim27,dim3), lvar4(dim27), lvar5(dim27), lvar6(dim27), lvar7(dim27,dim3), lvar8(dim27)
real, parameter ::  lvar9=0.9559713

svar54=10

end subroutine subroutine7

subroutine subroutine8(svar52, p1, p2, p3, p4, p5, p17, p15, svar56, svar57, svar58, svar64, svar65, svar62, svar68, svar71, svar72, svar75, svar74, svar78, svar77, svar80, svar86, a, svar55, g, is)
implicit none

integer, parameter ::  dim1=-40, dim2=107, dim7=50, dim8=4, dim5=2, dim3=2, dim9=2, dim27=2000, dim11=16

integer, intent(in) ::  a, is, g

integer, intent(in) ::  svar56(dim27), svar57(dim27), svar58(dim27), svar62(dim27,dim3,dim9), svar68(dim27), svar64(dim27), svar65(dim27)
real, intent(in) ::  svar71(dim27), svar77(dim27,2), svar80(dim27), svar74(dim27,dim3), svar72(dim27), svar75(dim27,dim3), svar78(dim27,2)

real, intent(in) ::  svar52(0:2,0:2,dim7,dim5,dim8,dim1:dim2), svar86(dim27) 

real, intent(in) ::  p1,p2,p3,p4,p5, p15(2), p17(7,4) 

real :: lvar1(dim27,2), lvar2(dim27,dim3), lvar3(dim27), lvar4(dim27), lvar5(dim27), lvar6(dim27,dim3), lvar7(dim27)
real, parameter ::  lvar8=0.9559713

real, intent(out) :: svar55(dim27,dim11)

integer :: iss, j, o, c2, i 


    svar55=10.

end subroutine subroutine8

This piece of code together with the code of subroutine1 posted above replicates the error I get. Moreover, if I comment a piece of code (a call to a subroutine, or one the definitions of some variable such lvar2 and lvar3 at the beginning of subroutine2) then the program works fine. In sum, if I run the code with mpiexec -np 1 ./aaa it doesn't work and if I simply ./aaa then it works.

Thanks again for your help and sorry for the so long post!

Bellman
Realistically you're very unlikely to gen an answer to a question like this. Basically you're asking someone to look at a really big peice of code and just magically figure out what's wrong. Might happen but unlikely.If I were you I'd try and break the problem down into something much smaller and ask a much more specific question.
Ade Miller