[Wien] Re: [Wien] .machines file for fine grained parallel executions

tom_y at livedoor.com tom_y at livedoor.com
Wed Aug 20 08:54:13 CEST 2003


Dear wien2k developers and users,

I received a reply from Prof. Blaha on this topic a week ago.
I am sorry for answering so late; I was away from e-mail.

I checked whether Prof. Blaha's suggestions solve our problem.
(The answer is no.)

Again, here is my .machines file for fine-grained parallel
execution, followed by what I did and what I got.
> -----------------------------------
> lapw0:earth46:1 earth47:1 earth48:1
> 3:earth46:1 earth47:1 earth48:1
> granularity:1
> -----------------------------------
>
> By invoking the command "run_lapw -p"
> I got the following message on the screen
>
> FORTRAN STOP  LAPW0 END
> FORTRAN STOP  LAPW0 END
> FORTRAN STOP  LAPW0 END
> FORTRAN STOP  LAPW0 END
> cat: No match.

As Prof. Blaha pointed out, the 4 lapw0 lines were my mistake: I made an
error when copying and pasting them from the Linux screen into the e-mail
window. Only 3 lapw0 lines appear, followed by the error message, as follows.

FORTRAN STOP  LAPW0 END
FORTRAN STOP  LAPW0 END
FORTRAN STOP  LAPW0 END
cat: No match.
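
For reference, the .machines file quoted above can be read mechanically.
The following is a minimal Python sketch, not part of WIEN2k. The function
name parse_machines is hypothetical, and it assumes only the line forms
that appear in this message ("lapw0:host:n ...", "weight:host:n ..." and
"granularity:n").

```python
def parse_machines(text):
    """Parse the subset of .machines syntax shown in this message.

    Assumption: only 'lapw0:', 'granularity:' and 'weight:host:n ...'
    lines occur; other keywords are not handled by this sketch.
    """
    result = {"lapw0": [], "jobs": [], "granularity": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, _, rest = line.partition(":")
        if key == "granularity":
            result["granularity"] = int(rest)
            continue
        hosts = []
        for entry in rest.split():
            # each entry looks like 'earth46:1' (host name, process count)
            name, _, count = entry.partition(":")
            hosts.append((name, int(count) if count else 1))
        if key == "lapw0":
            result["lapw0"] = hosts
        else:
            # a numeric key is the weight of one lapw1/lapw2 job
            result["jobs"].append({"weight": int(key), "hosts": hosts})
    return result


MACHINES = """\
lapw0:earth46:1 earth47:1 earth48:1
3:earth46:1 earth47:1 earth48:1
granularity:1
"""

parsed = parse_machines(MACHINES)
print("lapw0 hosts:", parsed["lapw0"])
for job in parsed["jobs"]:
    nprocs = sum(count for _, count in job["hosts"])
    print("job weight", job["weight"], "uses", nprocs, "processes")
```

Read this way, the file describes one lapw1/lapw2 job spread over three
hosts, which appears consistent with the "3 processors" and
"1 number_of_parallel_jobs" lines in the step-by-step output further down.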

> A few suggestions:
> a) Make sure you do have a   case.output0   file. Eventually copy
> cp case.output0000 case.output0
> (New versions of run_lapw should not need case.output0 anymore)

There was no case.output0 file. I then copied case.output0000 to
case.output0, but nothing changed.

> b) your .machines file looks reasonable.
> Try the steps individually:
> 
> x lapw0 -p
> x lapw1 -p (-c)
> x lapw2 -p (-c)

When I ran the steps individually, I got the following messages on the screen.

# x lapw0 -p
starting parallel lapw0 at Wed Aug 20 14:55:57 JST 2003
-------- .machine1 : 3 processors
earth46:1
earth47:1
earth48:1
--------
FORTRAN STOP  LAPW0 END
FORTRAN STOP  LAPW0 END
FORTRAN STOP  LAPW0 END
2.080u 0.110s 0:06.00 36.5%     0+0k 0+0io 11164pf+0w

#x lapw1 -p
starting parallel lapw1 at Wed Aug 20 14:56:45 JST 2003
->  starting parallel LAPW1 jobs at Wed Aug 20 14:56:45 JST 2003
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
[1] 5632
[1]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
**  LAPW1 crashed!
cat: No match.
0.020u 0.080s 0:03.61 2.7%      0+0k 0+0io 10659pf+0w

I don't understand these messages. Why did LAPW1 crash?

I also tried calling lapw0_mpi and lapw1_mpi directly with the following
commands.

#mpirun -np 3 -machinefile m /home/wien2k/WIEN2k_03_3/SRC_lapw0/lapw0_mpi lapw0.def
FORTRAN STOP  LAPW0 END
FORTRAN STOP  LAPW0 END
FORTRAN STOP  LAPW0 END

#mpirun -np 3 -machinefile m /home/wien2k/WIEN2k_03_3/SRC_lapw1/lapw1_mpi lapw1.def
 Using            3  processors, My ID =            0
 Using            3  processors, My ID =            2
 Using            3  processors, My ID =            1
BLACS ERROR 'Illegal grid (0 x 16), #procs=3'
from {-1,-1}, pnum=1, Contxt=-1, on line -1 of file 'BLACS_GRIDINIT/BLACS_GRIDMAP'.

p1_3483:  p4_error: : 1
[1] MPI Abort by user Aborting program !
[1] Aborting program!
Broken pipe

--------------------
The content of the file "m" is:
earth46
earth47
earth48
--------------------
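
The BLACS message above reports a process grid of 0 x 16 with 3 processes.
The sketch below assumes that BLACS_GRIDINIT accepts a grid only when both
dimensions are at least 1 and nprow * npcol does not exceed the number of
processes. The function names are hypothetical, the near-square choice is
just one common convention, and this is not how lapw1_mpi itself picks its
grid; where the 0 x 16 comes from cannot be told from the output above.

```python
import math


def check_grid(nprow, npcol, nprocs):
    """Return an error string if the grid is unusable, else None.

    Assumed condition: nprow >= 1, npcol >= 1, nprow * npcol <= nprocs.
    """
    if nprow < 1 or npcol < 1 or nprow * npcol > nprocs:
        return f"Illegal grid ({nprow} x {npcol}), #procs={nprocs}"
    return None


def near_square_grid(nprocs):
    """Pick the most nearly square nprow x npcol grid that uses all processes.

    This is an illustrative convention, not the WIEN2k algorithm.
    """
    nprow = math.isqrt(nprocs)
    while nprocs % nprow != 0:
        nprow -= 1  # terminates at 1 at the latest
    return nprow, nprocs // nprow


# The grid from the error message is rejected under the assumed condition.
print(check_grid(0, 16, 3))

# A grid that would fit 3 processes.
rows, cols = near_square_grid(3)
print(rows, cols, check_grid(rows, cols, 3))
```

Under this condition the reported grid fails already because one dimension
is zero, independent of the number of hosts listed in the file "m".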


