[Wien] machines file

Peter Blaha pblaha at zeus.theochem.tuwien.ac.at
Wed Apr 6 20:09:24 CEST 2005


We have had quite a few discussions on MPI-parallel jobs:

a) It does not make much sense to MPI-parallelize on a dual node.
   You will not gain much, neither in speed nor in memory.
   It only makes "sense" from 4 hosts on, and you will most likely want a
   fast network (InfiniBand, Myrinet).
b) Do you have WIEN2k experience? If not, forget the MPI version for the
   moment.
c) Do you have MPI and ScaLAPACK installed?
   If yes, check your compile.msg again, e.g. in SRC_lapw0.
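
For the k-point parallel part, the distribution that paratest reports in
the quoted message below (18, 18 and 36 of 72 k-points) is just a split
proportional to the weights in the .machines file. A minimal sketch of that
arithmetic follows. It is not the algorithm used by lapw1para: the helper
names parse_machines and ideal_split are hypothetical, granularity and
extrafine are ignored, and the optional third field (":2") is simply
dropped.

```python
def parse_machines(text):
    """Collect (weight, host) pairs from the k-point lines of a .machines file."""
    jobs = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        # k-point lines start with a numeric weight; this skips
        # "granularity:", "lapw0:" and similar keyword lines.
        if len(fields) >= 2 and fields[0].isdigit():
            jobs.append((int(fields[0]), fields[1]))
    return jobs


def ideal_split(weights, nkpts):
    """Split nkpts proportionally to weights (largest-remainder rounding)."""
    total = sum(weights)
    shares = [nkpts * w // total for w in weights]
    remainders = [(nkpts * w) % total for w in weights]
    leftover = nkpts - sum(shares)
    # Hand the leftover k-points to the jobs with the largest remainders.
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# The .machines file from the quoted message (assumed layout: weight:host:n).
machines = """granularity:1
18:opto024:2
18:opto025:2
36:opto030:2
lapw0:opto024 opto025"""

jobs = parse_machines(machines)
split = ideal_split([w for w, _ in jobs], 72)
for (weight, host), nk in zip(jobs, split):
    print(f"{host}({weight}) {nk}k")
```

With equal weights the same function gives an even split, which is usually
what you want on nodes of identical speed.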


>  Could somebody help me with the .machines file?
> 
>  We have just compiled WIEN2k for MPI and tested it with TiC.
> 
>  Our system information is :
> Dual processors AMD 64 Opteron
> OS: Fedora Core 1 x86_64
> Host name: Darwin
> Nodes: Opto0xx; each node has 2 CPUs of the same speed
> 
> The Wien2k Version is 25/02/2005
> 
> We have the files lapw1para, lapwsopara, lapwdmpara, lapw2para, lapw0_mpi,
> lapw1_mpi, and lapw2_mpi.
> 
> For TiC, we tested with 72 k-points.
> 
> Our .machines file is:
> 
> granularity:1
> 18:opto024:2
> 18:opto025:2
> 36:opto030:2
> lapw0:opto024 opto025
> 
> Using paratest, we obtained:
> 
> Test: LAPW1 in parallel mode (using .machines)
> Granularity set to 1
> Extrafine unset
> 
>     klist:       72
>     machines:    opto024 opto025 opto030
>     procs:       3
>     weigh(old):  18 18 36
>     sumw:        72
>     granularity: 1
>     weigh(new):  18 18 36
> 
> Distribution of k-point (under ideal conditions)
> will be:
> 
> 1 : opto024(18) 18k
> 2 : opto025(18) 18k
> 3 : opto030(36) 36k
> 
>  When we ran SCF calculations with this .machines file, we got the
> following error:
> 
> cycle 1         (Wed Apr  6 17:34:32 SGT 2005)  (20/20 to go)
> 
> >   lapw0 -p    (17:34:32) starting parallel lapw0 at Wed Apr  6 17:34:32 SGT 2005
> -------- .machine1 : 2 processors
> opto024
> opto025
> --------
> 0.020u 0.010s 0:00.06 50.0%     0+0k 0+0io 2588pf+0w
> 
> >   stop error
> 
> When I checked STDOUT, I found:
> 
> 1[1]: No match.
> 
> I checked the files and saw that case.vsp and case.vns were not generated,
> and that case.clmup/dn were empty.
> 
> But when I use the following .machines file
> 
> granularity:1
> 18:opto024:2
> 18:opto025:2
> 36:opto030:2
> 
> LAPW0 ran well, but LAPW1 crashed.
> 
> The dayfile shows:
> 
>       cycle 1   (Wed Apr  6 18:05:39 SGT 2005)  (20/20 to go)
> 
> >   lapw0 -p    (18:05:39) starting parallel lapw0 at Wed Apr  6 18:05:39 SGT 2005
> --------
> running lapw0 in single mode
> 1.960u 0.030s 0:02.13 93.4%     0+0k 0+0io 2502pf+0w
> >   lapw1  -p   (18:05:41) starting parallel lapw1 at Wed Apr  6 18:05:41 SGT 2005
> ->  starting parallel LAPW1 jobs at Wed Apr  6 18:05:41 SGT 2005
> running LAPW1 in parallel mode (using .machines)
> 3 number_of_parallel_jobs
> **  LAPW1 crashed!
> 0.030u 0.040s 0:05.34 1.3%      0+0k 0+0io 14149pf+0w
> 
> >   stop error
> 
> STDOUT
> 
> STOP  LAPW0 END
>  STOP
> bash: line 1: lapw1: command not found
> 
> real    0m0.003s
> user    0m0.000s
> sys     0m0.000s
> bash: line 1: lapw1: command not found
> 
> real    0m0.002s
> user    0m0.000s
> sys     0m0.000s
> bash: lapw1: command not found
> 
> real    0m0.002s
> user    0m0.000s
> sys     0m0.001s
> cat: No match.
> 
> We really do not understand the reason for these errors. But we noticed that
> if we run in remote mode, i.e., with the following .machines file for host
> Darwin
> 
> granularity:1
> 18:darwin
> 18:darwin
> 36:darwin
> 
> then everything was ok.
> 
> Could someone help us to clarify these errors and how to overcome this? Many
> thanks.
> 
> Best regards,
> Khuong
> 


                                      P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671             FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------



