[Wien] k-point parallelization in WIEN2K_09.1

Peter Blaha pblaha at theochem.tuwien.ac.at
Mon Jun 14 07:03:28 CEST 2010


I do NOT believe that k-point parallelization was possible with an older WIEN2k
version either (unless you set it up with "rsh" instead of "ssh" and defined a
.rhosts file).

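Anyway, k-point parallelization does not use MPI at all. It starts the lapw1 jobs
on the nodes listed in .machines through ssh (or rsh), so login without a password
must work. Please read the requirements specified in the UG (User's Guide).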
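
A quick way to check the ssh requirement is sketched below. This is only an illustration and not part of WIEN2k: the function names machines_hosts and check_passwordless_ssh are made up for this example, and the parsing assumes simple k-point lines of the form "weight:host" with one host per line, as in the .machines file quoted further down. BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of asking for a password.

```shell
#!/bin/sh
# Hypothetical helpers (not part of WIEN2k).

# Print the hosts named in k-point lines ("weight:host") of a .machines file.
# Lines such as "granularity:1" do not start with a number and are skipped.
# Assumption: one host per line; only the first host of a line is reported.
machines_hosts() {
    sed -n 's/^[0-9][0-9]*:\([^: ]*\).*/\1/p' "$1" | sort -u
}

# Try a non-interactive login on every host found in the file.
# With BatchMode=yes ssh never prompts, so a host that still needs a
# password (or has no key installed) is reported as FAILED.
check_passwordless_ssh() {
    for host in $(machines_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED (password or key setup needed)"
        fi
    done
}

# Usage (from the case directory): check_passwordless_ssh .machines
```

Every node should report "ok" before x lapw1 -p is tried again. The check has to be run from the machine where the parallel job is started.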

Kakhaber Jandieri wrote:
> Dear Prof. Blaha,
> 
> Thank you for your reply.
> 
>> Can you    ssh node120 ps
>> without supplying a password ?
> 
> No, I can't ssh to the nodes without supplying a password, but in my 
> parallel_options I have setenv MPI_REMOTE 0. I thought that our cluster 
> has a shared memory architecture, since the MPI parallelization works 
> without any problem for 1 k-point. I checked the corresponding nodes. 
> They were all loaded. Maybe I misunderstood something. Are the 
> requirements for MPI parallelization different from those for k-point 
> parallelization?
> 
>> Try x lapw1 -p on the commandline.
>> What exactly is the "error" ?
> 
> Just now, to try your suggestions, I ran a new task with k-point 
> parallelization. The .machines file is:
> granularity:1
> 1:node120
> 1:node127
> 1:node121
> 1:node123
> 
> with node120 as a master node.
> 
> The output of x lapw1 -p is:
> starting parallel lapw1 at Sun Jun 13 22:44:08 CEST 2010
> ->  starting parallel LAPW1 jobs at Sun Jun 13 22:44:08 CEST 2010
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
> [1] 31314
> [2] 31341
> [3] 31357
> [4] 31373
> Permission denied, please try again.
> Permission denied, please try again.
> Received disconnect from 172.26.6.120: 2: Too many authentication 
> failures for kakhaber
> [1]    Done                   ( ( $remote $machine[$p]  ...
> Permission denied, please try again.
> Permission denied, please try again.
> Received disconnect from 172.26.6.127: 2: Too many authentication 
> failures for kakhaber
> Permission denied, please try again.
> Permission denied, please try again.
> Received disconnect from 172.26.6.121: 2: Too many authentication 
> failures for kakhaber
> [3]  - Done                   ( ( $remote $machine[$p]  ...
> [2]  - Done                   ( ( $remote $machine[$p]  ...
> Permission denied, please try again.
> Permission denied, please try again.
> Received disconnect from 172.26.6.123: 2: Too many authentication 
> failures for kakhaber
> [4]    Done                   ( ( $remote $machine[$p]  ...
>      node120(1)      node127(1)      node121(1)      node123(1) **  
> LAPW1 crashed!
> cat: No match.
> 0.116u 0.324s 0:11.88 3.6%        0+0k 0+864io 0pf+0w
> error: command   /home/kakhaber/WIEN2K_09/lapw1cpara -c lapw1.def   failed
> 
>> How many k-points do you have ? ( 4 ?)
> 
>  Yes, I have 4 k-points.
> 
>> Content of .machine1 and .processes
> 
> marc-hn:~/wien_work/GaAsB> cat .machine1
> node120
> marc-hn:~/wien_work/GaAsB> cat .machine2
> node127
> marc-hn:~/wien_work/GaAsB> cat .machine3
> node121
> marc-hn:~/wien_work/GaAsB> cat .machine4
> node123
> 
> marc-hn:~/wien_work/GaAsB> cat .processes
> init:node120
> init:node127
> init:node121
> init:node123
> 1 : node120 :  1 : 1 : 1
> 2 : node127 :  1 : 1 : 2
> 3 : node121 :  1 : 1 : 3
> 4 : node123 :  1 : 1 : 4
> 
>> While x lapw1 -p is running, do a    ps -ef |grep lapw
> 
> I did not have enough time to do it; the program crashed too quickly.
> 
>> Your .machines file is most likely a rather "useless" one. The mpi-lapw1
>> diagonalization (SCALAPACK) is almost a factor of 2 slower than the 
>> serial
>> version, thus your speedup by using 2 processors in mpi-mode will be
>> very small.
> 
> Yes, I know, but I am simply trying to set up the calculations using 
> WIEN2k. For "real" calculations I will use many more processors.
> 
> And finally, some additional information. As I wrote in my previous 
> letters, in WIEN2k_08.1 k-point parallelization works, but all processes 
> are running on the master node and all other reserved nodes are idle. 
> I forgot to mention: this is true for lapw1 only. lapw2 is distributed 
> among all reserved nodes.
> 
> Thank you once again. I am looking forward to your further advice.
> 
> 
> Dr. Kakhaber Jandieri
> Department of Physics
> Philipps University Marburg
> Tel:+49 6421 2824159 (2825704)
> 
> 
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

-- 
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------

