[Wien] lapw1cpara script WIEN2k_08.1

aurelio aurelio at cesga.es
Wed Feb 27 19:46:24 CET 2008


Hello,
I am using WIEN2k_08.1 to run a case with 23 k points in parallel, using 23 (k points) * 4 (MPI fine grain) processors on a 
SUSE Linux Enterprise Server 10 (ia64) cluster. The program fails in lapw1cpara: it is unable to start mpirun correctly, 
complaining about the machinefile in some cases and running multiple k points on the same node in others.
Checking the lapw1cpara script, I have noticed the following:

In lines 492-495:

set helpout=`cat .machine[$p]`
echo -n "$helpout(${kpl[$loop]}) " >.time1_$loop
set ttt=(`echo $mpirun | sed -e "s^_NP_^$number_per_job[$p]^" -e "s^_EXEC_^$WIENROOT/${exe}_mpi ${def}_$loop.def^" -e "s^_HOSTS_^.machine[$p]^"`)

Here the machine file name is obtained from the shell filename pattern .machine[$p]. The brackets are treated as a 
character class rather than as part of the name, so for the 10th k point, for example, .machine[10] is expanded by the 
shell to ".machine1", and for the 12th k point .machine[12] is expanded to ".machine1  .machine2".
It is quite easy to test in the run folder:

$ ls .machine[10]
.machine1
$ ls .machine[12]
.machine1  .machine2

The final result is a wrong machinefile for the calculation. I think the solution is to replace .machine[$p] with .machine$p.
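
The expansion described above can be reproduced outside a WIEN2k run. The following is only a minimal sketch in plain sh (the quoted script lines use csh syntax, but the bracket pattern behaves the same way here); the scratch directory and the variable names glob_result and plain_result are assumptions made for the illustration and are not part of lapw1cpara.

```shell
#!/bin/sh
# Sketch: reproduce the bracket-pattern pitfall in a scratch directory.
# The file names mimic the .machineN files of a run; everything else is hypothetical.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch .machine1 .machine2 .machine10 .machine12

p=12
# Unquoted [12] is a character class: it matches ".machine1" and ".machine2".
glob_result=$(echo .machine[$p])
# Plain variable expansion yields the single intended file name.
plain_result=$(echo .machine$p)

echo "pattern .machine[\$p] -> $glob_result"
echo "name    .machine\$p   -> $plain_result"

# Clean up the scratch directory.
cd / && rm -rf "$tmpdir"
```

With p=12 the bracket form picks up .machine1 and .machine2, while the plain form refers only to .machine12, which is the file intended for that k point.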

What do you think?

Thanks in advance for your help

Aurelio Rodríguez


-- 
__________________________________
Manuel Aurelio Rodriguez Lopez
Tecnico de Aplicaciones
CESGA
Avda. de Vigo s/n. Campus Sur
15705 - Santiago de Compostela.
SPAIN
Tel.: +34 981 56 98 10 Ext 244
Fax: +34 981 59 46 16
e-mail: aurelio at cesga.es
__________________________________

