[Wien] runafm and parallel execution
Stefaan Cottenier
Stefaan.Cottenier at fys.kuleuven.ac.be
Fri Oct 22 14:12:38 CEST 2004
Dear all,
I noticed that runafm_lapw is not adapted for parallel execution on a system
with distributed scratch disks. The problem is in the section:
copyvec:
foreach i ( ${scratch}*.vectorup* )
    if (! -z $i) then
        set j=`echo $i:e | cut -d _ -f 2- -s`
        if ( $j ) then
            cp $i ${scratch}$file.vectordn_$j
            cp $file.energyup_$j $file.energydn_$j
            echo $file.vectordn_$j copied >> $dayfile
        else
            cp $i ${scratch}$file.vectordn
            cp $file.energyup $file.energydn
            echo $file.vectordn copied >> $dayfile
        endif
    endif
end
First of all, if $SCRATCH also contains vector files from other cases, the
loop picks those up as well and runs over far too many files (replace the
second line by "foreach i ( ${scratch}$file.vectorup* )"?).
Apart from that, if the scratch space is distributed over different local
disks, the scratch space of the single node on which this loop runs holds
fewer than the total number of vector files. It would be better to loop over
files that are visible locally, e.g. the energy files (replace the second
line by "foreach i ( $file.energyup* )", with appropriate changes in the use
of $i later on?).
Finally, for the same reason only the vector files that are on the local
scratch disk get copied. The copying should probably be done with "rcp" on
the remote scratch disks. For this to be possible, I guess runafm_lapw would
also need the information from the .machines file, which involves some more
implementation work (I'm not very fluent in shell scripting and don't feel
able to make a suggestion, sorry...).
Are these the only changes that are needed to make runafm work in such
cases?
Another remark: when I initialize a case for use with runafm (with
case.struct_supergroup), I get an empty case.inclmcopy_st during init_lapw.
Is that normal? (The UG doesn't say anything about this file.) There is
something that looks like an error in the screen output, but I'm not sure:
> afminput (13:37:01) case.struct_supergroup present
The super and subgroups are TRANSLATIONENGLEICH
Found a symmetry operation:
0 -1 0 0.00000
1 0 0 0.00000
0 0 1 0.00000
FORTRAN STOP rrot not found <==============================
0.006u 0.007s 0:00.16 0.0% 0+0k 0+0io 83pf+0w
You can now use runafm_lapw for scf
Thanks,
Stefaan