[Wien] runafm and parallel execution

Stefaan Cottenier Stefaan.Cottenier at fys.kuleuven.ac.be
Fri Oct 22 14:12:38 CEST 2004


Dear all,

I noticed that runafm_lapw is not adapted for parallel execution on a system
with distributed scratch disks. The problem is in the section:

copyvec:
  foreach i ( ${scratch}*.vectorup* )
  if (! -z $i) then
    set j=`echo $i:e | cut -d _ -f 2- -s`
    if ( $j ) then
      cp $i ${scratch}$file.vectordn_$j
      cp $file.energyup_$j $file.energydn_$j
      echo $file.vectordn_$j copied >> $dayfile
    else
      cp $i ${scratch}$file.vectordn
      cp $file.energyup $file.energydn
      echo $file.vectordn copied >> $dayfile
    endif
  endif
  end

First of all, if $SCRATCH also contains vector files from other cases, the
loop over i picks up those files as well, so far too many files are
processed (replace the second line by
"foreach i ( ${scratch}$file.vectorup* )" ?).

Apart from that, if the scratch space is distributed over different local
disks, the scratch space of the single node on which this loop runs holds
fewer than the total number of vector files. It would be better to loop
over local files, e.g. the energy files (replace the second line by
"foreach i ( $file.energyup* )", with appropriate changes in the use of $i
later on ?).
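
As a rough illustration of that second idea, the loop could be driven by the
energy files like this. This is an untested sketch, not a patch: it assumes
that $scratch, $file and $dayfile are set earlier in runafm_lapw exactly as
in the section quoted above, and that "$file.energyup*" matches only the
energy files that belong to the vector files. The cp of the vector files
still only works for files on the local scratch disk.

```csh
copyvec:
  # loop over the energy files in the working directory instead of $scratch
  foreach i ( $file.energyup* )
  if (! -z $i) then
    # part of the extension after the first "_"; empty for a serial run
    set j=`echo $i:e | cut -d _ -f 2- -s`
    if ( "$j" != "" ) then
      # assumption: the matching vector file sits on the LOCAL scratch disk
      cp ${scratch}$file.vectorup_$j ${scratch}$file.vectordn_$j
      cp $i $file.energydn_$j
      echo $file.vectordn_$j copied >> $dayfile
    else
      cp ${scratch}$file.vectorup ${scratch}$file.vectordn
      cp $i $file.energydn
      echo $file.vectordn copied >> $dayfile
    endif
  endif
  end
```

Compared with the original, $i now holds the name of an energy file, so the
vector file names are built from $file and $j rather than taken from $i.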

Finally, for the same reason, only the vector files on the local scratch
disk get copied. The copying should probably be done with "rcp" on the
remote scratch disks. For this to be possible, I guess runafm_lapw will also
need the information in the .machines file, which involves some more
implementation work (I'm not so fluent in scripting and don't feel able to
make a suggestion, sorry...).

Are these the only changes that are needed to make runafm work in such
cases?

Another remark: when I initialize a case for use with runafm (with
case.struct_supergroup), during init_lapw I get an empty case.inclmcopy_st.
Is that normal? (The UG doesn't say anything about this file.) There is
something that looks like an error in the screen output, but I'm not sure:

>   afminput    (13:37:01)  case.struct_supergroup present
 The super and subgroups are TRANSLATIONENGLEICH
 Found a symmetry operation:
   0  -1   0   0.00000
   1   0   0   0.00000
   0   0   1   0.00000
FORTRAN STOP rrot not found    <==============================
0.006u 0.007s 0:00.16 0.0%      0+0k 0+0io 83pf+0w
You can now use     runafm_lapw   for scf

Thanks,
Stefaan




