[Wien] Problem with k-parallel

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Mar 9 13:44:53 CET 2016


Hi,

Yes, we have recently had such a problem as well. It comes from slow disk I/O.

A file like    /tmp/.tmp2.mpolak.50255

is a temporary file created by lapw2para and is used when we modify the 
lapw2_xx.def files by a couple of sed commands.

Because of this I've reduced the number of sed commands in my version of 
lapw2para. Unfortunately, I cannot post the script because it will not 
be compatible with your WIEN2k version, but I can tell you what we did; 
since then these errors have not shown up anymore.

Please identify the following lines in your lapw2para and modify them as 
shown below:
...
#creating  def files
set i = 1
while ($i <= $maxproc)
#  if ($debug > 0) echo -n "$i "
   cp $def.def $tmp_dir/.tmp.$user.$$
   #subsituting in files:
   cat <<theend >$tmp_dir/.script.$user.$$
s/vectorso$dnup/&_$i/w $tmp_dir/.mist.$user.$$
s/vectorso$updn/&_$i/w $tmp_dir/.mist.$user.$$
s/vectordum$dnup/&_$i/w $tmp_dir/.mist.$user.$$
s/vectordum$updn/&_$i/w $tmp_dir/.mist.$user.$$
s/vector$dnup'/vector${dnup}_$i'/w $tmp_dir/.mist.$user.$$
s/vector$updn'/vector${updn}_$i'/w $tmp_dir/.mist.$user.$$
s/energyso$dnup/&_$i/w $tmp_dir/.mist.$user.$$
s/energyso$updn/&_$i/w $tmp_dir/.mist.$user.$$
s/energydum/&_$i/w $tmp_dir/.mist.$user.$$
s/energy$dnup'/energy${dnup}_$i'/w $tmp_dir/.mist.$user.$$
s/energy$updn'/energy${updn}_$i'/w $tmp_dir/.mist.$user.$$
s/\(weigh$dnup\)'/\1_$i'/w $tmp_dir/.mist.$user.$$
s/\(weigh$updn\)'/\1_$i'/w $tmp_dir/.mist.$user.$$
s/\(weightaverso$updn\)'/\1_$i'/w $tmp_dir/.mist.$user.$$
s/normso$dnup/&_$i/w $tmp_dir/.mist.$user.$$
s/normso$updn/&_$i/w $tmp_dir/.mist.$user.$$
s/output2${updn}$eece/&_$i/w $tmp_dir/.mist.$user.$$
s/clmval${updn}$eece/&_$i/w $tmp_dir/.mist.$user.$$
s/vrespval$updn/&_$i/w $tmp_dir/.mist.$user.$$
s/dmat$updn/&_$i/w $tmp_dir/.mist.$user.$$
s/scf2$updn/&_$i/w $tmp_dir/.mist.$user.$$
s/help$updn/&_$i/w $tmp_dir/.mist.$user.$$
s/almblm$updn/&_$i/w $tmp_dir/.mist.$user.$$

theend

   sed -f $tmp_dir/.script.$user.$$ $tmp_dir/.tmp.$user.$$ > $tmp_dir/.tmp1.$user.$$
   mv $tmp_dir/.tmp1.$user.$$ "$def"_$i.def

#  sed "s/vector_${i}dn_$i/vectordn_$i/" $tmp_dir/.tmp1.$user.$$>$tmp_dir/.tmp2.$user.$$
#  sed "s/vector_${i}up_$i/vectorup_$i/" $tmp_dir/.tmp2.$user.$$>$tmp_dir/.tmp1.$user.$$
#  sed "s/vector_${i}so_$i/vectorso_$i/" $tmp_dir/.tmp1.$user.$$>$tmp_dir/.tmp2.$user.$$
#  sed "s/energy_${i}up_$i/energyup_$i/" $tmp_dir/.tmp2.$user.$$>$tmp_dir/.tmp1.$user.$$
#  sed "s/energy_${i}dn_$i/energydn_$i/" $tmp_dir/.tmp1.$user.$$>$tmp_dir/.tmp2.$user.$$
#  sed "s/energy_${i}so_$i/energyso_$i/" $tmp_dir/.tmp2.$user.$$>$tmp_dir/.tmp1.$user.$$
#  sed "s/energyso_${i}dn_$i/energysodn_${i}/" $tmp_dir/.tmp1.$user.$$>$tmp_dir/.tmp2.$user.$$
#  sed "s/energy_${i}dum_$i/energydum_$i/" $tmp_dir/.tmp2.$user.$$>$tmp_dir/.tmp1.$user.$$
#  sed "s/vector_${i}so_${i}dn_$i/vectorsodn_$i/" $tmp_dir/.tmp1.$user.$$>$tmp_dir/.tmp2.$user.$$
#  sed "s/vector_${i}dum_${i}dn_$i/vectordumdn_$i/" $tmp_dir/.tmp2.$user.$$>"$def"_$i.def
   @ i ++
end

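As you can see, all these additional sed commands are now commented out. 
After the modifications of the .script.xx file they are no longer needed, 
so the .tmp2 file is never created and the single remaining sed call 
writes .tmp1, which is then moved to the final def file.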
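
To illustrate the single-pass idea outside of lapw2para, here is a minimal 
stand-alone sketch. It is only an illustration under several assumptions: 
it is written in POSIX sh rather than csh, the .def lines are mock data and 
not a real lapw2.def, make_def is a hypothetical helper name, and only four 
of the substitutions are reproduced (without the "w" flag). As far as the 
listing shows, anchoring a pattern on the closing quote (vector$updn') is 
what keeps it from touching longer names such as vectorso, so no cleanup 
passes through further temporary files are needed afterwards.

```shell
#!/bin/sh
# Illustrative sketch only: mock .def content, hypothetical helper name.

# make_def TEMPLATE INDEX UPDN OUTFILE
# Copies TEMPLATE to OUTFILE, appending "_INDEX" to selected file names
# with one sed script and a single pass over the template.
make_def() {
    template=$1 i=$2 updn=$3 out=$4
    script=$(mktemp) || return 1
    # Unquoted here-document: $updn and $i are expanded when the script
    # file is written, in the same spirit as the listing above.
    cat <<EOF >"$script"
s/vector$updn'/vector${updn}_$i'/
s/energy$updn'/energy${updn}_$i'/
s/clmval$updn/&_$i/
s/scf2$updn/&_$i/
EOF
    sed -f "$script" "$template" >"$out"
    status=$?
    rm -f "$script"
    return $status
}

# Usage with a mock template (not a real WIEN2k def file).
workdir=$(mktemp -d)
cat >"$workdir/lapw2.def" <<'EOF'
 8,'case.clmvalup',   'unknown','formatted',0
10,'case.vectorup',   'unknown','unformatted',9000
11,'case.vectorsoup', 'unknown','unformatted',9000
21,'case.scf2up',     'unknown','formatted',0
EOF
make_def "$workdir/lapw2.def" 1 up "$workdir/lapw2_1.def"
cat "$workdir/lapw2_1.def"
rm -rf "$workdir"
```

In this sketch the vectorsoup line is left unchanged, because the pattern 
vectorup' does not occur in it; the real script handles the "so" files with 
their own substitution lines.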


On 03/08/2016 03:44 PM, Maciej Polak wrote:
> Dear WIEN2k users and developers,
>
> I encountered a very strange problem. Sometimes (50/50 chance), the calculations using just k-parallel will not finish. This exact same case, when submitted again (it sometimes takes more tries), finishes with no problem. Sometimes it crashes after a few iterations, sometimes after a hundred or more, and sometimes it just finishes successfully.
>
> This is what I get in the output:
>
> sed: can't read /tmp/.tmp2.mpolak.50255: No such file or directory
> cp: cannot stat `.in.tmp': No such file or directory
>
>
> There is also an error in stderr:
>
> forrtl: No such file or directory
> forrtl: severe (29): file not found, unit 20, file /lustre/scratch/tmp/pbs.1275300.achilles/MoS2_LDA/fort.20
> Image PC Routine Line Source
> lapw2 00000000004B3E37 Unknown Unknown Unknown
> lapw2 00000000004D5BE0 Unknown Unknown Unknown
> lapw2 000000000048140D MAIN__ 155 lapw2_tmp_.F
> lapw2 0000000000403F0E Unknown Unknown Unknown
> libc.so.6 00002B1F2CD53D5D Unknown Unknown Unknown
> lapw2 0000000000403E19 Unknown Unknown Unknown
>
>
> Do you have any idea what may be the cause of this? Running on just one CPU is always fine. There is certainly no error in my input file, because after a few tries this exact same case will eventually finish correctly.
>
> Thank you for your help
>
> Maciej Polak

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------

