[Wien] Wien2k stopped working

Luis Ogando lcodacal at gmail.com
Wed May 28 15:23:43 CEST 2014


Dear Wien2k community,

   I have Wien2k 13.1 installed on an SGI cluster using ifort, icc and Open
MPI. The installation was hard work (I would like to thank Prof. Lawrence
Marks again for his help), but since then I have used Wien2k without
problems for several months.
   I performed the first step of a long calculation and saved it in a
different directory. When I tried the next step in the original directory,
Wien2k crashed. After some tests, I decided to reinitialize the calculation
from the beginning (in other words, to repeat the first step). To my
surprise, I did not succeed even in this case and I would like to know if
someone has faced such an unexpected problem.
   Please find below the output files that I consider the most relevant.
   Finally, I would like to stress some points:

1) lapw0 stops after about 7 minutes, whereas it took about 2 hours in
the successful calculation.

2) lapw1 stops after 5 seconds without generating the case.energy_* files,
and case.dayfile does not contain the timing statistics for each processor.

3) OMP_NUM_THREADS is set to 12 by the system, overriding the
OMP_NUM_THREADS=1 in my .bashrc, but even when I export this variable as 1
in the submission script, I get the same crash.
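
The checks behind points 2 and 3 can be sketched as a few lines of shell for the submission script. This is only an illustration, assuming a POSIX sh or bash job script: the function name check_lapw1_outputs is hypothetical and not part of Wien2k, while the file names (case.energy_* and .processes) are the ones that appear in this report and in the standard output below.

```sh
# Hypothetical helper (not part of Wien2k): report which files expected
# after "lapw1 -p" are missing. Usage: check_lapw1_outputs <dir> <case>
check_lapw1_outputs() {
    clo_dir="$1"
    clo_case="$2"
    clo_missing=0
    # lapw2 later tries to open case.energy_1, so at least one
    # case.energy_N file has to exist after lapw1.
    if ! ls "$clo_dir/$clo_case".energy_* >/dev/null 2>&1; then
        echo "missing: $clo_case.energy_*"
        clo_missing=1
    fi
    # The job output shows a failing grep on .processes, so check it too.
    if [ ! -f "$clo_dir/.processes" ]; then
        echo "missing: .processes"
        clo_missing=1
    fi
    return "$clo_missing"
}

# Request a single OpenMP thread for this job and print the value in effect.
export OMP_NUM_THREADS=1
echo "OMP_NUM_THREADS = $OMP_NUM_THREADS"
```

Called as "check_lapw1_outputs . InPzb15InPwurt3-V2" right after the lapw1 step, it prints the missing files and returns a non-zero status, so the job can stop before lapw2 is attempted.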

   Thank you very much for your attention,
              Luis
===========================================================
:log file

>   (init_lapw) options:
Wed Apr  2 14:07:30 BRT 2014> (x_lapw) nn -f InPzb15InPwurt3-V2
Wed Apr  2 14:07:46 BRT 2014> (x) nn
Wed Apr  2 14:08:03 BRT 2014> (x) sgroup
Wed Apr  2 14:08:23 BRT 2014> (x) symmetry
Wed Apr  2 14:08:48 BRT 2014> (x) lstart
Wed Apr  2 14:09:38 BRT 2014> (x) kgen
Wed Apr  2 14:09:58 BRT 2014> (x) dstart -c -p
>   (initso_lapw) options:
Tue May 27 16:07:00 BRT 2014> (x) Machines2W
>   (run_lapw) options: -p -NI -ec 0.0001 -cc 0.0001 -i 150 -it
Tue May 27 16:07:00 BRT 2014> (x) lapw0 -p
Tue May 27 16:14:10 BRT 2014> (x) lapw1 -it -p -c
Tue May 27 16:14:15 BRT 2014> (x) lapw2 -p -c

===========================================================
case.dayfile

Calculating InPzb15InPwurt3-V2 in /home/ice/proj/proj546/ogando/Wien/Calculos/InP/InPzbInPwurt/15camadasZB+3WZ/InPzb15InPwurt3-V2
on r1i0n15 with PID 6538
using WIEN2k_13.1 (Release 17/6/2013) in /home/ice/proj/proj546/ogando/Wien/Executaveis-13-OpenMPI


    start (Tue May 27 16:07:00 BRT 2014) with lapw0 (150/99 to go)

    cycle 1 (Tue May 27 16:07:00 BRT 2014) (150/99 to go)

>   lapw0 -p (16:07:00) starting parallel lapw0 at Tue May 27 16:07:00 BRT
2014
-------- .machine0 : 12 processors
2540.314u 12.204s 7:09.36 594.4% 0+0k 180672+52736io 5pf+0w
>   lapw1 -it -p   -c (16:14:10) starting parallel lapw1 at Tue May 27
16:14:10 BRT 2014
->  starting parallel LAPW1 jobs at Tue May 27 16:14:10 BRT 2014
running LAPW1 in parallel mode (using .machines)
12 number_of_parallel_jobs
     r1i0n15(1)      r1i0n15(1)      r1i0n15(1)      r1i0n15(1)
 r1i0n15(1)      r1i0n15(1)      r1i0n15(1)      r1i0n15(1)      r1i0n15(1)
     r1i0n15(1)      r1i0n15(1)      r1i0n15(1)    Summary of lapw1para:
   r1i0n15 k=1 user=0 wallclock=1
0.132u 0.136s 0:04.75 5.4% 0+0k 4104+1688io 5pf+0w
>   lapw2 -p   -c   (16:14:15) running LAPW2 in parallel mode
**  LAPW2 crashed!
0.396u 0.016s 0:00.66 60.6% 0+0k 6424+11472io 1pf+0w
error: command   /home/ice/proj/proj546/ogando/Wien/Executaveis-13-OpenMPI/lapw2cpara -c lapw2.def   failed

>   stop error

===========================================================
lapw2.error (the only non-empty *.error file)

Error in LAPW2
 'LAPW2' - can't open unit: 30

 'LAPW2' -        filename: InPzb15InPwurt3-V2.energy_1

**  testerror: Error in Parallel LAPW2
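
For reference, finding the non-empty error files in a case directory can be sketched as below. This is a minimal illustration, assuming a POSIX shell; the function name list_nonempty_errors is hypothetical and not a Wien2k command.

```sh
# Hypothetical helper (not part of Wien2k): print every *.error file in
# the given directory that has a size greater than zero.
list_nonempty_errors() {
    for lne_file in "$1"/*.error; do
        # "-s" is true only for an existing file with non-zero size, so an
        # unmatched glob (no *.error files at all) prints nothing.
        if [ -s "$lne_file" ]; then
            echo "$lne_file"
        fi
    done
}
```

Running "list_nonempty_errors ." in the case directory prints one path per non-empty error file.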

===========================================================
The standard output file


OMP_NUM_THREADS =  12

-----------------------------------------
Inicio do job: Tue May 27 16:07:00 BRT 2014
Hostname:  r1i0n15
PWD:
 /home/ice/proj/proj546/ogando/Wien/Calculos/InP/InPzbInPwurt/15camadasZB+3WZ/InPzb15InPwurt3-V2
0.000u 0.000s 0:00.05 0.0% 0+0k 8216+24io 1pf+0w
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
grep: .processes: No such file or directory
InPzb15InPwurt3-V2.scf1_1: No such file or directory.
grep: No match.
FERMI - Error
cp: cannot stat `.in.tmp': No such file or directory

>   stop error
Final do job: Tue May 27 16:14:15 BRT 2014
-----------------------------------------

OMP_NUM_THREADS =  12

=======================================
My parallel_options file

setenv TASKSET "no"
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "/home/ice/proj/proj546/ogando/OpenMPIexec/bin/mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"