[Wien] Wien2k stopped working
Luis Ogando
lcodacal at gmail.com
Wed May 28 15:23:43 CEST 2014
Dear Wien2k community,
I have Wien2k 13.1 installed on an SGI cluster using ifort, icc and Open
MPI. The installation was hard work (I would like to thank Prof. Lawrence
Marks again for his help), but since then I have used Wien2k without
problems for several months.
I performed the first step of a long calculation and saved it in a
different directory. When I tried the next step in the original directory,
Wien2k crashed. After some tests, I decided to reinitialize the calculation
from the beginning (in other words, to repeat the first step). To my
surprise, even this failed, and I would like to know whether anyone has
faced such an unexpected problem.
Please find below some of the output files that I consider most
relevant.
Finally, I would like to stress a few points:
1) lapw0 stops after roughly 7 minutes, whereas it took about 2 hours in
the successful calculation.
2) lapw1 stops after 5 seconds without generating the case.energy_* files,
and case.dayfile does not contain the timing statistics for each processor.
3) OMP_NUM_THREADS is set to 12 by the system, overriding my .bashrc (where
OMP_NUM_THREADS=1), but even when I export OMP_NUM_THREADS=1 in the
submission script, I get the same crash.
Thank you very much for your attention,
Luis
===========================================================
:log file
> (init_lapw) options:
Wed Apr 2 14:07:30 BRT 2014> (x_lapw) nn -f InPzb15InPwurt3-V2
Wed Apr 2 14:07:46 BRT 2014> (x) nn
Wed Apr 2 14:08:03 BRT 2014> (x) sgroup
Wed Apr 2 14:08:23 BRT 2014> (x) symmetry
Wed Apr 2 14:08:48 BRT 2014> (x) lstart
Wed Apr 2 14:09:38 BRT 2014> (x) kgen
Wed Apr 2 14:09:58 BRT 2014> (x) dstart -c -p
> (initso_lapw) options:
Tue May 27 16:07:00 BRT 2014> (x) Machines2W
> (run_lapw) options: -p -NI -ec 0.0001 -cc 0.0001 -i 150 -it
Tue May 27 16:07:00 BRT 2014> (x) lapw0 -p
Tue May 27 16:14:10 BRT 2014> (x) lapw1 -it -p -c
Tue May 27 16:14:15 BRT 2014> (x) lapw2 -p -c
===========================================================
case.dayfile
Calculating InPzb15InPwurt3-V2 in
/home/ice/proj/proj546/ogando/Wien/Calculos/InP/InPzbInPwurt/15camadasZB+3WZ/InPzb15InPwurt3-V2
on r1i0n15 with PID 6538
using WIEN2k_13.1 (Release 17/6/2013) in
/home/ice/proj/proj546/ogando/Wien/Executaveis-13-OpenMPI
start (Tue May 27 16:07:00 BRT 2014) with lapw0 (150/99 to go)
cycle 1 (Tue May 27 16:07:00 BRT 2014) (150/99 to go)
> lapw0 -p (16:07:00) starting parallel lapw0 at Tue May 27 16:07:00 BRT 2014
-------- .machine0 : 12 processors
2540.314u 12.204s 7:09.36 594.4% 0+0k 180672+52736io 5pf+0w
> lapw1 -it -p -c (16:14:10) starting parallel lapw1 at Tue May 27 16:14:10 BRT 2014
-> starting parallel LAPW1 jobs at Tue May 27 16:14:10 BRT 2014
running LAPW1 in parallel mode (using .machines)
12 number_of_parallel_jobs
r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1)
Summary of lapw1para:
r1i0n15 k=1 user=0 wallclock=1
0.132u 0.136s 0:04.75 5.4% 0+0k 4104+1688io 5pf+0w
> lapw2 -p -c (16:14:15) running LAPW2 in parallel mode
** LAPW2 crashed!
0.396u 0.016s 0:00.66 60.6% 0+0k 6424+11472io 1pf+0w
error: command /home/ice/proj/proj546/ogando/Wien/Executaveis-13-OpenMPI/lapw2cpara -c lapw2.def failed
> stop error
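The csh `time` lines in the dayfile can be compared directly. As a minimal Python sketch (the parser is hypothetical and assumes the default csh format: user seconds, system seconds, elapsed [h:]m:ss, CPU%), dividing CPU time by wall time gives the average number of busy cores: roughly 6 for lapw0 and well under one for lapw1, which is consistent with the lapw1 jobs exiting almost immediately.

```python
import re

# Default csh `time` summary: "<user>u <system>s [h:]m:ss.ss <cpu>% ...".
TIME_RE = re.compile(
    r"([\d.]+)u\s+([\d.]+)s\s+(?:(\d+):)?(\d+):([\d.]+)\s+([\d.]+)%"
)


def parse_csh_time(line):
    """Return user/system/elapsed seconds and CPU% from a csh time line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"not a csh time line: {line!r}")
    hours = int(m.group(3) or 0)
    elapsed = hours * 3600 + int(m.group(4)) * 60 + float(m.group(5))
    return {
        "user": float(m.group(1)),
        "system": float(m.group(2)),
        "elapsed": elapsed,
        "cpu_percent": float(m.group(6)),
    }


def busy_cores(t):
    """Average number of cores kept busy: total CPU time / wall time."""
    return (t["user"] + t["system"]) / t["elapsed"]


if __name__ == "__main__":
    # Timing lines copied from case.dayfile above.
    runs = {
        "lapw0": "2540.314u 12.204s 7:09.36 594.4% 0+0k 180672+52736io 5pf+0w",
        "lapw1": "0.132u 0.136s 0:04.75 5.4% 0+0k 4104+1688io 5pf+0w",
    }
    for name, line in runs.items():
        t = parse_csh_time(line)
        print(f"{name}: {t['elapsed']:.1f} s wall, ~{busy_cores(t):.2f} cores busy")
```

Whether the CPU time of processes started through mpirun is fully included in these totals depends on how they are launched, so the figures are indicative rather than exact.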
===========================================================
lapw2.error (the only non-empty error file)
Error in LAPW2
'LAPW2' - can't open unit: 30
'LAPW2' - filename: InPzb15InPwurt3-V2.energy_1
** testerror: Error in Parallel LAPW2
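Since lapw2 fails because it cannot open InPzb15InPwurt3-V2.energy_1, a quick check of which per-job energy files lapw1 actually left behind narrows things down. A minimal Python sketch, assuming one case.energy_<i> file per parallel job (12 here, from the dayfile) and also checking the `.processes` file that grep complained about in the standard output; the function name is hypothetical:

```python
from pathlib import Path


def check_parallel_files(case, n_jobs, workdir=".", extra=(".processes",)):
    """Map each expected file that is missing or empty to "missing"/"empty".

    Expected files (an assumption based on the error messages):
    <case>.energy_1 .. <case>.energy_<n_jobs>, plus the names in `extra`.
    """
    names = [f"{case}.energy_{i}" for i in range(1, n_jobs + 1)] + list(extra)
    problems = {}
    for name in names:
        path = Path(workdir) / name
        if not path.exists():
            problems[name] = "missing"
        elif path.stat().st_size == 0:
            problems[name] = "empty"
    return problems


if __name__ == "__main__":
    # Case name from lapw2.error; 12 parallel jobs from the dayfile.
    report = check_parallel_files("InPzb15InPwurt3-V2", 12)
    for name, status in report.items():
        print(f"{name}: {status}")
    if not report:
        print("all expected files present and non-empty")
```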
===========================================================
The standard output file
OMP_NUM_THREADS = 12
-----------------------------------------
Start of job: Tue May 27 16:07:00 BRT 2014
Hostname: r1i0n15
PWD:
/home/ice/proj/proj546/ogando/Wien/Calculos/InP/InPzbInPwurt/15camadasZB+3WZ/InPzb15InPwurt3-V2
0.000u 0.000s 0:00.05 0.0% 0+0k 8216+24io 1pf+0w
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
grep: .processes: No such file or directory
InPzb15InPwurt3-V2.scf1_1: No such file or directory.
grep: No match.
FERMI - Error
cp: cannot stat `.in.tmp': No such file or directory
> stop error
End of job: Tue May 27 16:14:15 BRT 2014
-----------------------------------------
OMP_NUM_THREADS = 12
=======================================
My parallel_options file
setenv TASKSET "no"
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "/home/ice/proj/proj546/ogando/OpenMPIexec/bin/mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
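To see the full command that the parallel scripts build from WIEN_MPIRUN, the placeholders can be expanded by hand. A minimal sketch, assuming plain text substitution of _NP_, _HOSTS_ and _EXEC_; `.machine0` and the 12 processes come from the dayfile, while the executable string `lapw0_mpi lapw0.def` is an illustrative assumption:

```python
def expand_mpirun(template, n_procs, hosts_file, executable):
    """Fill the _NP_, _HOSTS_ and _EXEC_ placeholders of a WIEN_MPIRUN template."""
    return (
        template.replace("_NP_", str(n_procs))
        .replace("_HOSTS_", hosts_file)
        .replace("_EXEC_", executable)
    )


if __name__ == "__main__":
    # Template copied from parallel_options; the executable string is an assumption.
    template = ("/home/ice/proj/proj546/ogando/OpenMPIexec/bin/mpirun "
                "-np _NP_ -machinefile _HOSTS_ _EXEC_")
    print(expand_mpirun(template, 12, ".machine0", "lapw0_mpi lapw0.def"))
```

Comparing the expanded line with a command that works interactively on the same node may help separate an mpirun problem from a WIEN2k one.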