[Wien] LAPW2 crashed!
Yunguo Li
yunguo at kth.se
Thu Aug 16 14:47:05 CEST 2012
Dear support,
- I am running WIEN2k, version WIEN2k_11.1 (Release 14/6/2011).
- The goal of my calculations is to compute XAS. As a first step I am running the SCF cycle, set up as a ferromagnetic (spin-polarized) calculation.
- I am running this case using this sbatch script:
#!/bin/bash
#SBATCH -A matter4
#SBATCH -J tst
#SBATCH -N 4
#SBATCH -t 00:14:00
export SCRATCH=/scratch/local
export WIENROOT=/home/x_yunli/wien2k
# set .machines for parallel job
# lapw0 running on one node
echo -n "lapw0: " > .machines
echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
echo ":8" >> .machines
# run one mpi job on each node (splitting k-mesh over nodes)
for i in $(hostlist -e $SLURM_JOB_NODELIST)
do
echo "1:$i:8 " >> .machines
done
echo granularity:1 >> .machines
echo extrafine:1 >> .machines
#start WIEN2k
x_lapw -f GaNCu -up -c -p -fermi
#init
init_lapw -sp -red 3 -ecut 8 -numk 144
#main
runsp_lapw -ec 0.0001Ry -i 40 -p -I
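The .machines block in the script is meant to write one lapw0 line plus one k-point job per node. A minimal sketch of the same layout as a function, assuming 8 cores per node and taking the host names as arguments instead of calling hostlist; build_machines is a hypothetical name, not a WIEN2k tool:

```shell
#!/bin/bash
# Hypothetical helper (not part of WIEN2k or of the original job script):
# print a .machines layout for the hosts given as arguments.
build_machines() {
    local cores=8              # assumption: 8 cores per node, as in the script
    local last="${@: -1}"      # lapw0 runs on the last host in the list
    local host
    echo "lapw0: ${last}:${cores}"
    # one MPI job per node, so the k-mesh is split over the nodes
    for host in "$@"; do
        echo "1:${host}:${cores}"
    done
    echo "granularity:1"
    echo "extrafine:1"
}

# Node names taken from the dayfile in this post
build_machines m371 m372 m373 m374
```

In the job script this could be called as build_machines $(hostlist -e $SLURM_JOB_NODELIST) > .machines. For the four nodes above, the first line written is "lapw0: m374:8".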
- The program stops at this point. This is the content of the dayfile:
Calculating GaNCu in /home/x_yunli/WIEN2k/GaNCu
on m371 with PID 18960
using WIEN2k_11.1 (Release 14/6/2011) in /home/x_yunli/wien2k
start (Thu Aug 16 13:56:39 CEST 2012) with lapw0 (40/99 to go)
cycle 1 (Thu Aug 16 13:56:39 CEST 2012) (40/99 to go)
> lapw0 -p (13:56:39) starting parallel lapw0 at Thu Aug 16 13:56:39 CEST 2012
-------- .machine0 : 8 processors
mpprun INFO: Starting openmpi run on 4 nodes (32 ranks)...
0.364u 0.590s 0:24.04 3.9% 0+0k 0+0io 27pf+0w
> lapw1 -c -up -p (13:57:03) starting parallel lapw1 at Thu Aug 16 13:57:03 CEST 2012
-> starting parallel LAPW1 jobs at Thu Aug 16 13:57:03 CEST 2012
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
m371 m371 m371 m371 m371 m371 m371 m371(18) 0.010u 0.006s 0:00.02 50.0% 0+0k 0+0io 1pf+0w
m372 m372 m372 m372 m372 m372 m372 m372(18) 0.010u 0.005s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
m373 m373 m373 m373 m373 m373 m373 m373(18) 0.012u 0.004s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
m374 m374 m374 m374 m374 m374 m374 m374(18) 0.011u 0.006s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
Summary of lapw1para:
m371 k=0 user=0 wallclock=0
m372 k=0 user=0 wallclock=0
m373 k=0 user=0 wallclock=0
m374 k=0 user=0 wallclock=0
0.161u 0.239s 0:06.53 5.9% 0+0k 0+0io 11pf+0w
> lapw1 -c -dn -p (13:57:09) starting parallel lapw1 at Thu Aug 16 13:57:09 CEST 2012
-> starting parallel LAPW1 jobs at Thu Aug 16 13:57:09 CEST 2012
running LAPW1 in parallel mode (using .machines.help)
4 number_of_parallel_jobs
m371 m371 m371 m371 m371 m371 m371 m371(18) 0.011u 0.005s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
m372 m372 m372 m372 m372 m372 m372 m372(18) 0.009u 0.008s 0:00.02 0.0% 0+0k 0+0io 0pf+0w
m373 m373 m373 m373 m373 m373 m373 m373(18) 0.009u 0.006s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
m374 m374 m374 m374 m374 m374 m374 m374(18) 0.008u 0.007s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
Summary of lapw1para:
m371 k=0 user=0 wallclock=0
m372 k=0 user=0 wallclock=0
m373 k=0 user=0 wallclock=0
m374 k=0 user=0 wallclock=0
0.138u 0.253s 0:06.39 5.9% 0+0k 0+0io 0pf+0w
> lapw2 -c -up -p (13:57:16) running LAPW2 in parallel mode
** LAPW2 crashed!
0.027u 0.039s 0:00.15 33.3% 0+0k 0+0io 0pf+0w
error: command /home/x_yunli/wien2k/lapw2cpara -up -c uplapw2.def failed
> stop error
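Both "Summary of lapw1para" blocks in the dayfile report k=0 for every node, which suggests that lapw1 processed no k-points before lapw2 was started. A minimal sketch for spotting this in a dayfile and for listing non-empty *.error files, assuming the case directory is the current directory; the function names are hypothetical and not part of WIEN2k:

```shell
#!/bin/bash
# Hypothetical helpers (not part of WIEN2k) for a first look at a failed run.

# Count the per-node summary lines in a dayfile that report zero k-points.
count_zero_k_jobs() {
    grep -cE '^[[:space:]]*[^[:space:]]+[[:space:]]+k=0[[:space:]]+user=' "$1"
}

# Print the paths of all non-empty *.error files directly inside a directory.
list_nonempty_errors() {
    find "$1" -maxdepth 1 -name '*.error' -type f -size +0c
}

# Example usage from inside the case directory:
#   count_zero_k_jobs GaNCu.dayfile
#   list_nonempty_errors .
```

A non-zero count from the first function together with a non-empty error file from an earlier step would point at a problem before lapw2, for example in the initialization, rather than in lapw2 itself.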
Could you please help me find the problem?
Best regards,
Li