[Wien] LAPW2 crashed!

Yunguo Li yunguo at kth.se
Thu Aug 16 14:47:05 CEST 2012


Dear support,
- I am running WIEN2k version 11.1 (Release 14/6/2011).
- My goal is to calculate XAS. As a first step I am running the SCF cycle as a ferromagnetic (spin-polarized) calculation.
- I submit the job with this sbatch script:
#!/bin/bash

#SBATCH -A matter4
#SBATCH -J tst
#SBATCH -N 4
#SBATCH -t 00:14:00


export SCRATCH=/scratch/local
export WIENROOT=/home/x_yunli/wien2k

# set .machines for parallel job
# lapw0 running on one node
echo -n "lapw0: " > .machines
echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
# $i is not set yet at this point, so only the core count is appended
echo ":8" >> .machines
# run one mpi job on each node (splitting k-mesh over nodes)
for i in $(hostlist -e $SLURM_JOB_NODELIST)
do
 echo "1:$i:8 " >> .machines
done
echo granularity:1 >> .machines
echo extrafine:1   >> .machines

#start WIEN2k
x_lapw -f GaNCu -up -c -p -fermi 
#initialization
init_lapw -sp -red 3 -ecut 8 -numk 144
#main
runsp_lapw -ec 0.0001Ry -i 40 -p -I
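
For reference, here is a minimal sketch of the .machines content that the script above should generate. It assumes that `hostlist -e` prints one node name per line and that the allocation consists of the four nodes named in the day file (m371 to m374). The function name `make_machines` is hypothetical and not part of WIEN2k; it prints to standard output instead of writing the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): print the .machines content
# that the job script would write for the node names given as arguments.
make_machines() {
    cores=8                               # cores per node, as in the job script
    for node in "$@"; do last=$node; done # last node in the list runs lapw0
    printf 'lapw0: %s:%s\n' "$last" "$cores"
    # one MPI job per node, so the k-mesh is split over the nodes
    for node in "$@"; do
        printf '1:%s:%s\n' "$node" "$cores"
    done
    printf 'granularity:1\nextrafine:1\n'
}

# Node names taken from the day file (assumed to be the full allocation)
make_machines m371 m372 m373 m374
```

Comparing this output with the actual .machines file in the case directory would show whether the file was written as intended.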

- The program stops at this point. This is the content of the day file:
Calculating GaNCu in /home/x_yunli/WIEN2k/GaNCu
on m371 with PID 18960
using WIEN2k_11.1 (Release 14/6/2011) in /home/x_yunli/wien2k


    start       (Thu Aug 16 13:56:39 CEST 2012) with lapw0 (40/99 to go)

    cycle 1     (Thu Aug 16 13:56:39 CEST 2012)         (40/99 to go)

>   lapw0 -p    (13:56:39) starting parallel lapw0 at Thu Aug 16 13:56:39 CEST 2012
-------- .machine0 : 8 processors
mpprun INFO: Starting openmpi run on 4 nodes (32 ranks)...
0.364u 0.590s 0:24.04 3.9%      0+0k 0+0io 27pf+0w
>   lapw1  -c -up -p    (13:57:03) starting parallel lapw1 at Thu Aug 16 13:57:03 CEST 2012
->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:03 CEST 2012
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
     m371 m371 m371 m371 m371 m371 m371 m371(18) 0.010u 0.006s 0:00.02 50.0%    0+0k 0+0io 1pf+0w
     m372 m372 m372 m372 m372 m372 m372 m372(18) 0.010u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
     m373 m373 m373 m373 m373 m373 m373 m373(18) 0.012u 0.004s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
     m374 m374 m374 m374 m374 m374 m374 m374(18) 0.011u 0.006s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
   Summary of lapw1para:
   m371  k=0     user=0  wallclock=0
   m372  k=0     user=0  wallclock=0
   m373  k=0     user=0  wallclock=0
   m374  k=0     user=0  wallclock=0
0.161u 0.239s 0:06.53 5.9%      0+0k 0+0io 11pf+0w
>   lapw1  -c -dn -p    (13:57:09) starting parallel lapw1 at Thu Aug 16 13:57:09 CEST 2012
->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:09 CEST 2012
running LAPW1 in parallel mode (using .machines.help)
4 number_of_parallel_jobs
     m371 m371 m371 m371 m371 m371 m371 m371(18) 0.011u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
     m372 m372 m372 m372 m372 m372 m372 m372(18) 0.009u 0.008s 0:00.02 0.0%     0+0k 0+0io 0pf+0w
     m373 m373 m373 m373 m373 m373 m373 m373(18) 0.009u 0.006s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
     m374 m374 m374 m374 m374 m374 m374 m374(18) 0.008u 0.007s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
   Summary of lapw1para:
   m371  k=0     user=0  wallclock=0
   m372  k=0     user=0  wallclock=0
   m373  k=0     user=0  wallclock=0
   m374  k=0     user=0  wallclock=0
0.138u 0.253s 0:06.39 5.9%      0+0k 0+0io 0pf+0w
>   lapw2 -c -up  -p    (13:57:16) running LAPW2 in parallel mode
**  LAPW2 crashed!
0.027u 0.039s 0:00.15 33.3%     0+0k 0+0io 0pf+0w
error: command   /home/x_yunli/wien2k/lapw2cpara -up -c uplapw2.def   failed

>   stop error

Could you please help me find the problem?

Best regards,
Li

