[Wien] LAPW2 crashed!

Peter Blaha pblaha at theochem.tuwien.ac.at
Thu Aug 16 17:29:10 CEST 2012


There are many errors in this script (and I may have overlooked some others).

> #!/bin/bash
>
> #SBATCH -A matter4
> #SBATCH -J tst
> #SBATCH -N 4
> #SBATCH -t 00:14:00
>
>
> export SCRATCH=/scratch/local
> export WIENROOT=/home/x_yunli/wien2k
>
> # set .machines for parallel job
> # lapw0 running on one node
> echo -n "lapw0: " > .machines
> echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
> echo "$i:8" >> .machines
> # run one mpi job on each node (splitting k-mesh over nodes)
> for i in $(hostlist -e $SLURM_JOB_NODELIST)
> do
>   echo "1:$i:8 " >> .machines
> done
> echo granularity:1 >> .machines
> echo extrafine:1   >> .machines
>
> #start WIEN2k
> x_lapw -f GaNCu -up -c -p -fermi

What is that ???  You do not specify any program for x_lapw to execute.


> #initio
> init_lapw -sp -red 3 -ecut 8 -numk 144

-ecut 8 ???  Do you know what -ecut is?  You specified a positive energy (+8 Ry)
to separate core from valence states.  You probably mean -8.  Do you know why you want to use
-8?  (The default is -6.)

> #main
> runsp_lapw -ec 0.0001Ry -i 40 -p -I

You are trying to run this in mpi-parallel mode.
Do you know when this is useful?
How many atoms do you have in your cell?
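
To make the distinction concrete: a line such as "1:m371:8" in .machines starts one mpi job with 8 processes on m371, whereas k-point parallelization uses one "1:host" line per core. The sketch below writes a k-point-parallel .machines file. The function name make_kpoint_machines is hypothetical (not part of WIEN2k), and whether k-point parallelization suits this case is an assumption that depends on the cell size and the number of k-points.

```shell
#!/bin/bash
# Hypothetical helper (not part of WIEN2k): print a k-point-parallel
# .machines file to stdout.
# Usage: make_kpoint_machines CORES_PER_NODE NODE [NODE ...]
make_kpoint_machines() {
  local cores=$1
  shift
  local node n
  # lapw0 is not k-point parallel; run it as one mpi job on the first node.
  # Assumption: an mpi build of lapw0 exists; otherwise drop this line.
  echo "lapw0:$1:$cores"
  # One "1:host" line per core: each line is a separate serial lapw1/lapw2 job.
  for node in "$@"; do
    for ((n = 0; n < cores; n++)); do
      echo "1:$node"
    done
  done
  echo "granularity:1"
  echo "extrafine:1"
}
```

In the batch script it could be called as: make_kpoint_machines 8 $(hostlist -e $SLURM_JOB_NODELIST) > .machines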

It is OK to look at the dayfile, but there are many other files you have to examine.

The output (+error) of your batch-job (maybe it is called tst.o* and tst.e*)

lse   lists all error files (look at the content of the non-empty ones).
lso   lists all output files. Check them.
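
If the lse alias is not available in the batch environment, the same check can be approximated in plain shell. This is a minimal sketch, assuming the error files are named *.error and sit in the case directory; show_nonempty_errors is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the content of every non-empty *.error file
# in a case directory (defaults to the current directory).
show_nonempty_errors() {
  local dir=${1:-.}
  local f
  for f in "$dir"/*.error; do
    # -s is true only if the file exists and its size is greater than zero,
    # so empty error files and an unmatched glob are both skipped.
    if [ -s "$f" ]; then
      echo "== $f =="
      cat "$f"
    fi
  done
}
```

Run it as: show_nonempty_errors /home/x_yunli/WIEN2k/GaNCu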

>
> - The program stops at this point,  This is the content of the day file:
> Calculating GaNCu in /home/x_yunli/WIEN2k/GaNCu
> on m371 with PID 18960
> using WIEN2k_11.1 (Release 14/6/2011) in /home/x_yunli/wien2k
>
>
>      start       (Thu Aug 16 13:56:39 CEST 2012) with lapw0 (40/99 to go)
>
>      cycle 1     (Thu Aug 16 13:56:39 CEST 2012)         (40/99 to go)
>
>>    lapw0 -p    (13:56:39) starting parallel lapw0 at Thu Aug 16 13:56:39 CEST 2012
> -------- .machine0 : 8 processors
> mpprun INFO: Starting openmpi run on 4 nodes (32 ranks)...
> 0.364u 0.590s 0:24.04 3.9%      0+0k 0+0io 27pf+0w
>>    lapw1  -c -up -p    (13:57:03) starting parallel lapw1 at Thu Aug 16 13:57:03 CEST 2012
> ->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:03 CEST 2012
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
>       m371 m371 m371 m371 m371 m371 m371 m371(18) 0.010u 0.006s 0:00.02 50.0%    0+0k 0+0io 1pf+0w
>       m372 m372 m372 m372 m372 m372 m372 m372(18) 0.010u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>       m373 m373 m373 m373 m373 m373 m373 m373(18) 0.012u 0.004s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>       m374 m374 m374 m374 m374 m374 m374 m374(18) 0.011u 0.006s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>     Summary of lapw1para:
>     m371  k=0     user=0  wallclock=0
>     m372  k=0     user=0  wallclock=0
>     m373  k=0     user=0  wallclock=0
>     m374  k=0     user=0  wallclock=0
> 0.161u 0.239s 0:06.53 5.9%      0+0k 0+0io 11pf+0w
>>    lapw1  -c -dn -p    (13:57:09) starting parallel lapw1 at Thu Aug 16 13:57:09 CEST 2012
> ->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:09 CEST 2012
> running LAPW1 in parallel mode (using .machines.help)
> 4 number_of_parallel_jobs
>       m371 m371 m371 m371 m371 m371 m371 m371(18) 0.011u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>       m372 m372 m372 m372 m372 m372 m372 m372(18) 0.009u 0.008s 0:00.02 0.0%     0+0k 0+0io 0pf+0w
>       m373 m373 m373 m373 m373 m373 m373 m373(18) 0.009u 0.006s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
>       m374 m374 m374 m374 m374 m374 m374 m374(18) 0.008u 0.007s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
>     Summary of lapw1para:
>     m371  k=0     user=0  wallclock=0
>     m372  k=0     user=0  wallclock=0
>     m373  k=0     user=0  wallclock=0
>     m374  k=0     user=0  wallclock=0
> 0.138u 0.253s 0:06.39 5.9%      0+0k 0+0io 0pf+0w
>>    lapw2 -c -up  -p    (13:57:16) running LAPW2 in parallel mode
> **  LAPW2 crashed!
> 0.027u 0.039s 0:00.15 33.3%     0+0k 0+0io 0pf+0w
> error: command   /home/x_yunli/wien2k/lapw2cpara -up -c uplapw2.def   failed
>
>>    stop error
>
> Could you please find the problem ?
>
> Best regards,
> Li
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>

-- 
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------

