[Wien] LAPW2 crashed!

Yunguo Li yunguo at kth.se
Thu Aug 16 16:52:12 CEST 2012


Dear Laurence,
Thanks for your kind reply.
I have worked through the example and set up this system through the web interface (w2web). However, the initialization always ran on the login node, and some of the initialization steps took a long time that I could not track.
I wrote the sbatch script myself, following the directions from my cluster's support staff. I think the problem may come from a bad initialization. The file uplapw2.def may have a problem, but I cannot find details about this file in the user guide.
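For reference, this is roughly how I plan to inspect the files of the failing step (assuming the standard WIEN2k naming; the case directory is the one shown in the day file below):

cd /home/x_yunli/WIEN2k/GaNCu      # case directory from the day file
cat uplapw2.def                    # the file/unit assignments handed to lapw2 (spin up)
ls -l *.error                      # non-empty *.error files name the step that failed
cat *lapw2*.error
grep -i error *.output2* uplapw2*.error 2>/dev/null
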
I accept your suggestion and will try to redo the initialization by hand this time.
best regards,
Li
On Aug 16, 2012, at 3:04 PM, Laurence Marks wrote:


Most errors are due to user mistakes in the input. You have not provided enough information for anyone to do more than make a guess.

My suspicion is that someone gave you the script and said "use this". If you are an experienced user, scripts are good. However, most experienced users know where to look to diagnose errors.

You probably should do the initialization by hand so you can understand the steps and work out what has gone wrong. Have you started by working through the examples in the user guide first?
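
For orientation, the hand initialization of a spin-polarized case walks roughly through these steps (a sketch only; the actual prompts and options depend on the structure, see the user guide):

x nn           # check nearest-neighbour distances and sphere overlap; rerun if RMTs change
x sgroup       # determine the space group
x symmetry     # generate the symmetry operations
x lstart       # atomic densities; asks for the XC functional and an energy cutoff
x kgen         # generate the k-mesh (e.g. the 144 k-points used in the script below)
x dstart -up   # superpose atomic densities, spin up
x dstart -dn   # and spin down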

---------------------------
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu  1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody else has thought"
Albert Szent-Györgyi

On Aug 16, 2012 7:47 AM, "Yunguo Li" <yunguo at kth.se> wrote:
Dear support,
- I am running WIEN2k_11.1 (Release 14/6/2011).
- The purpose of my calculations is to compute XAS; as a first step I am running the SCF cycle, treating the system as ferromagnetic (spin-polarized).
- I am running this case using this sbatch script:
#!/bin/bash

#SBATCH -A matter4
#SBATCH -J tst
#SBATCH -N 4
#SBATCH -t 00:14:00


export SCRATCH=/scratch/local
export WIENROOT=/home/x_yunli/wien2k

# set .machines for parallel job
# lapw0 running on one node
echo -n "lapw0: " > .machines
echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
echo "$i:8" >> .machines
# run one mpi job on each node (splitting k-mesh over nodes)
for i in $(hostlist -e $SLURM_JOB_NODELIST)
do
 echo "1:$i:8 " >> .machines
done
echo granularity:1 >> .machines
echo extrafine:1   >> .machines

#start WIEN2k
x_lapw -f GaNCu -up -c -p -fermi
#init
init_lapw -sp -red 3 -ecut 8 -numk 144
#main
runsp_lapw -ec 0.0001Ry -i 40 -p -I

- The program stops at this point. This is the content of the day file:
Calculating GaNCu in /home/x_yunli/WIEN2k/GaNCu
on m371 with PID 18960
using WIEN2k_11.1 (Release 14/6/2011) in /home/x_yunli/wien2k


    start       (Thu Aug 16 13:56:39 CEST 2012) with lapw0 (40/99 to go)

    cycle 1     (Thu Aug 16 13:56:39 CEST 2012)         (40/99 to go)

>   lapw0 -p    (13:56:39) starting parallel lapw0 at Thu Aug 16 13:56:39 CEST 2012
-------- .machine0 : 8 processors
mpprun INFO: Starting openmpi run on 4 nodes (32 ranks)...
0.364u 0.590s 0:24.04 3.9%      0+0k 0+0io 27pf+0w
>   lapw1  -c -up -p    (13:57:03) starting parallel lapw1 at Thu Aug 16 13:57:03 CEST 2012
->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:03 CEST 2012
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
     m371 m371 m371 m371 m371 m371 m371 m371(18) 0.010u 0.006s 0:00.02 50.0%    0+0k 0+0io 1pf+0w
     m372 m372 m372 m372 m372 m372 m372 m372(18) 0.010u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
     m373 m373 m373 m373 m373 m373 m373 m373(18) 0.012u 0.004s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
     m374 m374 m374 m374 m374 m374 m374 m374(18) 0.011u 0.006s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
   Summary of lapw1para:
   m371  k=0     user=0  wallclock=0
   m372  k=0     user=0  wallclock=0
   m373  k=0     user=0  wallclock=0
   m374  k=0     user=0  wallclock=0
0.161u 0.239s 0:06.53 5.9%      0+0k 0+0io 11pf+0w
>   lapw1  -c -dn -p    (13:57:09) starting parallel lapw1 at Thu Aug 16 13:57:09 CEST 2012
->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:09 CEST 2012
running LAPW1 in parallel mode (using .machines.help)
4 number_of_parallel_jobs
     m371 m371 m371 m371 m371 m371 m371 m371(18) 0.011u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
     m372 m372 m372 m372 m372 m372 m372 m372(18) 0.009u 0.008s 0:00.02 0.0%     0+0k 0+0io 0pf+0w
     m373 m373 m373 m373 m373 m373 m373 m373(18) 0.009u 0.006s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
     m374 m374 m374 m374 m374 m374 m374 m374(18) 0.008u 0.007s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
   Summary of lapw1para:
   m371  k=0     user=0  wallclock=0
   m372  k=0     user=0  wallclock=0
   m373  k=0     user=0  wallclock=0
   m374  k=0     user=0  wallclock=0
0.138u 0.253s 0:06.39 5.9%      0+0k 0+0io 0pf+0w
>   lapw2 -c -up  -p    (13:57:16) running LAPW2 in parallel mode
**  LAPW2 crashed!
0.027u 0.039s 0:00.15 33.3%     0+0k 0+0io 0pf+0w
error: command   /home/x_yunli/wien2k/lapw2cpara -up -c uplapw2.def   failed

>   stop error

Could you please help me find the problem?
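
For comparison, this is a sketch of how I understand the .machines generation in my script could be written without relying on the unset $i on the lapw0 line, with the initialization done interactively before submission (the 8-cores-per-node assumption is taken from my script above, and I have not verified that this alone fixes the crash):

# sketch only: build .machines for a k-point-parallel / MPI run, 8 cores per node assumed
first=$(hostlist -e $SLURM_JOB_NODELIST | tail -1)
echo "lapw0: $first:8" > .machines          # lapw0 as one 8-process job on one node
for i in $(hostlist -e $SLURM_JOB_NODELIST); do
  echo "1:$i:8" >> .machines                # one parallel job per node
done
echo "granularity:1" >> .machines
echo "extrafine:1"   >> .machines

# init_lapw is normally run interactively on the case directory before submitting,
# so the batch job itself then only needs the SCF driver, e.g.:
runsp_lapw -ec 0.0001 -i 40 -p -I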

Best regards,
Li
_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
