[Wien] LAPW2 crashed!

Yunguo Li yunguo at kth.se
Thu Aug 16 18:35:09 CEST 2012


Dear Peter,
Thanks for your kind reply. I am sorry; I am a new user and no one in my group has experience with WIEN2k.
I have 32 atoms in my system. I now run init_lapw by hand (interactively) and choose a core-valence separation energy of -8 Ry. The initialization succeeds.
I have changed the script to:
#!/bin/bash
#cd /home/x_yunli/WIEN2k/GaNCu

#SBATCH -A matter4
#SBATCH -J tst
#SBATCH -N 4
#SBATCH -t 00:14:00


export SCRATCH=/scratch/local
export WIENROOT=/home/x_yunli/wien2k

# set .machines for parallel job
# lapw0 running on one node
echo -n "lapw0: " > .machines
echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
echo "$i:8" >> .machines
# run one mpi job on each node (splitting k-mesh over nodes)
for i in $(hostlist -e $SLURM_JOB_NODELIST)
do
 echo "1:$i:8 " >> .machines
done
echo granularity:1 >> .machines
echo extrafine:1   >> .machines

#start WIEN2k
#x_lapw
#initio
#init_lapw

#main
runsp_lapw -ec 0.0001Ry -i 40 -p -I
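
For comparison, here is a sketch that builds the same `.machines` layout without relying on the loop variable. In the script above, `$i` on the `echo "$i:8"` line is expanded before the `for` loop assigns it, so it is normally empty. The `lapw0` line only ends up as `lapw0: <last host>:8` because the preceding `echo -n` already wrote the host name. `write_machines_mpi` is a hypothetical helper, not a WIEN2k command. Like the original script, it assumes 8 cores per node, and it takes the node names as arguments so it can be tried outside SLURM.

```shell
#!/bin/bash
# Hypothetical helper: write a WIEN2k .machines file with
#   - lapw0 on the last node using $cores MPI ranks
#   - one MPI-parallel k-point group ("1:host:cores") per node
# Assumption: every node has the same number of cores (8 in the original script).
write_machines_mpi() {
  local cores=$1; shift
  [ $# -gt 0 ] || { echo "usage: write_machines_mpi CORES HOST..." >&2; return 1; }
  local hosts=("$@")
  local last=${hosts[${#hosts[@]}-1]}   # last node in the list
  local h
  {
    echo "lapw0:${last}:${cores}"
    for h in "${hosts[@]}"; do
      echo "1:${h}:${cores}"
    done
    echo "granularity:1"
    echo "extrafine:1"
  } > .machines
}

# Inside the batch job, reusing the hostlist expander from the original script:
# write_machines_mpi 8 $(hostlist -e "$SLURM_JOB_NODELIST")
```

This only tidies up how the file is generated. It does not answer whether MPI is the right choice for this cell, which is what Peter asks about.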

I still get the same error, and the dayfile is the same. Only one error file has any content:
uplapw2.error
      **  Error in Parallel LAPW2
      **  testerror: Error in Parallel LAPW2
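
Peter's `lse` advice in the quoted reply comes down to reading every non-empty error file, not just the one that mentions LAPW2. The plain-bash sketch below does a similar job for a case directory. `show_nonempty_errors` is a hypothetical name and is not the WIEN2k `lse` shortcut itself. It prints each non-empty `*.error` file under a header line and returns non-zero if it finds none.

```shell
#!/bin/bash
# Hypothetical helper: print the contents of every non-empty *.error file
# in a directory (default: current directory). Empty files are skipped,
# which is what "look at the content of the non-zero files" asks for.
show_nonempty_errors() {
  local dir=${1:-.}
  local f found=0
  for f in "$dir"/*.error; do
    # -s is false for empty files and for the literal glob when nothing matches
    [ -s "$f" ] || continue
    found=1
    echo "== $f"
    cat "$f"
  done
  # success only if at least one non-empty error file was printed
  return $(( found ? 0 : 1 ))
}
```

Running it in the case directory after a crash shows which step wrote an error first. That step is not necessarily the one the dayfile reports as crashed.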

I did not understand what you meant by "You are trying this in mpi-parallel mode. Do you know when this is useful??"
Is there a problem with running LAPW2 in parallel?
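
On the MPI question: with `1:host:8` lines, each k-point group runs as an 8-process MPI job. A `.machines` file can also describe plain k-point parallelism, where each `1:host` line without a core count is an independent, non-MPI job. The sketch below writes one such line per core and omits the `lapw0` line. It assumes 8 cores per node, and `write_machines_kpoint` is a hypothetical helper, not a WIEN2k command. The thread does not settle whether this layout is faster than MPI for this 32-atom cell.

```shell
#!/bin/bash
# Hypothetical helper: .machines for pure k-point parallelism (no MPI).
# Each "1:host" line is one independent k-point job; one line is written
# per core, so 4 nodes x 8 cores gives 32 parallel jobs.
# Assumption: all nodes have the same number of cores.
write_machines_kpoint() {
  local cores=$1; shift
  local h c
  {
    for h in "$@"; do
      for ((c = 0; c < cores; c++)); do
        echo "1:${h}"
      done
    done
    echo "granularity:1"
    echo "extrafine:1"
  } > .machines
}

# e.g. inside the batch job:
# write_machines_kpoint 8 $(hostlist -e "$SLURM_JOB_NODELIST")
```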

Best regards,
Li
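
The dayfile quoted below shows one more symptom. In both `lapw1para` summaries, every node reports `k=0 user=0 wallclock=0`, and each LAPW1 job finished in a fraction of a second. This suggests that LAPW1 processed no k-points at all. If so, the LAPW2 crash is probably a follow-on error, and the LAPW1 output and error files are the place to look. The check below is a hypothetical helper (`check_lapw1_kpoints` is an illustrative name). Its pattern was written against the summary lines quoted in this thread, not against a documented format. It takes the case dayfile (here presumably `GaNCu.dayfile`) as its argument.

```shell
#!/bin/bash
# Hypothetical check: warn if any lapw1para summary line reports zero k-points,
# e.g. "   m371  k=0     user=0  wallclock=0" as in the dayfile quoted here.
check_lapw1_kpoints() {
  local dayfile=$1
  local pattern='^[[:space:]]*[^[:space:]]+[[:space:]]+k=0([[:space:]]|$)'
  if grep -Eq "$pattern" "$dayfile"; then
    echo "WARNING: some lapw1 jobs in $dayfile processed 0 k-points:"
    grep -E "$pattern" "$dayfile"
    return 1
  fi
  return 0
}
```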

On Aug 16, 2012, at 5:29 PM, Peter Blaha wrote:

> There are many errors in this script (maybe I overlooked some others).
> 
>> #!/bin/bash
>> 
>> #SBATCH -A matter4
>> #SBATCH -J tst
>> #SBATCH -N 4
>> #SBATCH -t 00:14:00
>> 
>> 
>> export SCRATCH=/scratch/local
>> export WIENROOT=/home/x_yunli/wien2k
>> 
>> # set .machines for parallel job
>> # lapw0 running on one node
>> echo -n "lapw0: " > .machines
>> echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
>> echo "$i:8" >> .machines
>> # run one mpi job on each node (splitting k-mesh over nodes)
>> for i in $(hostlist -e $SLURM_JOB_NODELIST)
>> do
>>  echo "1:$i:8 " >> .machines
>> done
>> echo granularity:1 >> .machines
>> echo extrafine:1   >> .machines
>> 
>> #start WIEN2k
>> x_lapw -f GaNCu -up -c -p -fermi
> 
> What's that? You do not specify any program for x_lapw to execute!
> 
> 
>> #initio
>> init_lapw -sp -red 3 -ecut 8 -numk 144
> 
> -ecut 8??? Do you know what -ecut is? You specified a positive (+8 Ry) energy
> to separate core from valence. Probably you mean -8? Do you know why you want to use
> -8? (The default is -6.)
> 
>> #main
>> runsp_lapw -ec 0.0001Ry -i 40 -p -I
> 
> You are trying this in mpi-parallel mode.
> Do you know when this is useful?
> How many atoms do you have in your cell ??
> 
> It is OK to look at the dayfile, but there are many other files you have to examine:
> 
> The output (+error) of your batch job (maybe called tst.o* and tst.e*)
> 
> lse   lists all error files (look at the content of the non-empty ones)
> lso   lists all output files. Check them.
> 
>> 
>> The program stops at this point. This is the content of the dayfile:
>> Calculating GaNCu in /home/x_yunli/WIEN2k/GaNCu
>> on m371 with PID 18960
>> using WIEN2k_11.1 (Release 14/6/2011) in /home/x_yunli/wien2k
>> 
>> 
>>     start       (Thu Aug 16 13:56:39 CEST 2012) with lapw0 (40/99 to go)
>> 
>>     cycle 1     (Thu Aug 16 13:56:39 CEST 2012)         (40/99 to go)
>> 
>>>   lapw0 -p    (13:56:39) starting parallel lapw0 at Thu Aug 16 13:56:39 CEST 2012
>> -------- .machine0 : 8 processors
>> mpprun INFO: Starting openmpi run on 4 nodes (32 ranks)...
>> 0.364u 0.590s 0:24.04 3.9%      0+0k 0+0io 27pf+0w
>>>   lapw1  -c -up -p    (13:57:03) starting parallel lapw1 at Thu Aug 16 13:57:03 CEST 2012
>> ->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:03 CEST 2012
>> running LAPW1 in parallel mode (using .machines)
>> 4 number_of_parallel_jobs
>>      m371 m371 m371 m371 m371 m371 m371 m371(18) 0.010u 0.006s 0:00.02 50.0%    0+0k 0+0io 1pf+0w
>>      m372 m372 m372 m372 m372 m372 m372 m372(18) 0.010u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>>      m373 m373 m373 m373 m373 m373 m373 m373(18) 0.012u 0.004s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>>      m374 m374 m374 m374 m374 m374 m374 m374(18) 0.011u 0.006s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>>    Summary of lapw1para:
>>    m371  k=0     user=0  wallclock=0
>>    m372  k=0     user=0  wallclock=0
>>    m373  k=0     user=0  wallclock=0
>>    m374  k=0     user=0  wallclock=0
>> 0.161u 0.239s 0:06.53 5.9%      0+0k 0+0io 11pf+0w
>>>   lapw1  -c -dn -p    (13:57:09) starting parallel lapw1 at Thu Aug 16 13:57:09 CEST 2012
>> ->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:09 CEST 2012
>> running LAPW1 in parallel mode (using .machines.help)
>> 4 number_of_parallel_jobs
>>      m371 m371 m371 m371 m371 m371 m371 m371(18) 0.011u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>>      m372 m372 m372 m372 m372 m372 m372 m372(18) 0.009u 0.008s 0:00.02 0.0%     0+0k 0+0io 0pf+0w
>>      m373 m373 m373 m373 m373 m373 m373 m373(18) 0.009u 0.006s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
>>      m374 m374 m374 m374 m374 m374 m374 m374(18) 0.008u 0.007s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
>>    Summary of lapw1para:
>>    m371  k=0     user=0  wallclock=0
>>    m372  k=0     user=0  wallclock=0
>>    m373  k=0     user=0  wallclock=0
>>    m374  k=0     user=0  wallclock=0
>> 0.138u 0.253s 0:06.39 5.9%      0+0k 0+0io 0pf+0w
>>>   lapw2 -c -up  -p    (13:57:16) running LAPW2 in parallel mode
>> **  LAPW2 crashed!
>> 0.027u 0.039s 0:00.15 33.3%     0+0k 0+0io 0pf+0w
>> error: command   /home/x_yunli/wien2k/lapw2cpara -up -c uplapw2.def   failed
>> 
>>>   stop error
>> 
>> Could you please help me find the problem?
>> 
>> Best regards,
>> Li
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> 
> 
> -- 
> -----------------------------------------
> Peter Blaha
> Inst. Materials Chemistry, TU Vienna
> Getreidemarkt 9, A-1060 Vienna, Austria
> Tel: +43-1-5880115671
> Fax: +43-1-5880115698
> email: pblaha at theochem.tuwien.ac.at
> -----------------------------------------
