[Wien] LAPW2 crashed!

Gavin Abo gsabo at crimson.ua.edu
Thu Aug 16 18:45:20 CEST 2012


Is "-ec 0.0001Ry" valid input?  I think it should be "-ec 0.0001" (the value is already in Ry; no unit suffix).
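For what it's worth, a minimal sketch of how the .machines block in that script could look, assuming the four node names m371-m374 from the dayfile (a real job would take them from hostlist -e $SLURM_JOB_NODELIST). Note that in the posted script the line `echo "$i:8"` runs before the for loop ever defines $i:

```shell
# Sketch only: hostnames m371-m374 are assumed; in a real SLURM job use
#   nodes=$(hostlist -e $SLURM_JOB_NODELIST)
nodes="m371 m372 m373 m374"

# lapw0 on one node (the last one) with 8 mpi ranks; the posted script's
# stray  echo "$i:8"  line is dropped, since $i is not yet set there
last=$(echo $nodes | awk '{print $NF}')
echo "lapw0: $last:8" > .machines

# one k-parallel mpi job (8 ranks) per node
for i in $nodes; do
  echo "1:$i:8" >> .machines
done
echo "granularity:1" >> .machines
echo "extrafine:1"   >> .machines

# then run the scf cycle with a unit-free convergence value:
#   runsp_lapw -ec 0.0001 -i 40 -p -I
cat .machines
```

The rank counts per host depend on the cluster, of course; the point is only the structure of the file and the removed stray echo.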

On 8/16/2012 10:42 AM, Laurence Marks wrote:
> Attach your GaNCu.struct file to an email -- often there will be an
> obvious problem for someone with more experience.
>
> On Thu, Aug 16, 2012 at 11:35 AM, Yunguo Li <yunguo at kth.se> wrote:
>> Dear Peter,
>> Thanks for your kind reply. I am sorry; I am a new user and no one in my group has experience with this.
>> I have 32 atoms in my system. Now I initialize the input by hand using the init_lapw command, and choose the separation energy to be -8 Ry.  The initialization is successful.
>> I have changed the script to be :
>> #!/bin/bash
>> #cd /home/x_yunli/WIEN2k/GaNCu
>>
>> #SBATCH -A matter4
>> #SBATCH -J tst
>> #SBATCH -N 4
>> #SBATCH -t 00:14:00
>>
>>
>> export SCRATCH=/scratch/local
>> export WIENROOT=/home/x_yunli/wien2k
>>
>> # set .machines for parallel job
>> # lapw0 running on one node
>> echo -n "lapw0: " > .machines
>> echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
>> echo "$i:8" >> .machines
>> # run one mpi job on each node (splitting k-mesh over nodes)
>> for i in $(hostlist -e $SLURM_JOB_NODELIST)
>> do
>>   echo "1:$i:8 " >> .machines
>> done
>> echo granularity:1 >> .machines
>> echo extrafine:1   >> .machines
>>
>> #start WIEN2k
>> #x_lapw
>> #initio
>> #init_lapw
>>
>> #main
>> runsp_lapw -ec 0.0001Ry -i 40 -p -I
>>
>> I still got the same error, and the dayfile is the same. There is only one error file, containing:
>> uplapw2.error
>>        **  Error in Parallel LAPW2
>>        **  testerror: Error in Parallel LAPW2
>>
>> I didn't understand what you meant by "You are trying this in mpi-parallel mode. Do you know when this is useful ??"
>> Is there some problem with the parallelization of LAPW2?
>>
>> Best regards,
>> Li
>>
>> On Aug 16, 2012, at 5:29 PM, Peter Blaha wrote:
>>
>>> There are many errors in this script (maybe I overlooked some others).
>>>
>>>> #!/bin/bash
>>>>
>>>> #SBATCH -A matter4
>>>> #SBATCH -J tst
>>>> #SBATCH -N 4
>>>> #SBATCH -t 00:14:00
>>>>
>>>>
>>>> export SCRATCH=/scratch/local
>>>> export WIENROOT=/home/x_yunli/wien2k
>>>>
>>>> # set .machines for parallel job
>>>> # lapw0 running on one node
>>>> echo -n "lapw0: " > .machines
>>>> echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
>>>> echo "$i:8" >> .machines
>>>> # run one mpi job on each node (splitting k-mesh over nodes)
>>>> for i in $(hostlist -e $SLURM_JOB_NODELIST)
>>>> do
>>>>   echo "1:$i:8 " >> .machines
>>>> done
>>>> echo granularity:1 >> .machines
>>>> echo extrafine:1   >> .machines
>>>>
>>>> #start WIEN2k
>>>> x_lapw -f GaNCu -up -c -p -fermi
>>> What's that ????   You do not specify any program to be executed by x_lapw ????
>>>
>>>
>>>> #initio
>>>> init_lapw -sp -red 3 -ecut 8 -numk 144
>>> -ecut 8   ???  Do you know what  -ecut is ??  You specified a positive (+8 Ry) energy
>>> to separate core from valence.  Probably you mean -8 ???  Do you know why you want to use
>>> -8 ??? (the default is -6).
>>>
>>>> #main
>>>> runsp_lapw -ec 0.0001Ry -i 40 -p -I
>>> You are trying this in mpi-parallel mode.
>>> Do you know when this is useful ??
>>> How many atoms do you have in your cell ??
>>>
>>> It is ok to look at the dayfile, but there are many other files you have to examine.
>>>
>>> The output (+error) of your batch-job (maybe it is called tst.o* and tst.e*)
>>>
>>> lse   lists all error files (look at the content of the non-zero ones)
>>> lso   lists all output files. Check them.
>>>
>>>> - The program stops at this point.  This is the content of the dayfile:
>>>> Calculating GaNCu in /home/x_yunli/WIEN2k/GaNCu
>>>> on m371 with PID 18960
>>>> using WIEN2k_11.1 (Release 14/6/2011) in /home/x_yunli/wien2k
>>>>
>>>>
>>>>      start       (Thu Aug 16 13:56:39 CEST 2012) with lapw0 (40/99 to go)
>>>>
>>>>      cycle 1     (Thu Aug 16 13:56:39 CEST 2012)         (40/99 to go)
>>>>
>>>>>    lapw0 -p    (13:56:39) starting parallel lapw0 at Thu Aug 16 13:56:39 CEST 2012
>>>> -------- .machine0 : 8 processors
>>>> mpprun INFO: Starting openmpi run on 4 nodes (32 ranks)...
>>>> 0.364u 0.590s 0:24.04 3.9%      0+0k 0+0io 27pf+0w
>>>>>    lapw1  -c -up -p    (13:57:03) starting parallel lapw1 at Thu Aug 16 13:57:03 CEST 2012
>>>> ->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:03 CEST 2012
>>>> running LAPW1 in parallel mode (using .machines)
>>>> 4 number_of_parallel_jobs
>>>>       m371 m371 m371 m371 m371 m371 m371 m371(18) 0.010u 0.006s 0:00.02 50.0%    0+0k 0+0io 1pf+0w
>>>>       m372 m372 m372 m372 m372 m372 m372 m372(18) 0.010u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>>>>       m373 m373 m373 m373 m373 m373 m373 m373(18) 0.012u 0.004s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>>>>       m374 m374 m374 m374 m374 m374 m374 m374(18) 0.011u 0.006s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>>>>     Summary of lapw1para:
>>>>     m371  k=0     user=0  wallclock=0
>>>>     m372  k=0     user=0  wallclock=0
>>>>     m373  k=0     user=0  wallclock=0
>>>>     m374  k=0     user=0  wallclock=0
>>>> 0.161u 0.239s 0:06.53 5.9%      0+0k 0+0io 11pf+0w
>>>>>    lapw1  -c -dn -p    (13:57:09) starting parallel lapw1 at Thu Aug 16 13:57:09 CEST 2012
>>>> ->  starting parallel LAPW1 jobs at Thu Aug 16 13:57:09 CEST 2012
>>>> running LAPW1 in parallel mode (using .machines.help)
>>>> 4 number_of_parallel_jobs
>>>>       m371 m371 m371 m371 m371 m371 m371 m371(18) 0.011u 0.005s 0:00.01 100.0%   0+0k 0+0io 0pf+0w
>>>>       m372 m372 m372 m372 m372 m372 m372 m372(18) 0.009u 0.008s 0:00.02 0.0%     0+0k 0+0io 0pf+0w
>>>>       m373 m373 m373 m373 m373 m373 m373 m373(18) 0.009u 0.006s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
>>>>       m374 m374 m374 m374 m374 m374 m374 m374(18) 0.008u 0.007s 0:00.01 0.0%     0+0k 0+0io 0pf+0w
>>>>     Summary of lapw1para:
>>>>     m371  k=0     user=0  wallclock=0
>>>>     m372  k=0     user=0  wallclock=0
>>>>     m373  k=0     user=0  wallclock=0
>>>>     m374  k=0     user=0  wallclock=0
>>>> 0.138u 0.253s 0:06.39 5.9%      0+0k 0+0io 0pf+0w
>>>>>    lapw2 -c -up  -p    (13:57:16) running LAPW2 in parallel mode
>>>> **  LAPW2 crashed!
>>>> 0.027u 0.039s 0:00.15 33.3%     0+0k 0+0io 0pf+0w
>>>> error: command   /home/x_yunli/wien2k/lapw2cpara -up -c uplapw2.def   failed
>>>>
>>>>>    stop error
>>>> Could you please help me find the problem?
>>>>
>>>> Best regards,
>>>> Li
>>>> _______________________________________________
>>>> Wien mailing list
>>>> Wien at zeus.theochem.tuwien.ac.at
>>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>>>
>>> --
>>> -----------------------------------------
>>> Peter Blaha
>>> Inst. Materials Chemistry, TU Vienna
>>> Getreidemarkt 9, A-1060 Vienna, Austria
>>> Tel: +43-1-5880115671
>>> Fax: +43-1-5880115698
>>> email: pblaha at theochem.tuwien.ac.at
>>> -----------------------------------------
>
>


