[Wien] LAPW2 crashed!
Laurence Marks
L-marks at northwestern.edu
Thu Aug 16 18:42:05 CEST 2012
Attach your GaNCu.struct file to an email -- often there will be an
obvious problem for someone with more experience.
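
A note on the .machines file built by the script quoted below, as a minimal sketch rather than a verified fix. Peter's question about mpi-parallel mode suggests that, for a 32-atom cell, k-point parallelization (one "1:host" line per core) may be more useful than one 8-core MPI job per node; that reading is an assumption. The helper name make_machines and its arguments are hypothetical and not part of WIEN2k. The lapw0 line is kept as in the quoted script, which relies on $i being empty when that line is written, since $i is only set later by the for loop.

```shell
#!/bin/bash
# Sketch only: print a WIEN2k .machines file for a list of hosts.
# make_machines is a hypothetical helper, not part of WIEN2k.
# usage: make_machines MODE CORES HOST [HOST...]
#   MODE=kpoint : one "1:host" line per core (k-point parallel jobs)
#   MODE=mpi    : one "1:host:CORES" line per node (one MPI job per node)
make_machines() {
    if [ "$#" -lt 3 ]; then
        echo "usage: make_machines kpoint|mpi CORES HOST [HOST...]" >&2
        return 1
    fi
    local mode=$1 cores=$2
    shift 2
    local last_host="${@: -1}"   # last host in the list, like "tail -1"
    local host n

    # lapw0 on the last host with all its cores, as in the quoted script
    echo "lapw0: ${last_host}:${cores}"

    for host in "$@"; do
        if [ "$mode" = "mpi" ]; then
            echo "1:${host}:${cores}"
        else
            # k-point parallel: one single-core job line per core
            for ((n = 0; n < cores; n++)); do
                echo "1:${host}"
            done
        fi
    done

    echo "granularity:1"
    echo "extrafine:1"
}
```

In the batch script this could be called as make_machines kpoint 8 $(hostlist -e $SLURM_JOB_NODELIST) > .machines, where hostlist is the site tool already used in the quoted script.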
On Thu, Aug 16, 2012 at 11:35 AM, Yunguo Li <yunguo at kth.se> wrote:
> Dear Peter,
> Thanks for your kind reply. I am sorry; I am a new user and no one in my group has experience.
> I have 32 atoms in my system. I now initialize the input by hand with the init_lapw command and choose a separation energy of -8 Ry. The initialization is successful.
> I have changed the script to:
> #!/bin/bash
> #cd /home/x_yunli/WIEN2k/GaNCu
>
> #SBATCH -A matter4
> #SBATCH -J tst
> #SBATCH -N 4
> #SBATCH -t 00:14:00
>
>
> export SCRATCH=/scratch/local
> export WIENROOT=/home/x_yunli/wien2k
>
> # set .machines for parallel job
> # lapw0 running on one node
> echo -n "lapw0: " > .machines
> echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
> echo "$i:8" >> .machines
> # run one mpi job on each node (splitting k-mesh over nodes)
> for i in $(hostlist -e $SLURM_JOB_NODELIST)
> do
> echo "1:$i:8 " >> .machines
> done
> echo granularity:1 >> .machines
> echo extrafine:1 >> .machines
>
> #start WIEN2k
> #x_lapw
> #initio
> #init_lapw
>
> #main
> runsp_lapw -ec 0.0001Ry -i 40 -p -I
>
> I still get the same error, and the dayfile is the same. There is only one error file, containing:
> uplapw2.error
> ** Error in Parallel LAPW2
> ** testerror: Error in Parallel LAPW2
>
> I didn't understand what you meant by "You are trying this in mpi-parallel mode. Do you know when this is useful ??"
> Is there a problem with running LAPW2 in parallel?
>
> Best regards,
> Li
>
> On Aug 16, 2012, at 5:29 PM, Peter Blaha wrote:
>
>> There are many errors in this script (maybe I overlooked some others).
>>
>>> #!/bin/bash
>>>
>>> #SBATCH -A matter4
>>> #SBATCH -J tst
>>> #SBATCH -N 4
>>> #SBATCH -t 00:14:00
>>>
>>>
>>> export SCRATCH=/scratch/local
>>> export WIENROOT=/home/x_yunli/wien2k
>>>
>>> # set .machines for parallel job
>>> # lapw0 running on one node
>>> echo -n "lapw0: " > .machines
>>> echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
>>> echo "$i:8" >> .machines
>>> # run one mpi job on each node (splitting k-mesh over nodes)
>>> for i in $(hostlist -e $SLURM_JOB_NODELIST)
>>> do
>>> echo "1:$i:8 " >> .machines
>>> done
>>> echo granularity:1 >> .machines
>>> echo extrafine:1 >> .machines
>>>
>>> #start WIEN2k
>>> x_lapw -f GaNCu -up -c -p -fermi
>>
>> What's that? You do not specify which program x_lapw should execute.
>>
>>
>>> #initio
>>> init_lapw -sp -red 3 -ecut 8 -numk 144
>>
>> -ecut 8? Do you know what -ecut is? You specified a positive (+8 Ry) energy
>> to separate core from valence. You probably mean -8. Do you know why you want to use
>> -8? (The default is -6.)
>>
>>> #main
>>> runsp_lapw -ec 0.0001Ry -i 40 -p -I
>>
>> You are trying this in mpi-parallel mode.
>> Do you know when this is useful ??
>> How many atoms do you have in your cell ??
>>
>> It is OK to look at the dayfile, but there are many other files you have to examine:
>>
>> The output (and error) of your batch job (maybe it is called tst.o* and tst.e*).
>>
>> lse lists all error files (look at the content of the non-empty ones).
>> lso lists all output files. Check them.
>>
>>>
>>> - The program stops at this point. This is the content of the dayfile:
>>> Calculating GaNCu in /home/x_yunli/WIEN2k/GaNCu
>>> on m371 with PID 18960
>>> using WIEN2k_11.1 (Release 14/6/2011) in /home/x_yunli/wien2k
>>>
>>>
>>> start (Thu Aug 16 13:56:39 CEST 2012) with lapw0 (40/99 to go)
>>>
>>> cycle 1 (Thu Aug 16 13:56:39 CEST 2012) (40/99 to go)
>>>
>>>> lapw0 -p (13:56:39) starting parallel lapw0 at Thu Aug 16 13:56:39 CEST 2012
>>> -------- .machine0 : 8 processors
>>> mpprun INFO: Starting openmpi run on 4 nodes (32 ranks)...
>>> 0.364u 0.590s 0:24.04 3.9% 0+0k 0+0io 27pf+0w
>>>> lapw1 -c -up -p (13:57:03) starting parallel lapw1 at Thu Aug 16 13:57:03 CEST 2012
>>> -> starting parallel LAPW1 jobs at Thu Aug 16 13:57:03 CEST 2012
>>> running LAPW1 in parallel mode (using .machines)
>>> 4 number_of_parallel_jobs
>>> m371 m371 m371 m371 m371 m371 m371 m371(18) 0.010u 0.006s 0:00.02 50.0% 0+0k 0+0io 1pf+0w
>>> m372 m372 m372 m372 m372 m372 m372 m372(18) 0.010u 0.005s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
>>> m373 m373 m373 m373 m373 m373 m373 m373(18) 0.012u 0.004s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
>>> m374 m374 m374 m374 m374 m374 m374 m374(18) 0.011u 0.006s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
>>> Summary of lapw1para:
>>> m371 k=0 user=0 wallclock=0
>>> m372 k=0 user=0 wallclock=0
>>> m373 k=0 user=0 wallclock=0
>>> m374 k=0 user=0 wallclock=0
>>> 0.161u 0.239s 0:06.53 5.9% 0+0k 0+0io 11pf+0w
>>>> lapw1 -c -dn -p (13:57:09) starting parallel lapw1 at Thu Aug 16 13:57:09 CEST 2012
>>> -> starting parallel LAPW1 jobs at Thu Aug 16 13:57:09 CEST 2012
>>> running LAPW1 in parallel mode (using .machines.help)
>>> 4 number_of_parallel_jobs
>>> m371 m371 m371 m371 m371 m371 m371 m371(18) 0.011u 0.005s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
>>> m372 m372 m372 m372 m372 m372 m372 m372(18) 0.009u 0.008s 0:00.02 0.0% 0+0k 0+0io 0pf+0w
>>> m373 m373 m373 m373 m373 m373 m373 m373(18) 0.009u 0.006s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
>>> m374 m374 m374 m374 m374 m374 m374 m374(18) 0.008u 0.007s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
>>> Summary of lapw1para:
>>> m371 k=0 user=0 wallclock=0
>>> m372 k=0 user=0 wallclock=0
>>> m373 k=0 user=0 wallclock=0
>>> m374 k=0 user=0 wallclock=0
>>> 0.138u 0.253s 0:06.39 5.9% 0+0k 0+0io 0pf+0w
>>>> lapw2 -c -up -p (13:57:16) running LAPW2 in parallel mode
>>> ** LAPW2 crashed!
>>> 0.027u 0.039s 0:00.15 33.3% 0+0k 0+0io 0pf+0w
>>> error: command /home/x_yunli/wien2k/lapw2cpara -up -c uplapw2.def failed
>>>
>>>> stop error
>>>
>>> Could you please find the problem ?
>>>
>>> Best regards,
>>> Li
>>> _______________________________________________
>>> Wien mailing list
>>> Wien at zeus.theochem.tuwien.ac.at
>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>>
>>
>> --
>> -----------------------------------------
>> Peter Blaha
>> Inst. Materials Chemistry, TU Vienna
>> Getreidemarkt 9, A-1060 Vienna, Austria
>> Tel: +43-1-5880115671
>> Fax: +43-1-5880115698
>> email: pblaha at theochem.tuwien.ac.at
>> -----------------------------------------
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Gyorgi