[Wien] LAPW2 crashed!
Yunguo Li
yunguo at kth.se
Thu Aug 16 18:35:09 CEST 2012
Dear Peter,
Thanks for your kind reply. I am sorry; I am a new user and no one in my group has experience with WIEN2k.
I have 32 atoms in my system. Now I initialize the input by hand using the init_lapw command and choose the core/valence separation energy to be -8 Ry. The initialization is successful.
I have changed the script to:
#!/bin/bash
#cd /home/x_yunli/WIEN2k/GaNCu
#SBATCH -A matter4
#SBATCH -J tst
#SBATCH -N 4
#SBATCH -t 00:14:00
export SCRATCH=/scratch/local
export WIENROOT=/home/x_yunli/wien2k
# set .machines for parallel job
# lapw0 running on one node
echo -n "lapw0: " > .machines
echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
echo "$i:8" >> .machines
# run one mpi job on each node (splitting k-mesh over nodes)
for i in $(hostlist -e $SLURM_JOB_NODELIST)
do
echo "1:$i:8 " >> .machines
done
echo granularity:1 >> .machines
echo extrafine:1 >> .machines
#start WIEN2k
#x_lapw
#initio
#init_lapw
#main
runsp_lapw -ec 0.0001Ry -i 40 -p -I
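As an aside, the `.machines` construction in the script above can be tidied so that the lapw0 line is written in one go, avoiding the stray `$i` on the following line (it is empty at that point and only produces the right `:8` suffix by accident). This is a hedged sketch, not a tested job script; `NODES` is a stand-in for the site-specific output of `hostlist -e $SLURM_JOB_NODELIST`:

```shell
#!/bin/bash
# Sketch of the .machines generation only -- a hypothetical cleanup,
# not a verified working job script. NODES stands in for the output
# of "hostlist -e $SLURM_JOB_NODELIST", which is site-specific.
NODES="${NODES:-m371 m372 m373 m374}"
LAST=$(echo $NODES | awk '{print $NF}')

# lapw0 on the last node with 8 cores, written as a single line so no
# undefined variable can sneak into the file
echo "lapw0: ${LAST}:8" > .machines

# one job per node, 8 MPI processes each, splitting the k-mesh over
# the nodes (whether MPI within lapw1/lapw2 pays off depends on the
# cell size, which is what Peter's question below is about)
for n in $NODES; do
    echo "1:${n}:8" >> .machines
done
echo "granularity:1" >> .machines
echo "extrafine:1" >> .machines
```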
I still got the same error, and the day file is the same. There is only one error file with content:
uplapw2.error
** Error in Parallel LAPW2
** testerror: Error in Parallel LAPW2
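To check whether any other error files also contain messages, something like the following can be used. This is a sketch only: the helper name `show_errors` is made up, and it is an assumption that this roughly matches what WIEN2k's `lse` alias does.

```shell
#!/bin/bash
# Hypothetical helper, assumed to behave roughly like WIEN2k's `lse`
# alias: print the name and contents of every non-empty *.error file
# in the case directory.
show_errors() {
    for f in *.error; do
        [ -e "$f" ] || continue   # no *.error files at all
        if [ -s "$f" ]; then      # -s: file exists and is non-empty
            echo "== $f =="
            cat "$f"
        fi
    done
    return 0
}
```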
I didn't understand what you meant by "You are trying this in mpi-parallel mode. Do you know when this is useful??"
Is there some problem with the parallel mode of LAPW2?
Best regards,
Li
On Aug 16, 2012, at 5:29 PM, Peter Blaha wrote:
> There are many errors in this script (maybe I overlooked some others).
>
>> #!/bin/bash
>>
>> #SBATCH -A matter4
>> #SBATCH -J tst
>> #SBATCH -N 4
>> #SBATCH -t 00:14:00
>>
>>
>> export SCRATCH=/scratch/local
>> export WIENROOT=/home/x_yunli/wien2k
>>
>> # set .machines for parallel job
>> # lapw0 running on one node
>> echo -n "lapw0: " > .machines
>> echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
>> echo "$i:8" >> .machines
>> # run one mpi job on each node (splitting k-mesh over nodes)
>> for i in $(hostlist -e $SLURM_JOB_NODELIST)
>> do
>> echo "1:$i:8 " >> .machines
>> done
>> echo granularity:1 >> .machines
>> echo extrafine:1 >> .machines
>>
>> #start WIEN2k
>> x_lapw -f GaNCu -up -c -p -fermi
>
> What's that ???? You did not specify any program which should be executed by x_lapw ????
>
>
>> #initio
>> init_lapw -sp -red 3 -ecut 8 -numk 144
>
> -ecut 8 ??? Do you know what -ecut is ?? You specified a positive (+8 Ry) energy
> to separate core from valence. Probably you mean -8 ??? Do you know why you want to use
> -8 ??? (the default is -6).
>
>> #main
>> runsp_lapw -ec 0.0001Ry -i 40 -p -I
>
> You are trying this in mpi-parallel mode.
> Do you know when this is useful ??
> How many atoms do you have in your cell ??
>
> It is ok to look at the dayfile, but there are many other files you have to examine.
>
> The output (+error) of your batch-job (maybe it is called tst.o* and tst.e*)
>
> lse lists all error files (look at the contents of the non-empty ones).
> lso lists all output files. Check them.
>
>>
>> - The program stops at this point. This is the content of the day file:
>> Calculating GaNCu in /home/x_yunli/WIEN2k/GaNCu
>> on m371 with PID 18960
>> using WIEN2k_11.1 (Release 14/6/2011) in /home/x_yunli/wien2k
>>
>>
>> start (Thu Aug 16 13:56:39 CEST 2012) with lapw0 (40/99 to go)
>>
>> cycle 1 (Thu Aug 16 13:56:39 CEST 2012) (40/99 to go)
>>
>>> lapw0 -p (13:56:39) starting parallel lapw0 at Thu Aug 16 13:56:39 CEST 2012
>> -------- .machine0 : 8 processors
>> mpprun INFO: Starting openmpi run on 4 nodes (32 ranks)...
>> 0.364u 0.590s 0:24.04 3.9% 0+0k 0+0io 27pf+0w
>>> lapw1 -c -up -p (13:57:03) starting parallel lapw1 at Thu Aug 16 13:57:03 CEST 2012
>> -> starting parallel LAPW1 jobs at Thu Aug 16 13:57:03 CEST 2012
>> running LAPW1 in parallel mode (using .machines)
>> 4 number_of_parallel_jobs
>> m371 m371 m371 m371 m371 m371 m371 m371(18) 0.010u 0.006s 0:00.02 50.0% 0+0k 0+0io 1pf+0w
>> m372 m372 m372 m372 m372 m372 m372 m372(18) 0.010u 0.005s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
>> m373 m373 m373 m373 m373 m373 m373 m373(18) 0.012u 0.004s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
>> m374 m374 m374 m374 m374 m374 m374 m374(18) 0.011u 0.006s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
>> Summary of lapw1para:
>> m371 k=0 user=0 wallclock=0
>> m372 k=0 user=0 wallclock=0
>> m373 k=0 user=0 wallclock=0
>> m374 k=0 user=0 wallclock=0
>> 0.161u 0.239s 0:06.53 5.9% 0+0k 0+0io 11pf+0w
>>> lapw1 -c -dn -p (13:57:09) starting parallel lapw1 at Thu Aug 16 13:57:09 CEST 2012
>> -> starting parallel LAPW1 jobs at Thu Aug 16 13:57:09 CEST 2012
>> running LAPW1 in parallel mode (using .machines.help)
>> 4 number_of_parallel_jobs
>> m371 m371 m371 m371 m371 m371 m371 m371(18) 0.011u 0.005s 0:00.01 100.0% 0+0k 0+0io 0pf+0w
>> m372 m372 m372 m372 m372 m372 m372 m372(18) 0.009u 0.008s 0:00.02 0.0% 0+0k 0+0io 0pf+0w
>> m373 m373 m373 m373 m373 m373 m373 m373(18) 0.009u 0.006s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
>> m374 m374 m374 m374 m374 m374 m374 m374(18) 0.008u 0.007s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
>> Summary of lapw1para:
>> m371 k=0 user=0 wallclock=0
>> m372 k=0 user=0 wallclock=0
>> m373 k=0 user=0 wallclock=0
>> m374 k=0 user=0 wallclock=0
>> 0.138u 0.253s 0:06.39 5.9% 0+0k 0+0io 0pf+0w
>>> lapw2 -c -up -p (13:57:16) running LAPW2 in parallel mode
>> ** LAPW2 crashed!
>> 0.027u 0.039s 0:00.15 33.3% 0+0k 0+0io 0pf+0w
>> error: command /home/x_yunli/wien2k/lapw2cpara -up -c uplapw2.def failed
>>
>>> stop error
>>
>> Could you please help me find the problem?
>>
>> Best regards,
>> Li
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>
> --
> -----------------------------------------
> Peter Blaha
> Inst. Materials Chemistry, TU Vienna
> Getreidemarkt 9, A-1060 Vienna, Austria
> Tel: +43-1-5880115671
> Fax: +43-1-5880115698
> email: pblaha at theochem.tuwien.ac.at
> -----------------------------------------