[Wien] error in parallel lapw2
Gavin Abo
gsabo at crimson.ua.edu
Sat Oct 20 16:31:14 CEST 2018
1. It looks like you are using WIEN2k 17.1. Some serious bugs were
found in that version [
http://susi.theochem.tuwien.ac.at/reg_user/updates/ ]. Consider
installing and using WIEN2k 18.2, which contains the fixes. WIEN2k 18.2
can also be patched as described in previous mailing list posts [
https://github.com/gsabo/WIEN2k-Patches/tree/master/18.2 ].
2. Regarding your "file LAO.vspup is missing, i think it automatically
generated during parallel lapw2": the case.vspup file should have been
generated by lapw0, not by lapw2. See Table 4.3 on page 36 of the
WIEN2k 18.2 usersguide [
http://susi.theochem.tuwien.ac.at/reg_user/textbooks/usersguide.pdf ],
which lists the program LAPW0 as generating the necessary case.vsp(up/dn).
3. I suggest you investigate why the "can't open unit: 18" error for
LAO.vspup happens with lapw2 but not with lapw1. For example, did LAO.vspup
exist with a non-zero file size after lapw0 completed, did it still exist
with a non-zero file size for lapw1, and was it deleted, reduced to zero
file size, or made unreachable by lost node connection(s) just before lapw2?
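
The existence and size checks above can be sketched as a small shell
helper. This is only a sketch: the function name check_file is
hypothetical, and it assumes you run it from the case directory (LAO)
after lapw0, after lapw1, and again after the lapw2 failure.

```shell
# Report whether a file is missing, empty, or present with content.
# check_file is a hypothetical helper name, not part of WIEN2k.
check_file() {
    if [ ! -e "$1" ]; then
        echo "$1: missing"
    elif [ ! -s "$1" ]; then
        echo "$1: zero size"
    else
        echo "$1: ok"
    fi
}

# Check both spin channels of the spherical potential written by lapw0.
for f in LAO.vspup LAO.vspdn; do
    check_file "$f"
done
```

Comparing the output at each stage should show at which step the file
disappears or becomes empty.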
Is your .machines file set up to run k-point parallel, mpi parallel, or a
mix of both? The job script that creates the .machines file on the fly was
not provided, so that cannot be determined from your post.
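
For comparison, here is a sketch of how the two setups typically differ
in the .machines file, following the format described in the WIEN2k
usersguide. The host names node1 and node2, the core counts, and the
example file names are assumptions for illustration only; your job
script will generate different values.

```shell
# k-point parallel: one "weight:host" line per parallel job.
cat > machines.kpoint.example <<'EOF'
granularity:1
1:node1
1:node2
EOF

# mpi parallel: "weight:host:ncores" entries on a single line form one
# mpi job spanning those cores. Several such lines would give a mix of
# k-point and mpi parallelization.
cat > machines.mpi.example <<'EOF'
granularity:1
lapw0:node1:4 node2:4
1:node1:4 node2:4
EOF
```

Looking at which of these forms appears in the .machines file of the
failed job shows which parallel mode lapw2 was actually run in.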
If mpi parallel, using WIEN2k 18.2:
1. Run: ./siteconfig
2. Select Compiling Options, Selection: O
3. Select Parallel options, Selection: PO
4. What is MPIRUN set to?
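
As a non-interactive alternative to the steps above, the setting can
usually be read directly. This sketch assumes that siteconfig stored it
as a WIEN_MPIRUN line in $WIENROOT/parallel_options; the helper name
show_mpirun is hypothetical.

```shell
# Print the WIEN_MPIRUN line(s) from a parallel_options file.
# Assumption: the file uses csh-style "setenv WIEN_MPIRUN ..." lines.
show_mpirun() {
    grep 'WIEN_MPIRUN' "$1"
}

opts="${WIENROOT:-.}/parallel_options"
if [ -f "$opts" ]; then
    show_mpirun "$opts"
else
    echo "no parallel_options found at $opts"
fi
```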
You also might check your mpirun command and talk with your cluster
administrator to see if a supported mpi run command is being used for
the system [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17628.html
].
Have you checked the standard output/error file? Its name can
vary from one system to another, so check your
scheduling/queue system documentation to see what the default file(s) are
called or use an option to name it yourself [ for example,
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18080.html
]. If there is an mpi run error, it usually shows up in that file.
You also might have to check the hidden dot files [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17317.html
] and output files (like case.output0, case.output1, etc.).
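
A quick way to survey these files from the case directory might look
like the sketch below. The helper name nonempty_errors is hypothetical,
and the sketch assumes the case name LAO from your error message.

```shell
# List *.error files that are non-empty; an empty one normally means
# that step reported no error. nonempty_errors is a hypothetical helper.
nonempty_errors() {
    find "${1:-.}" -maxdepth 1 -name '*.error' -type f -size +0c
}

# Show everything, including hidden dot files, with sizes and times.
ls -la

# Show which steps reported errors.
nonempty_errors .

# Look at the end of the lapw0 output, if it exists.
if [ -f LAO.output0 ]; then
    tail -n 20 LAO.output0
fi
```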
On 10/20/2018 1:58 AM, BUSHRA SABIR wrote:
> Dear Peter Blaha and wien2k users
>
> I am facing one problem in parallel execution of job script. I am
> working on LaXO3 materials. initialization is ok but when i submitted
> job file on cluster for parallel execution with command line
> runsp_lapw -cc 0.001 -ec 0.0001 -i 40 -p .
>
> following error appears. cat *.error
>
> 'LAPW2' - can't open unit: 18
> 'LAPW2' - filename: LAO.vspup
> 'LAPW2' - status: old form: formatted
> ** testerror: Error in Parallel LAPW2
>
> file LAO.vspup is missing, i think it automatically generated during
> parallel lapw2
>
> i checked testpara1_lapw
> #####################################################
> # TESTPARA1 #
> #####################################################
>
> Sat Oct 20 00:22:39 PDT 2018
>
> lapw1para has finished
>
> for testpara2_lapw
> #####################################################
> # TESTPARA1 #
> #####################################################
>
> Sat Oct 20 00:22:39 PDT 2018
>
> lapw1para has finished
>
> At the end of day file following error is shown
>
> 0.088u 0.060s 0:05.14 2.7% 0+0k 0+288io 0pf+0w
> > lapw2 -up -p (23:56:15) running LAPW2 in parallel mode
> ** LAPW2 crashed!
> 0.048u 0.312s 0:00.72 48.6% 0+0k 11386+96io 36pf+0w
> error: command
> /global/common/sw/cray/cnl6/haswell/wien2k/17.1/intel/17.0.2.174/wkteycp/lapw2para
> -up uplapw2.def failed
>
> i go through mailing list but could not find solution.
>
>
> Bushra
> PhD student