[Wien] error in parallel lapw2

Dr. K. C. Bhamu kcbhamu85 at gmail.com
Sat Oct 20 20:37:37 CEST 2018


Dear Gavin,
(updated)
I am writing on behalf of Ms. Bushra, who is not able to reply for now,
with some tests on the same cluster using WIEN2k versions 17.1 and 18.2.

The actual error that we see is "/usr/common/nsg/bin/mpirun: Permission
denied", which can probably be solved only by the cluster admin.
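
Before writing to the admin, the "Permission denied" message can be
narrowed down a little. The following is only a sketch: check_launcher is
a hypothetical helper name, and it assumes a POSIX shell on the login
node. It reports whether the path is visible and executable for the
current user.

```shell
#!/bin/sh
# Sketch only: check_launcher is a hypothetical helper, not part of WIEN2k.
# Note: a path hidden behind a directory without search permission also
# shows up as "missing" here.
check_launcher() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
    elif [ ! -x "$1" ]; then
        echo "not executable by $(id -un): $1"
    else
        echo "ok: $1"
    fi
}

# Path taken from the error message above.
check_launcher /usr/common/nsg/bin/mpirun
```

If the result is "missing" or "not executable", the output of this check
is probably worth including in the message to the cluster admin.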

For Wien2k_17.1 the mpirun was defined as "mpirun -n _NP_ -machinefile
_HOSTS_ _EXEC_"

In one of the threads, Prof. Peter suggested using "ifort + slurm".

Yes, I just installed Wien2k_18.2 at NERSC with the ifort+slurm system
environment.

The mpirun command is now "srun -K -N_nodes_ -n_NP_ -r_offset_
_PINNING_ _EXEC_"
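
To see what srun actually receives, it can help to expand this template by
hand. The sketch below assumes the parallel scripts replace the
placeholders (_nodes_, _NP_, _offset_, _PINNING_, _EXEC_) by plain text
substitution. The values and the def-file name in the example call are
made up for illustration, and expand_template is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: imitates the placeholder substitution with sed.
# expand_template and the example values below are illustrative assumptions.
template='srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_'

# Arguments: nodes, number of processes, offset, pinning options, executable
expand_template() {
    printf '%s\n' "$template" | sed \
        -e "s|_nodes_|$1|" \
        -e "s|_NP_|$2|" \
        -e "s|_offset_|$3|" \
        -e "s|_PINNING_|$4|" \
        -e "s|_EXEC_|$5|"
}

# One node, 4 MPI processes, offset 0, no pinning options.
expand_template 1 4 0 '' 'lapw2_mpi uplapw2_1.def'
```

With these example values, the expanded command contains "-r0", which is
the --relative option that srun complains about in [1].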

But I still face the same error.

The error is the same, and it doesn't matter whether we use mpirun or
srun [1]. Only the word srun or mpirun changes in the error message.
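
One possible reading of the srun messages in [1] (an assumption, not
confirmed by the admin) is that srun was started outside an existing job
allocation, so that it tried to request a new allocation and then rejected
-r. A minimal sketch to test this from the same shell that starts
runsp_lapw, relying on SLURM_JOB_ID, which Slurm sets inside sbatch/salloc
jobs (in_slurm_allocation is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch only: in_slurm_allocation is a hypothetical helper.
# SLURM_JOB_ID is set by Slurm inside a job allocation (sbatch/salloc).
in_slurm_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_allocation; then
    echo "inside allocation: job ${SLURM_JOB_ID}, nodes: ${SLURM_JOB_NODELIST:-unknown}"
else
    echo "no allocation found: srun would have to request a new one"
fi
```

If this prints "no allocation found", the run was probably started on a
login node or outside the batch script.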


In the past I faced the same error and only the cluster admin could solve
it, so let us first write to the cluster admin; we will post the final
outcome here.

If you have any advice that can help us get rid of this issue, please let
us know.

[1]
srun: error: No hardware architecture specified (-C)!
srun: error: Unable to allocate resources: Unspecified error
srun: fatal: --relative option invalid for job allocation request
srun: error: No hardware architecture specified (-C)!
srun: error: Unable to allocate resources: Unspecified error
LAO.scf1up_1: No such file or directory.
grep: No match.
srun: fatal: --relative option invalid for job allocation request
srun: error: No hardware architecture specified (-C)!
srun: error: Unable to allocate resources: Unspecified error
LAO.scf1dn_1: No such file or directory.
grep: No match.
LAPW2 - Error. Check file lapw2.error
cp: cannot stat '.in.tmp': No such file or directory
grep: No match.
grep: No match.
grep: No match.

>   stop error



On Sat, Oct 20, 2018 at 8:01 PM Gavin Abo <gsabo at crimson.ua.edu> wrote:

> 1. It looks like you are using WIEN2k 17.1.  Some serious bugs were found
> in that version [ http://susi.theochem.tuwien.ac.at/reg_user/updates/ ].
> Consider installing and using WIEN2k 18.2 which has the fixes to it.  Also,
> WIEN2k 18.2 can be patched according to previous mailing list posts [
> https://github.com/gsabo/WIEN2k-Patches/tree/master/18.2 ].
>
> 2. Regarding your "file LAO.vspup is missing, i think it automatically
> generated during parallel lapw2", the case.vspup file should have been
> generated by lapw0.  See Table 4.3 on page 36 of the WIEN2k 18.2 usersguide
> [ http://susi.theochem.tuwien.ac.at/reg_user/textbooks/usersguide.pdf ]
> which states that the program LAPW0 generates the necessary case.vsp(up/dn).
>
> 3. I suggest you investigate why the LAO.vspup "can't open unit: 18" error
> happens with lapw2 but not with lapw1.  For example, did LAO.vspup exist
> with a non-zero file size after lapw0 completed, did it exist with a
> non-zero file size for lapw1, and did it get deleted or become zero in file
> size or lose node connection(s) just before lapw2?
>
> Is your .machines set up to run k-point parallel, mpi parallel, or a mix of
> both?  It looks like the job script that creates the .machines on the fly
> was not provided that shows that.
>
> If mpi parallel, using WIEN2k 18.2:
>
> 1. Run: ./siteconfig
> 2. Select Compiling Options, Selection: O
> 3. Select Parallel options, Selection: PO
> 4. What is MPIRUN set to?
>
> You also might check your mpirun command and talk with your cluster
> administrator to see if a supported mpi run command is being used for the
> system [
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17628.html
> ].
>
> Have you checked the standard output/error file?  This file name can vary
> from one system to another.  So you have to check your scheduling/queue
> system documentation to see what the default file(s) is called or use an
> option to name it yourself [ for example,
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18080.html
> ].  If there is a mpi run error, it usually shows up in that file.
>
> You also might have to check the hidden dot files [
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17317.html
> ] and output files (like case.output0, case.output1, etc.).
>
>
> On 10/20/2018 1:58 AM, BUSHRA SABIR wrote:
>
> Dear Peter Blaha and wien2k users
>
> I am facing a problem with the parallel execution of a job script. I am
> working on LaXO3 materials. Initialization is OK, but when I submitted the
> job file on the cluster for parallel execution with the command line
> runsp_lapw -cc 0.001 -ec 0.0001 -i 40 -p
>
> the following error appears.
>
> cat *.error
>
> 'LAPW2' - can't open unit: 18
>  'LAPW2' -        filename: LAO.vspup
>  'LAPW2' -          status: old          form: formatted
> **  testerror: Error in Parallel LAPW2
>
> The file LAO.vspup is missing; I think it is automatically generated
> during parallel lapw2.
>
> I checked testpara1_lapw:
> #####################################################
> #                     TESTPARA1                     #
> #####################################################
>
> Sat Oct 20 00:22:39 PDT 2018
>
>     lapw1para has finished
>
>  for testpara2_lapw
> #####################################################
> #                     TESTPARA1                     #
> #####################################################
>
> Sat Oct 20 00:22:39 PDT 2018
>
>     lapw1para has finished
>
> At the end of the dayfile, the following error is shown:
>
> 0.088u 0.060s 0:05.14 2.7%    0+0k 0+288io 0pf+0w
> >   lapw2 -up -p          (23:56:15) running LAPW2 in parallel mode
> **  LAPW2 crashed!
> 0.048u 0.312s 0:00.72 48.6%    0+0k 11386+96io 36pf+0w
> error: command   /global/common/sw/cray/cnl6/haswell/wien2k/17.1/intel/17.0.2.174/wkteycp/lapw2para -up uplapw2.def   failed
>
> I went through the mailing list but could not find a solution.
>
>
> Bushra
> PhD student
>